Saturday, April 18, 2026

Microservices Distributed Transactions & Failure Handling

 

🚀 Introduction

In a monolithic system, handling transactions is simple:

  • One database
  • One transaction
  • Either commit or rollback

But in microservices, things break apart:

  • Each service has its own database
  • No global transaction manager
  • Network failures are common

👉 This creates the Distributed Transaction Problem


❗ Why Distributed Transactions Are Hard

Imagine an e-commerce flow:

Order Service → Inventory Service → Payment Service

What if:

  • Order is created ✅
  • Inventory reserved ✅
  • Payment fails ❌

Now your system is inconsistent: an order exists and stock is held, but nothing was paid.

Traditional ACID transactions don’t work well here because:

  • Services are independent
  • Databases are separate
  • Network failures are unavoidable

🧠 Approach 1: Two-Phase Commit (2PC)

🔹 How it works

Phase 1: Prepare
→ The coordinator asks every service “ready?” and each one votes yes or no

Phase 2: Commit / Rollback
→ If all agree → commit
→ If any fail → rollback
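
The two phases can be sketched in a few lines. This is a minimal in-memory illustration, not a real transaction manager: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are all hypothetical names invented for this example, and it ignores timeouts and coordinator crashes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """A hypothetical service taking part in the transaction."""
    name: str
    can_commit: bool = True   # assumption: stands in for "local work succeeded"
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be committed later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Between `prepare()` and the final decision every participant has to hold its resources, which is where the blocking behaviour comes from.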

⚠️ Problems with 2PC

  • Slow (blocking)
  • Single point of failure (coordinator)
  • Poor scalability
  • Locks resources for a long time

👉 That’s why many modern microservice systems avoid 2PC across services


✅ Approach 2: Saga Pattern (Recommended)

💡 Core Idea

Break one big transaction into multiple small local transactions

T1 → T2 → T3 → T4

If a step fails (say T4), run the compensations for the completed steps in reverse order:

C3 → C2 → C1 (compensating transactions)

👉 Instead of rollback, we undo using business logic
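
Here is one way the idea could look in code. It is a sketch under simple assumptions: everything runs in one process, and the `run_saga` function, the `Step` tuple, and the three example steps are made up for illustration. A real saga would also persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) -- a hypothetical shape for one saga step
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            print(f"{name} failed: {exc}")
            # The failed step did not complete, so only earlier steps are undone.
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_payment() -> None:
    raise RuntimeError("card declined")


steps: List[Step] = [
    ("create_order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("reserve_stock", lambda: log.append("stock reserved"), lambda: log.append("stock released")),
    ("charge_customer", fail_payment, lambda: log.append("payment refunded")),
]

result = run_saga(steps)
```

After this run, `log` holds the two completed steps followed by their compensations in reverse order, and no refund is attempted because the charge never went through.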


🔄 Saga Flow (Text Diagram)

✅ Success Flow

[Order Service] → create order (PENDING)

[Inventory Service] → reserve stock

[Payment Service] → charge customer

[Order Service] → mark CONFIRMED

❌ Failure Flow

[Order Service] → created

[Inventory Service] → reserved

[Payment Service] → FAILED ❌

[Inventory Service] → release stock

[Order Service] → cancel order

👉 Each step has a compensation step


🧭 Types of Saga

1️⃣ Choreography (Event-driven)

No central controller. Services react to events.

OrderCreated → Inventory Service
StockReserved → Payment Service
PaymentDone → Order Service
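
The event chain above can be mimicked with a tiny in-process event bus. This is only a sketch: the `EventBus` class and the handler functions are hypothetical, dispatch is synchronous, and a real system would put a broker between the services.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []   # event types in publish order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, str]) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
orders: Dict[str, str] = {}


def inventory_on_order_created(event: Dict[str, str]) -> None:
    # Inventory Service reserves stock, then announces it.
    bus.publish("StockReserved", event)


def payment_on_stock_reserved(event: Dict[str, str]) -> None:
    # Payment Service charges the customer, then announces it.
    bus.publish("PaymentDone", event)


def order_on_payment_done(event: Dict[str, str]) -> None:
    # Order Service marks the order as confirmed.
    orders[event["order_id"]] = "CONFIRMED"


bus.subscribe("OrderCreated", inventory_on_order_created)
bus.subscribe("StockReserved", payment_on_stock_reserved)
bus.subscribe("PaymentDone", order_on_payment_done)

orders["o-1"] = "PENDING"
bus.publish("OrderCreated", {"order_id": "o-1"})
```

Notice that no function knows the whole flow. Each service only knows which event it listens to and which event it emits.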

✔ Pros:

  • Loose coupling
  • Simple for small systems

❌ Cons:

  • Hard to debug
  • Event chaos in large systems

2️⃣ Orchestration (Central Controller)

A central service controls everything.

Orchestrator:
→ Call Order Service
→ Call Inventory Service
→ Call Payment Service

✔ Pros:

  • Clear flow
  • Easier debugging

❌ Cons:

  • Central dependency

⚠️ Failure Handling Strategies

Handling failures is the core challenge in distributed systems.

🔁 1. Retry Mechanism

  • Retry temporary failures
  • Use exponential backoff

Retry → Retry → Fail → Compensation
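
A retry helper with exponential backoff might look like the sketch below. The names `retry_with_backoff` and `TransientError` are invented for this example, and the `sleep` parameter exists only so the wait can be swapped out in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Out of retries: let the caller start compensation.
                raise
            # Wait base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Only `TransientError` is retried here. Permanent failures such as a declined card should fail immediately and go straight to compensation. Production setups often add random jitter to the delay so that many clients do not retry at the same moment.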

🔄 2. Compensating Transactions

  • Undo previous steps

Example:

  • Refund payment
  • Release inventory

👉 This is the heart of Saga


🧱 3. Idempotency

Ensure repeated operations don’t break data:

Charge $100 → Retry → Should NOT charge again
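
One common way to get this behaviour is an idempotency key sent with every request. The sketch below assumes a hypothetical in-memory `PaymentService`; a real service would store the keys in its database, in the same transaction as the charge.

```python
from typing import Dict


class PaymentService:
    """Hypothetical in-memory payment service, for illustration only."""

    def __init__(self) -> None:
        self.total_charged = 0
        self._results: Dict[str, str] = {}   # idempotency key -> earlier result

    def charge(self, idempotency_key: str, amount: int) -> str:
        # A key we have seen before means this is a retry: return the old result.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged += amount
        result = f"charged {amount}"
        self._results[idempotency_key] = result
        return result


payments = PaymentService()
first = payments.charge("order-42", 100)
retry = payments.charge("order-42", 100)   # same key, so no second charge
```

The caller picks the key (here the order ID) and reuses it on every retry of the same logical operation.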

📬 4. Message Queues

Use Kafka / RabbitMQ:

Service Down ❌
Message Stored ✅
Processed Later ✅
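
The store-now, process-later behaviour can be shown with a toy queue. This is not how Kafka or RabbitMQ are used in practice; `InMemoryQueue` is a hypothetical stand-in that keeps messages in memory, whereas real brokers persist them.

```python
from collections import deque
from typing import Callable, Deque, List


class InMemoryQueue:
    """Toy stand-in for a durable message broker."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        # Producers can always enqueue, even while the consumer is down.
        self._messages.append(message)

    def drain(self, handler: Callable[[str], None]) -> int:
        processed = 0
        while self._messages:
            handler(self._messages[0])   # handle first...
            self._messages.popleft()     # ...then acknowledge by removing
            processed += 1
        return processed


queue = InMemoryQueue()
queue.send("reserve stock for o-1")   # consumer is down; the message waits
queue.send("reserve stock for o-2")

handled: List[str] = []
processed = queue.drain(handled.append)   # consumer comes back and catches up
```

Because a message is removed only after the handler returns, a handler that raises leaves the message in the queue for another attempt, which is one more reason handlers need to be idempotent.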

📦 5. Transactional Outbox Pattern

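Make the DB update and the event atomic by saving the event to an outbox table in the same local transaction; a separate relay publishes it afterwards:

DB Update + Event Save (one transaction) → Commit → Then Publish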

Prevents:

  • Lost messages
  • Inconsistent states
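
The sketch below shows the pattern with SQLite from the Python standard library. The table layout, `create_order`, and `relay_outbox` are assumptions made for this example, and the `publish` callback stands in for a real broker client.

```python
import json
import sqlite3
from typing import Callable, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)


def create_order(order_id: str) -> None:
    # One local transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PENDING')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay_outbox(publish: Callable[[str, str], None]) -> int:
    """Publish unpublished events in order, marking each one as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # A crash after publish but before the UPDATE means the event is sent again.
        publish(event_type, payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent: List[Tuple[str, str]] = []
create_order("o-1")
relay_outbox(lambda event_type, payload: sent.append((event_type, payload)))
```

The relay gives at-least-once delivery, so consumers still need to be idempotent.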

🔍 6. Observability

  • Logs
  • Tracing
  • Metrics

👉 Helps debug failures across services


⚖️ Trade-offs

Feature             2PC                   Saga
Consistency         Strong                Eventual
Performance         Slow                  Fast
Scalability         Poor                  High
Complexity          Low                   High
Failure Handling    Automatic rollback    Manual compensation

🧠 Key Insight

👉 In microservices, you don’t aim for perfect consistency

You aim for:

Eventual Consistency + Resilience


🏁 Conclusion

Distributed transactions are unavoidable in microservices, but traditional solutions like 2PC don’t scale.

The Saga pattern is widely adopted because:

  • It embraces failure
  • It works asynchronously
  • It scales well

But it comes with a cost:
👉 You must design failure handling explicitly


✍️ Final Thought

“In distributed systems, failure is not an exception — it’s the default.”
