Handling Double Payments at Airbnb | Biweekly Engineering - Episode 32

How Airbnb built a solution to avoid double payments in a distributed payments system

Hi there dear readers! Long time no see! 🌊 

It has been very busy for me with my professional and personal life. Hoping to get back to regular episodes as things are getting less busy nowadays.

I hope all of you have been doing great!

Today, let’s learn about idempotency in distributed payment systems. I have a post to feature from Airbnb where the team showcased how they made sure double payments are avoided in their systems through a generic idempotency framework.

Let’s have a read.

A short walk from my home!

Avoiding Double Payments in Distributed Systems - Orpheus at Airbnb

In this very well-written article, Airbnb shares their insights on idempotency: why we need it in a distributed payments system and how they solved the problem. Let us briefly discuss.

What is idempotency?

Idempotency is a property that ensures multiple identical requests have the same effect as a single request. This means if a client sends the same request multiple times due to network failures or retries, the repeats do not change the state of the system beyond what the first processed request already did.

In the context of a distributed payments system, this prevents issues such as double payments by ensuring that repeated operations do not result in duplicate transactions.
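To make the definition concrete, here is a minimal sketch in Python. It is not Airbnb's code: the `PaymentLedger` class, its `charge` method, and the in-memory dictionary are all hypothetical, chosen only to show that a retry carrying the same key returns the stored result instead of charging again.

```python
import uuid


class PaymentLedger:
    """Toy in-memory ledger (hypothetical, for illustration only)."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first request
        self.charges = []   # every charge that was actually executed

    def charge(self, idempotency_key, amount_cents):
        # A repeated key means a retry: return the stored result, do nothing else.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append(amount_cents)
        result = {"status": "charged", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


ledger = PaymentLedger()
key = str(uuid.uuid4())           # one key per logical payment
first = ledger.charge(key, 5000)
retry = ledger.charge(key, 5000)  # e.g. the client timed out and retried
```

Both calls return the same result, and only one charge ends up in `ledger.charges`.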

Why does idempotency matter?

For systems that process payments, it is absolutely critical that double payments are avoided! Otherwise, as you can imagine, your money will fly away. 💸 💸

In distributed payment systems, we have a bunch of different systems acting together to process and execute a payment. Multiple systems mean multiple network calls and multiple points of failure where one misbehaving system can have drastic consequences.

In a distributed systems environment where network failures, timeouts, or retries can cause duplicate requests, idempotency guarantees that these duplicates do not result in inconsistent or erroneous states, such as double payments.

❝

For an API request to be idempotent, clients can make the same call repeatedly and the result will be the same.

From the article

So in one line: we absolutely-most-certainly-for-sure need idempotency to build a robust payments system.

Key elements of the solution developed by Airbnb

Airbnb developed a solution to prevent double payments in their distributed payments system by creating an idempotency framework named Orpheus.

This framework ensures that each payment request is processed only once, even if multiple identical requests are sent due to retries or network issues.

We can see a few key elements of the solution:

  • Idempotency key: A unique idempotency key is generated for each transaction. This key uniquely identifies a single transaction.

  • Pre-RPC stage: Before a request is processed, the system checks if a transaction with the same idempotency key has already been completed by querying the master database. In this stage, a new request is also stored in the database if not there already.

  • RPC stage: This is where an RPC is sent to payment service providers to execute the payment.

  • Post-RPC stage: A response from the RPC stage is recorded in the database. If a request fails, failure is also recorded accordingly so that retries can be handled.

  • Retryable vs non-retryable requests: Every payment request is categorized as retryable or non-retryable. For example, 5xx HTTP status codes mean the requests are retryable whereas 4xx mean something is wrong with the request itself so it should not be retried.

  • Avoiding replica databases: A key aspect discussed in the article is avoiding replica databases. When Orpheus reads or writes idempotency information, it always goes to the master database. Replication lag is a classic way to jeopardize a payments system: a retry that reads from a stale replica may not see the original request yet and can trigger a second payment. The article outlines how this lag can create multiple payment requests.
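The replication-lag problem from the last bullet can be shown with a toy simulation. This is a sketch under loud assumptions: `LaggingReplica` and `charge_once` are hypothetical names, and plain Python sets stand in for the master database and its replica.

```python
class LaggingReplica:
    """Toy replica (hypothetical): it sees primary writes only after sync()."""

    def __init__(self, primary):
        self._primary = primary
        self._snapshot = set()

    def sync(self):
        self._snapshot = set(self._primary)

    def __contains__(self, key):
        return key in self._snapshot


def charge_once(key, read_store, primary, charges):
    """Check read_store for the key, then record the key and the charge on primary."""
    if key in read_store:
        return "duplicate ignored"
    primary.add(key)
    charges.append(key)
    return "charged"


# Idempotency check against a lagging replica: the retry slips through.
primary_a = set()
replica = LaggingReplica(primary_a)
via_replica = []
charge_once("order-42", replica, primary_a, via_replica)
charge_once("order-42", replica, primary_a, via_replica)  # replica not synced yet

# Idempotency check against the primary itself: the retry is caught.
primary_b = set()
via_primary = []
charge_once("order-42", primary_b, primary_b, via_primary)
charge_once("order-42", primary_b, primary_b, via_primary)
```

In this toy run, `via_replica` ends up with two charges for the same order while `via_primary` has one, which is the failure mode that reading only from the master database avoids.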

We also learn two ground rules that the team followed to build Orpheus:

  • No service-to-service calls during the Pre-RPC and Post-RPC stages

  • No database interaction in the RPC stage
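Putting the three stages and the two ground rules together, a rough sketch could look like the following. This is not Orpheus itself: the table layout, the state names, and the functions `open_primary`, `classify`, and `process_payment` are assumptions made for illustration, with an in-memory SQLite database standing in for the master database. It also leaves out the locking a real system would need to handle concurrent requests with the same key.

```python
import sqlite3

RETRYABLE, NON_RETRYABLE, SUCCEEDED = "retryable", "non_retryable", "succeeded"


def classify(status_code):
    """Rule of thumb from the list above: 5xx is retryable, 4xx is not."""
    if 500 <= status_code < 600:
        return RETRYABLE
    if 400 <= status_code < 500:
        return NON_RETRYABLE
    return SUCCEEDED


def open_primary():
    """In-memory SQLite as a stand-in for the master database (assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE idempotency ("
        "key TEXT PRIMARY KEY, state TEXT NOT NULL, status_code INTEGER)"
    )
    return db


def process_payment(db, key, call_provider):
    # Pre-RPC: database work only, no network calls.
    with db:
        row = db.execute(
            "SELECT state, status_code FROM idempotency WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO idempotency (key, state) VALUES (?, 'pending')", (key,)
            )
        elif row[0] in (SUCCEEDED, NON_RETRYABLE):
            return row[0], row[1]  # replay the recorded outcome
        # 'pending' or 'retryable' rows fall through to another attempt.
        # A real system would take a lock or lease here; omitted in this sketch.

    # RPC: network call only, no database interaction.
    status_code = call_provider()

    # Post-RPC: record the outcome, again database work only.
    state = classify(status_code)
    with db:
        db.execute(
            "UPDATE idempotency SET state = ?, status_code = ? WHERE key = ?",
            (state, status_code, key),
        )
    return state, status_code


# Usage: a fake provider that fails once with 503 and then succeeds.
db = open_primary()
responses = iter([503, 200])
calls = []


def fake_provider():
    code = next(responses)
    calls.append(code)
    return code


r1 = process_payment(db, "order-42", fake_provider)  # first attempt fails with 503
r2 = process_payment(db, "order-42", fake_provider)  # retry reaches the provider
r3 = process_payment(db, "order-42", fake_provider)  # replay, provider not called
```

The third call never reaches the fake provider: the Pre-RPC stage finds the recorded success and returns it as-is.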

Why these rules? The post discusses why the team decided to keep network calls out of the Pre-RPC and Post-RPC stages and make them only in the middle:

❝

Simply put, network calls are inherently unreliable.

From the article

Overall, this article from Airbnb is a wonderful discussion on how idempotency is a critical part of any payments system and how it can be achieved. I am sure Airbnb systems evolved over the years (the post is 5 years old!) but it’s great to see how things were achieved in the past.

And that’s a wrap for today! I wanted to keep it short for this episode because of the length of the article, and of course, to make sure what I share is high-quality. See you in the next episode!

Happy reading! 👋
