Search-As-A-Service at Twitter - Biweekly Engineering - Episode 1
Curated blog posts from Twitter, Flipkart, and Airbnb
An alley in Barcelona
Welcome!
Welcome to the biweekly engineering blog reading list! Every two weeks, you will receive an email containing 3 to 5 carefully curated articles on software engineering, picked from various company engineering blogs. This issue marks the beginning of the reading list. We hope to serve quality content through this initiative so that all of us benefit along the way.
Without further ado, let's get started!
How Twitter upgraded and improved its search-as-a-service platform
#twitter #search #elasticsearch #microservices #distributedsystems
At Twitter, the Search Infrastructure team is responsible for providing a seamless search experience to product teams. The system uses Elasticsearch, a very popular search engine that can support high scale and efficiency.
Before the team's improvement initiatives, consumers of the search service would interact directly with the Elasticsearch layer. But as you can imagine, direct access like this does not hold up at Twitter's scale, which is why the team needed to make major changes to the system.
Specifically, the team designed and implemented three components:
A proxy layer: This acts as the frontend of the system for consumers. Being a simple HTTP service, this layer handles authentication, metrics collection, throttling, and so on.
An ingestion service: Large traffic spikes could bring down the Elasticsearch cluster, which is why the team implemented an ingestion service. It handles heavy traffic gracefully with techniques like batching requests, retrying with backoff, and auto-throttling of ingestion (see the sketch after this list).
A backfill service: In case of ingestion failure, we need a backfill mechanism. The backfill service makes the system more resilient by handling huge backfills with multiple workers, and it is itself made up of multiple components that work together.
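To make the ingestion idea concrete, here is a minimal sketch (not Twitter's actual code) of batching with retry and backoff in Python; `bulk_index` is a hypothetical callable standing in for whatever client call writes a batch to Elasticsearch.

```python
import random
import time


def relay_batch(batch, bulk_index, max_retries=5):
    """Send one batch downstream, retrying with exponential backoff.

    `bulk_index` is a hypothetical callable wrapping the Elasticsearch bulk
    API; any exception it raises is treated as a transient failure.
    """
    for attempt in range(max_retries):
        try:
            bulk_index(batch)
            return True
        except Exception:
            # Back off exponentially, with jitter, before the next retry.
            time.sleep((2 ** attempt) + random.random())
    return False


def ingest(documents, bulk_index, batch_size=500):
    """Group incoming documents into batches so a traffic spike does not hit
    the cluster as thousands of individual index requests."""
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) >= batch_size:
            relay_batch(batch, bulk_index)
            batch = []
    if batch:
        relay_batch(batch, bulk_index)
```

Auto-throttling would sit on top of something like this, slowing producers down when retries start to pile up.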
With this massive effort, the search-as-a-service at Twitter improved in terms of stability and scale.
Turbo Relayer at Flipkart - a message-relaying system
#flipkart #microservices #distributedsystems
Think of an e-commerce system where orders are handled by an order service. Upon receiving a request, the order service needs to persist the order in its own database and send a message to a message queue so that other services (for example, payment service, notification service, etc.) can consume the information. This sounds pretty obvious, right?
In fact, in system design interviews, we frequently discuss similar scenarios where a service receives a request, persists some data, and relays a message to a message broker.
Interestingly, the details are not as obvious as the idea sounds. When we have to persist data in two different systems (a database and a message broker), we cannot make the two writes atomic. What if the persistence in the database succeeds, but sending the message to the broker fails? Or what if the broker receives the message successfully, but the database transaction fails and rolls back?
This is where a microservices communication pattern, popularly known as the Transactional Outbox pattern, comes into the picture. In short, the pattern says:
Create two separate tables - a primary table to store the original data (for example, orders), and an outbox table to store messages that will go to the broker for other services to consume.
Persist in both tables within a single transaction. This means that if any part of the transaction fails, the whole thing is rolled back and no partial write survives.
Create a process that periodically reads the outbox table and relays messages to the broker (a minimal sketch follows this list).
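To see how the three steps fit together, here is a small sketch of the pattern using SQLite and a stand-in `publish` callable for the broker; the table and column names are illustrative, not Flipkart's.

```python
import json
import sqlite3


def setup(conn):
    """Create the primary table and the outbox table (illustrative schema)."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, payload TEXT)")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, order_id TEXT, message TEXT, relayed INTEGER)"
        )


def place_order(conn, order):
    """Write the order and its outbox message in one transaction,
    so either both rows are committed or neither is."""
    with conn:  # sqlite3's context manager commits, or rolls back on error
        conn.execute(
            "INSERT INTO orders (id, payload) VALUES (?, ?)",
            (order["id"], json.dumps(order)),
        )
        conn.execute(
            "INSERT INTO outbox (order_id, message, relayed) VALUES (?, ?, 0)",
            (order["id"], json.dumps({"event": "order_created", "order": order})),
        )


def relay_outbox(conn, publish):
    """Called periodically: read unrelayed messages and push them to the
    broker. `publish` is a stand-in for a real broker client (e.g. a Kafka
    producer's send call)."""
    rows = conn.execute(
        "SELECT id, message FROM outbox WHERE relayed = 0 ORDER BY id"
    ).fetchall()
    for row_id, message in rows:
        publish(message)  # if this raises, the row stays unrelayed and is retried later
        with conn:
            conn.execute("UPDATE outbox SET relayed = 1 WHERE id = ?", (row_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup(conn)
    place_order(conn, {"id": "o-1", "items": ["book"]})
    relay_outbox(conn, publish=print)  # stand-in broker: just print the message
```

Note that this gives at-least-once delivery: if the relayer crashes after publishing but before marking the row, the message goes out again, so downstream consumers need to be idempotent.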
The above sounds very promising, but it can run into scalability and availability issues, and that is exactly what the Flipkart article covers. The team had implemented the conventional Transactional Outbox pattern, and it needed an upgrade once those issues showed up.
Payments data read layer to improve SOA at Airbnb
#airbnb #payments #cdc #soa #distributedsystems
When a company moves from a monolithic architecture to a Service-Oriented Architecture (SOA), it is not all a bed of roses. Yes, for large systems SOA is definitely the way to go, but there is no denying that it takes quite a bit of effort to build a service-oriented system successfully.
Airbnb went through this well-known journey from monolith to SOA. And as part of that journey, the payments team built a unified payments data read layer, which made life easier for consumers of payments-related data.
When Airbnb was a monolith, payment data was not segregated among multiple services. This means the accessibility and the context of the data were simpler. But when different services were carved out of the monolith, they started to own different parts of the data. As a result, when an external feature (for example, transaction history for hosts or guests) needed to consume parts of the payment data, it ended up calling many services to fetch what it required. For consumers of the data, this is not very convenient. This is why the team built a unified read layer on top of the payment services, which enabled the smooth onboarding of new services.
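As a rough illustration of what such a read layer does (the service names and client interfaces here are assumptions, not Airbnb's actual API), it fans a single consumer query out to the underlying payment services and merges the results into one response:

```python
import asyncio


class _StubClient:
    """Stand-in for a payments sub-service client; real clients would make RPCs."""

    def __init__(self, rows):
        self._rows = rows

    async def list_for_user(self, user_id):
        return self._rows


async def get_payment_history(user_id, clients):
    """Unified read entry point: query the underlying payment services
    concurrently and merge their results into one timeline."""
    results = await asyncio.gather(
        *(client.list_for_user(user_id) for client in clients.values())
    )
    merged = [item for rows in results for item in rows]
    merged.sort(key=lambda item: item["created_at"], reverse=True)
    return merged


if __name__ == "__main__":
    clients = {
        "transactions": _StubClient([{"type": "charge", "created_at": 2}]),
        "refunds": _StubClient([{"type": "refund", "created_at": 5}]),
    }
    print(asyncio.run(get_payment_history("guest-1", clients)))
```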
The team didn't stop there. Because the data was segregated, client queries still depended on multiple tables and services. This problem was solved by building a data denormalization pipeline based on the Change Data Capture (CDC) pattern. Denormalization helps speed up queries because, as we know, table joins are costly.
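Here is a minimal sketch of that idea, assuming CDC events arrive as simple dicts from the change stream; the event shape and field names are made up for illustration, not Airbnb's schema.

```python
def apply_change_event(read_store, event):
    """Apply one CDC event to a denormalized, per-payment document so that
    read queries hit a single pre-joined record instead of joining several
    source tables at query time.

    `read_store` is any dict-like store; `event` is assumed to look like
    {"table": "refunds", "op": "upsert", "payment_id": 1, "row": {...}}.
    """
    doc = read_store.setdefault(event["payment_id"], {})
    if event["op"] == "upsert":
        doc[event["table"]] = event["row"]  # nest each source row under its table name
    elif event["op"] == "delete":
        doc.pop(event["table"], None)


if __name__ == "__main__":
    store = {}
    apply_change_event(store, {"table": "payments", "op": "upsert",
                               "payment_id": 1, "row": {"amount": 100}})
    apply_change_event(store, {"table": "refunds", "op": "upsert",
                               "payment_id": 1, "row": {"amount": 30}})
    print(store)  # one pre-joined document per payment id
```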
The article nicely explains the evolution of the system with proper diagrams.