Evaluation of ML Feature Store at DoorDash - Biweekly Engineering - Episode 15

How DoorDash built its ML feature store - Anti-Patterns in system development

Hey hey dear subscribers! Welcome back to the 15th episode of Biweekly Engineering - your favourite newsletter that serves curated articles from numerous software engineering blogs out there in the wild.

Today, I have picked stories for you from Expedia and DoorDash. Let’s go! 🚗

Maria-Theresien-Platz in Vienna, Austria

Anti-patterns in backend systems

A wonderful post from Expedia Group Tech. The article discusses six anti-patterns that are commonly found in backend systems and can hurt your product. Let me summarise.

Anti-pattern 1 - The mixed domain

  • No clear boundaries between systems, APIs, and interfaces. For example, pricing logic that also contains payment processing.

  • The code and the business logic are scattered across multiple domains or systems.
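To make the pricing and payment example concrete, here is a minimal Python sketch of keeping the two domains behind separate boundaries. All names here (`PricingService`, `PaymentService`, `checkout`) are hypothetical and not taken from the Expedia article; this is one possible shape, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Output of the pricing domain: an amount in cents."""
    amount_cents: int


class PricingService:
    """Pricing domain only: decides what something costs (hypothetical)."""

    def quote(self, base_cents: int, nights: int) -> Quote:
        return Quote(amount_cents=base_cents * nights)


class PaymentService:
    """Payment domain only: charges whatever amount it is handed (hypothetical)."""

    def __init__(self) -> None:
        self.charged: list[int] = []

    def charge(self, amount_cents: int) -> bool:
        # Reject non-positive amounts; otherwise record the charge.
        if amount_cents <= 0:
            return False
        self.charged.append(amount_cents)
        return True


def checkout(pricing: PricingService, payments: PaymentService,
             base_cents: int, nights: int) -> bool:
    """Orchestration sits outside both domains and only wires them together."""
    quote = pricing.quote(base_cents, nights)
    return payments.charge(quote.amount_cents)
```

In the mixed-domain version, `PricingService.quote` would also call the payment provider, so a change to payment handling would force a change to pricing code.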

Anti-pattern 2 - Customer specific domains

  • Multiple teams, each serving a different customer segment, build the same kind of system for their own customers. Example: a separate pricing service for each product, such as hotels or flights. Instead, there should be one pricing platform serving all products.

  • A lot of code and design duplication in the overall system.

Anti-pattern 3 - Mixing user experience and domain logic

  • In this anti-pattern, different frontend clients end up with different sets of features. As a result, a mix of frontend and backend logic exists in the system.

  • This anti-pattern is in fact pretty common. For instance, we often see features that are available on the website but not in the app, or vice versa.

Anti-pattern 4 - Extra and duplicative domains

  • Multiple systems are built that serve a similar purpose, where there should be a single unified system.

  • This anti-pattern appears in cases when there is no clear ownership boundary, or as a result of mixing experience and domain logic. So this anti-pattern could be an outcome of the previous anti-patterns.

Anti-pattern 5 - Orphan domains

  • As the name suggests, orphan domains occur when there is no clear ownership of a system.

  • In this anti-pattern, logic can also get scattered into multiple systems.

Anti-pattern 6 - Non-customer-centric APIs

  • Failure to create APIs that are targeted at customers’ needs.

  • It is common to see engineering teams unable to design customer-centric APIs because they are isolated from the customers.

The above is a highly condensed summary of the article. For a fuller understanding, don’t forget to check it out.

Evaluation of ML feature store at DoorDash

One of the primary needs of running machine learning models at scale is a highly scalable feature store. At DoorDash, such a store was built on top of Redis as the storage layer.

In brief, DoorDash had three requirements from the storage system:

  • Store billions of records

  • Support millions of lookups per second

  • Enable full data refresh in batch

Of course, deciding on such a store requires rigorous evaluation. You wouldn’t want to take the decision without benchmarking well. And that’s exactly what DoorDash did.

In the article, we can see how DoorDash evaluated the decision for a storage layer. They ran a specific set of benchmarks on a specific set of storage systems, gathered the metrics from the benchmarks, and finally settled on Redis.

After reading the article, the first thought that came to my mind was about the cost. Redis is an in-memory cache that is also frequently used as a database for single-digit-millisecond latency. But in-memory database systems tend to be very costly, as RAM is more expensive than disk.
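As a rough illustration of what such a lookup benchmark can look like, here is a minimal Python sketch that measures read throughput and p99 latency for any store exposing a `get(key)` callable. This is not DoorDash's actual harness: the function name `benchmark_lookups`, the key format, and the in-memory dict standing in for the store are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Dict, List


def benchmark_lookups(get: Callable[[str], object], keys: List[str],
                      n_lookups: int, seed: int = 0) -> Dict[str, float]:
    """Time n_lookups random reads; report throughput and p99 latency."""
    rng = random.Random(seed)  # fixed seed so runs pick the same keys
    latencies = []
    start = time.perf_counter()
    for _ in range(n_lookups):
        key = rng.choice(keys)
        t0 = time.perf_counter()
        get(key)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    # Index of the 99th percentile, clamped to the last element.
    p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]
    return {
        "lookups_per_sec": n_lookups / elapsed,
        "p99_latency_ms": p99 * 1000.0,
    }


# Stand-in store: a plain dict. A real run would pass a database client's
# read method instead of store.get.
store = {f"feature:{i}": float(i) for i in range(10_000)}
result = benchmark_lookups(store.get, list(store), n_lookups=50_000)
print(result)
```

Running the same function against each candidate system, with the same keys and seed, gives numbers that are at least comparable with each other, which is the point of a benchmark like this.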

The authors claim in the article that Redis would cost less than the second-best storage system (CockroachDB) because Redis uses less CPU. But what if the growth skyrockets? What if more and more teams start to create new features and want to store them in the feature store? What if there is a large number of high-cardinality features? Could there be a missing piece in the benchmarks that doesn’t show the full picture?
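The growth question can be made tangible with a back-of-the-envelope calculation. The sketch below estimates how much RAM an in-memory store needs as the record count grows. The function name and every number in it are hypothetical and chosen only for easy arithmetic; none of them come from DoorDash's articles.

```python
def estimate_ram_gib(num_records: int, bytes_per_record: int,
                     overhead_factor: float = 1.0,
                     replicas: int = 1) -> float:
    """Estimate RAM in GiB for an in-memory store.

    overhead_factor models per-key bookkeeping on top of the raw payload,
    and replicas counts full copies of the data. Both are assumptions.
    """
    total_bytes = num_records * bytes_per_record * overhead_factor * replicas
    return total_bytes / (1024 ** 3)


# Hypothetical workload: 2**30 (about 1.07 billion) records of 100 bytes each.
baseline = estimate_ram_gib(2**30, 100)

# Same workload after the number of features triples, with two replicas.
grown = estimate_ram_gib(3 * 2**30, 100, replicas=2)

print(f"baseline: {baseline:.0f} GiB, after growth: {grown:.0f} GiB")
```

Because every record has to fit in RAM, the memory bill grows linearly with the record count, while a CPU-focused comparison at a fixed data size would not show that.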

Well, I don’t know, but it turns out that, as of March 2023, DoorDash had migrated half of its feature store to CockroachDB to save 75% of the cost!

Indeed, cost became a major concern with the growing need to store more and more features. The article discusses why the team decided to migrate from Redis to CockroachDB. It also shares some performance optimisations and results that reduced the cost of CockroachDB even further.

And that marks the end of today’s episode! Go ahead and read the posts shared, pause and ponder over them, and try to map them to your own experiences.

Have a great week ahead! See you soon. 🙂

Do you find the newsletter useful? Then feel free to share this within your community: https://biweekly-engineering.beehiiv.com/subscribe
