Scalable Key-Value Store at Dropbox - Biweekly Engineering - Episode 13
How Dropbox built its own key-value store - ML system development at Airbnb - How generative AI is transforming work for developers
Greetings dear Biweekly Engineering subscribers! It is my absolute pleasure to welcome you all to the 13th episode of the newsletter!
Today, as usual, we have three very insightful articles to read.
Let’s just start.
In the Austrian Alps
An example of ML development process at Airbnb
#airbnb #machinelearning #ai #searchsystems
Today, we start with a well-written and detailed article on the development process of a machine learning system at Airbnb for their Airbnb Experiences product. The story shows how the development of ML systems can begin with a small amount of data and a narrow scope, and gradually evolve into full-fledged robust systems as the data and the business grow.
Airbnb Experiences is a product that allows users to book different experiences such as day trips or fun activities. When Airbnb launched the product, they started collecting user behavioral data. Based on this initial small amount of data, they decided to build the first version of the search ranking feature.
The idea behind the search ranking feature is to order the search results for a user in a way that the most relevant and likely-to-book search result appears at the top.
In the first version of the search ranking, the data size was small, and only a set of features from the experiences were considered to train the ML model. This was an offline model where every user would see the same ranking, as user-specific features were not considered for the decision.
In the second version, the data size grew, and user-specific features were added. This introduced personalisation in the model, meaning that different users would now see different rankings while searching for experiences. This was a great improvement, but the model was still an offline one. No online features were used.
In the third iteration, the team added query features to the model, which turned the search ranking model from an offline model to an online one. Now, query features such as city, country, group size, browser language, etc. could be leveraged on the fly to provide an even more personalized ranking to the users.
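To make the three iterations a bit more concrete, here is a minimal sketch in Python. It is not Airbnb's model: the hand-written weights stand in for a trained model, and the `Experience` fields, the `score_v1`/`score_v2`/`score_v3` functions, and the sample data are all hypothetical names made up for illustration. The only point is to show how each version takes in one more group of features.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    name: str
    city: str
    category: str
    avg_rating: float  # hypothetical experience feature, 0 to 5


def score_v1(exp):
    # Version 1: experience features only, so every user sees the same order.
    return exp.avg_rating


def score_v2(exp, user_categories):
    # Version 2: adds a user-specific feature (categories the user likes).
    bonus = 1.0 if exp.category in user_categories else 0.0
    return score_v1(exp) + bonus


def score_v3(exp, user_categories, query_city):
    # Version 3: adds a query feature that is only known at request time.
    bonus = 2.0 if exp.city == query_city else 0.0
    return score_v2(exp, user_categories) + bonus


def rank(experiences, score):
    # Highest score first; returns names only to keep the output readable.
    return [e.name for e in sorted(experiences, key=score, reverse=True)]


# Made-up sample data.
EXPERIENCES = [
    Experience("Alpine hike", "Innsbruck", "outdoors", 4.8),
    Experience("Pasta class", "Rome", "food", 4.6),
    Experience("Street food tour", "Innsbruck", "food", 4.2),
]

if __name__ == "__main__":
    print(rank(EXPERIENCES, score_v1))
    print(rank(EXPERIENCES, lambda e: score_v2(e, {"food"})))
    print(rank(EXPERIENCES, lambda e: score_v3(e, {"food"}, "Innsbruck")))
```

With this sample data, the first ranking is the same for everyone, the second moves the food experiences up for a user who likes food, and the third puts the Innsbruck food tour on top once the query city is known.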
As the article neatly points out, each iteration of the development process was run as an experiment to track the impact of the changes. The experiments showed that the number of bookings increased with each iteration.
Overall, the article is a wonderful resource for understanding a fundamental principle of development of machine learning systems: iteration.
How Generative AI is transforming the way developers work
#github #machinelearning #ai #generativeai #chatgpt #llm
Surely, you are familiar with the buzz surrounding ChatGPT. It is fair to say that the way we developers work has been forever changed by the groundbreaking success of Large Language Models (LLMs).
Developers are increasingly using Generative AI-powered tools like GitHub Copilot to increase their productivity. But how are these tools being used? This blog post provides a good overview.
Generative AI-based tools are emerging everywhere, from writing to images to coding. In the coding environment, these tools can now be used as editor extensions, so as you write code, the tool continuously picks up hints and makes suggestions accordingly.
So what has the impact been so far? As mentioned in the blog:
“…we found that developers who used GitHub Copilot coded up to 55% faster than those who didn’t. But productivity gains went beyond speed with 74% of developers reporting that they felt less frustrated when coding and were able to focus on more satisfying work.”
Personally, I have been using ChatGPT to increase my productivity, and the experience has been amazing! Don’t worry, I am not using ChatGPT to write this newsletter!
The story of a scalable key-value store at Dropbox
#dropbox #keyvaluestore #scalability #distributedsystems
As the last story for today, we have an excellent article from Dropbox.
Dropbox is a cloud file storage service, so it needs to handle an enormous amount of metadata. To provide higher scalability, they reconsidered the design of their existing metadata storage systems and decided to build something new on their own.
This led to the birth of Panda, a petabyte-scale transactional key-value store. It is built on top of sharded MySQL storage servers with support for ACID transactions. In this article, readers get a high-level overview of Panda, including its architecture.
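If the idea of a key-value store spread over shards is new to you, the toy sketch below may help. It is not Panda's design: the summary above does not say how Panda maps keys to shards, so the hash-based routing here is an assumption, the in-memory dictionaries merely stand in for MySQL storage servers, and transactions are left out entirely. `ShardedKV` and its methods are hypothetical names.

```python
import hashlib


class ShardedKV:
    """Toy key-value store that spreads keys over a fixed number of shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one storage server (an assumption for the demo).
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Use a stable hash so a key always maps to the same shard,
        # regardless of process or Python version.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


if __name__ == "__main__":
    store = ShardedKV(num_shards=4)
    store.put("file:1", "report.pdf")
    store.put("file:2", "photo.jpg")
    print(store.get("file:1"))
    print([len(shard) for shard in store.shards])
```

The caller only ever sees `put` and `get`; which server holds a given key is decided by the routing function. A real system also has to deal with resharding, replication, and transactions that span shards, which is where most of the difficulty lies.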
Why did Dropbox build this themselves? Couldn't they just use some existing solution? Well, to find out, you need to read the article. The authors discuss the options they considered and why none of them was a good fit.
And that wraps up today's episode! Make sure you don't miss out on those articles, and feel free to share them with your friends. Catch you all in two weeks! Keep smiling and stay awesome! ☀️