Microservice Best Practices | Biweekly Engineering - Episode 26

Machine learning platform at Shopify | Best practices for building microservices | Spotify's experimentation platform

Greetings! Welcome back to the 26th episode of your favourite engineering newsletter - Biweekly Engineering!

In today’s episode, we have articles from Shopify, Capital One Tech, and Spotify.

Let’s begin!


Merlin - ML Platform at Shopify

The first article we have today is from Shopify Engineering. It introduces Shopify's machine learning platform named Merlin.

Merlin is designed to help data scientists and software engineers streamline their machine learning workflows by providing a scalable, flexible, and easy-to-use platform. Merlin is built on an open-source stack and uses Ray, an open-source framework for distributed computing.

In summary, Merlin enables three key areas:

  • Scalability: Robust infrastructure that can scale machine learning workflows up as needed.

  • Fast Iterations: Tools that reduce friction and increase productivity for the data scientists and machine learning engineers by minimising the gap between prototyping and production.

  • Flexibility: Users can use any libraries or packages they need for their models.

Shopify developed Merlin in a layered architecture:

  • A data ingestion layer: This layer is responsible for ingesting data from various sources, such as the data lake, feature storage, Spark jobs, etc.

  • A training layer: This layer is responsible for training machine learning models; it is powered by Ray.

  • An inference layer: This layer is responsible for deploying and serving machine learning models.

  • A monitoring layer: This layer is responsible for monitoring the performance of machine learning models.
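To make the layered design concrete, here is a minimal sketch of how the four layers could compose into a single pipeline. The names and interfaces are purely illustrative assumptions for this newsletter, not Shopify's actual Merlin API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Pipeline:
    """Hypothetical four-layer ML pipeline, mirroring Merlin's layers."""
    ingest: Callable[[], list]        # data ingestion layer
    train: Callable[[list], Any]      # training layer (Ray-powered in Merlin)
    serve: Callable[[Any, Any], Any]  # inference layer
    monitor: Callable[[Any], None]    # monitoring layer

    def run(self, request: Any) -> Any:
        data = self.ingest()          # pull data from sources
        model = self.train(data)      # fit a model on it
        prediction = self.serve(model, request)  # answer a request
        self.monitor(prediction)      # record what was served
        return prediction

# Toy example: the "model" is just the mean of the ingested values.
pipeline = Pipeline(
    ingest=lambda: [1.0, 2.0, 3.0],
    train=lambda data: sum(data) / len(data),
    serve=lambda model, req: model * req,
    monitor=lambda pred: None,
)
print(pipeline.run(2.0))  # → 4.0
```

The point of the layering is that each stage can be swapped independently: the same `train` step could move from a laptop to a Ray cluster without touching ingestion or serving.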

For the teams at Shopify, Merlin makes it easy to prototype a machine learning project, iteratively develop it, and eventually deploy it into production. The article walks through the whole process of how Merlin is used to prototype and deploy a real-life machine learning use case, with an example.

Microservices Best Practices - Learnings from Capital One Tech

How do you design microservices well? Which best practices should you follow? How do you avoid the pitfall of distributed monoliths? These are questions you need answered before embarking on the journey into the microservices world.

Capital One Tech shared 10 best practices in this article. Let’s summarise them:

  1. Follow the Single Responsibility Principle (SRP).
    The first piece of advice applies to software engineering as a whole. Just like classes in object-oriented programming, a microservice should be built to serve a single responsibility and should do it well.

  2. Make a microservice the owner of its own data.
    A microservice should own its data and be the source of truth for it. Don't keep a monolithic database that everyone can access.

  3. Take advantage of asynchronous communication for loose-coupling.
    If synchronous communication is not needed, decouple microservices through asynchronous communication to increase failure resilience and scalability.

  4. Use a circuit breaker.
    This is a popular microservices pattern where the “circuit” between two services is broken when the callee service fails or times out. Through well-defined configuration, service owners can decide when the circuit breaker should kick in. This ensures the SLA of the caller service remains intact for its clients.

  5. Use an API gateway.
    Your microservices should sit behind an API gateway, which can handle cross-cutting concerns like authentication, rate limiting, security, etc.

  6. Make API changes backward compatible.
    Make sure that when you introduce API changes, they are always backward compatible. Otherwise, clients will see their code break unexpectedly.

  7. Version your APIs.
    When API changes are bound to be breaking, create a new version of the API and let clients migrate to the new version gradually.

  8. Have proper infrastructure for the microservices.
    Well-built microservices should be hosted on well-built infrastructure. Thanks to the cloud, it's now easy to have reliable infrastructure at a reasonable cost.

  9. Build separate deployment process.
    Make sure each microservice in your system can be independently released.

  10. Create organisational efficiencies.
    With the freedom to develop independently, teams building microservices might duplicate work or drift from best practices. It's important to have organisational structures in place to ensure teams remain efficient and resourceful.

Experimentation Platform at Spotify

Experimentation platforms are common in many companies. The idea is to run experiments where a new feature of a product is exposed to a specific set of users and its impact is measured.

Spotify's newer experimentation platform is a significant upgrade from its predecessor. The old platform was limited by its 1-1 mapping between experiments and feature flags, which made it difficult to restart experiments and caused performance problems. Additionally, the old platform only provided a small number of out-of-the-box metrics, which led data scientists to perform analyses in notebooks, a process that was both time-consuming and inconsistent.

The new platform is composed of three parts: Remote Configuration, Metrics Catalog, and Experiment Planner.

  • Remote Configuration is used to change the experience a user receives by controlling the values of a set of "properties" of the client.

  • Metrics Catalog is used to manage, store, and serve metrics to the Experimentation Platform.

  • Experiment Planner is used to create, launch, and stop experiments, as well as analyse test results.
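A core mechanism behind remote configuration is deterministically assigning each user to a variant, so the same user always sees the same property values. Here is a hedged sketch of the common hash-based bucketing technique; the function and parameter names are assumptions for illustration, not Spotify's actual code.

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: list, salt: str = "v1") -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing (salt, experiment, user) means assignment needs no stored
    state, and changing the salt reshuffles users for a restarted
    experiment.
    """
    key = f"{salt}:{experiment}:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user and experiment always yield the same variant:
v1 = assign_variant("user-42", "new-home-feed", ["control", "treatment"])
v2 = assign_variant("user-42", "new-home-feed", ["control", "treatment"])
assert v1 == v2
```

With enough users, the modulo over a uniform hash splits traffic roughly evenly between variants, which is what lets the Experiment Planner analyse the two groups against each other.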

The new platform is a step change in ease of use and capabilities. It is more flexible, scalable, and easier to use than the old platform. Spotify is still evolving the platform, but it is already being used to run a wide variety of experiments.

Here are some of the key benefits of the new platform:

  • Flexibility: The new platform allows for more complex experiments than the old platform.

  • Scalability: The new platform can handle a much larger volume of traffic than the old platform.

  • Ease of use: The new platform is much easier to use than the old platform.

I personally felt this is a good example of how an experimentation platform is designed. There is a part 2 of the article too.

And that’s a wrap for today! I hope you enjoyed this episode. Until the next one, adios! 👋 
