The Road to SDE II


After a year of being a full-time software engineer at OfferUp, I’m excited to announce I’ve been promoted to SDE II.

Here are some of the things I worked on over the past year:

Migrating List Service Off Monolith

List service is a microservice we use for customers’ saved lists in the app.

Like an item you see? You can click the ❤️ icon and save it to a list.

From an implementation standpoint, List service is a CRUD app.

Over 2020 and into Q1 of 2021, one of OfferUp’s major projects was to migrate off its monolith and move completely to a microservice architecture.

[Image: OfferUp’s blue bear punching a big ball of mud that is the monolith.]

I was responsible for migrating the List service off the monolith.

I needed to change the service from calling the monolith endpoints to calling the newer microservices, with zero customer downtime.

Unfortunately, because of the way List service was written, the shape of the monolith response was tightly coupled to code throughout the codebase.

A large refactor seemed too risky. Here’s what I did instead to minimize my footprint.

I applied the facade pattern.

I call each microservice. Once the microservices respond, I transform the shape of their response into the shape that was originally returned by the monolith service.

The only place I needed to change was where the APIs were called.

Once the microservice responses were transformed into the shape of the original monolith API response, I could send the object down happily through the system.
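Here’s a minimal sketch of the idea in Python. Everything named here is hypothetical (the `LegacyItem` shape, the field names, and the two fetch functions standing in for microservice clients); it is not OfferUp’s actual code, just an illustration of keeping the old shape at the boundary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LegacyItem:
    """Hypothetical shape the rest of the codebase already expects."""
    item_id: str
    title: str
    price_cents: int
    seller_name: str


class ItemFacade:
    """Calls the newer microservices and returns the legacy shape."""

    def __init__(
        self,
        fetch_listing: Callable[[str], dict],
        fetch_user: Callable[[str], dict],
    ) -> None:
        # Both callables are stand-ins for real microservice clients.
        self._fetch_listing = fetch_listing
        self._fetch_user = fetch_user

    def get_item(self, item_id: str) -> LegacyItem:
        listing = self._fetch_listing(item_id)
        seller = self._fetch_user(listing["sellerId"])
        # The only translation logic lives here; callers never see
        # the microservice response shapes.
        return LegacyItem(
            item_id=listing["id"],
            title=listing["title"],
            price_cents=listing["priceCents"],
            seller_name=seller["displayName"],
        )


# Usage with fake services standing in for the network calls.
def fake_listing_service(item_id: str) -> dict:
    return {"id": item_id, "title": "Bike", "priceCents": 12500, "sellerId": "u1"}


def fake_user_service(user_id: str) -> dict:
    return {"id": user_id, "displayName": "Sam"}


facade = ItemFacade(fake_listing_service, fake_user_service)
print(facade.get_item("abc"))
```

Because the facade hands back the same shape as before, everything downstream of it stays untouched.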

I also set up the change so that it could be rolled out slowly to customers as an A/B test.
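One common way to do this kind of gradual rollout is deterministic bucketing by user ID. The sketch below assumes that approach; the function names and the experiment key are made up for illustration, and OfferUp’s real experimentation tooling may work differently.

```python
import hashlib
from typing import Callable


def in_rollout(user_id: str, experiment: str, percent: int) -> bool:
    """Place a user in one of 100 stable buckets; the first `percent` get the change."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def fetch_saved_items(
    user_id: str,
    legacy_call: Callable[[str], object],
    new_call: Callable[[str], object],
    percent: int,
) -> object:
    """Route a request to the new code path only for users in the rollout."""
    if in_rollout(user_id, "list-service-migration", percent):
        return new_call(user_id)
    return legacy_call(user_id)
```

Hashing the experiment name together with the user ID keeps each user on the same side of the test across requests, and raising `percent` only ever adds users to the new path.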

I ramped up my backwards-compatible change and reduced average latency from 205 ms to 135 ms, a 34% decrease.

ElasticSearch Query Optimization

In Q1, I implemented an optimization to our ElasticSearch query for home-feed browse.

The optimization is simple: only look at listings that have been posted within the last 120 days instead of doing an exhaustive search over the entire catalog.
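In Elasticsearch’s query DSL, a cutoff like this is typically expressed as a `range` clause in filter context using date math. The sketch below builds such a request body as a Python dict. The field name `post_date` and the helper name are assumptions; the real index mapping and query are not shown in this post.

```python
import json


def build_browse_query(max_age_days: int = 120, size: int = 50) -> dict:
    """Build a search body that only considers recently posted listings."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: no scoring, and the result can be cached.
                "filter": [
                    # "now-120d/d" means 120 days ago, rounded down to the day.
                    # `post_date` is a hypothetical field name.
                    {"range": {"post_date": {"gte": f"now-{max_age_days}d/d"}}}
                ]
            }
        },
    }


print(json.dumps(build_browse_query(), indent=2))
```

Rounding the date math to the day (`/d`) keeps the filter value stable for a whole day, which helps Elasticsearch reuse cached filter results.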

From a business perspective, we justified this decision based on the sheer volume of inventory that gets posted every day.

We traded off inventory size to improve latency.

I made the change and brought search latency from 1.2 to 0.8 seconds, a 33% decrease.

List Query Optimization

On the app, we have this cute ❤️ icon on the item detail screen (mentioned earlier in this post).

You click the icon, and you can save an item to one of your lists.

If the item is saved, the heart is filled in: ❤️

Otherwise, it appears as an outline of a heart: 🤍

To decide whether to fill in the heart, we only need a single boolean, but the way we fetched it gave us something of an N+1 problem.

We were over-fetching data on the backend, causing our Listing service (not to be confused with the List service) to be called unnecessarily.

This put a lot of unnecessary strain on the Listing service as the item detail page is one of the most high-traffic screens in the app.

I created an optimized endpoint that can skip calling Listing service.

The new endpoint didn’t have any service dependencies. It works in isolation.
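To make the difference concrete, here’s a rough sketch of the two paths. The data layout, the function names, and the way the old path hydrates items are all assumptions for illustration, not the actual List service code.

```python
from typing import Callable, Dict, Set

# Hypothetical storage layout: user_id -> list_id -> saved item IDs.
SavedLists = Dict[str, Dict[str, Set[str]]]


def is_saved_slow(
    store: SavedLists,
    user_id: str,
    item_id: str,
    fetch_listing: Callable[[str], dict],
) -> bool:
    """Old-style path: hydrates saved items through Listing service."""
    for item_ids in store.get(user_id, {}).values():
        for saved_id in sorted(item_ids):
            listing = fetch_listing(saved_id)  # one downstream call per item
            if listing["id"] == item_id:
                return True
    return False


def is_saved_fast(store: SavedLists, user_id: str, item_id: str) -> bool:
    """New-style path: answers from List service's own data, no downstream calls."""
    return any(item_id in item_ids for item_ids in store.get(user_id, {}).values())
```

The fast version only needs the item IDs List service already owns, so the answer to “is this item saved?” never touches Listing service.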

I set up an A/B experiment to slowly ramp up the change in production.

Once the change was fully deployed, we saw the peak number of req/s on the Listing service decrease by roughly 40%, from 2.3k to 1.4k req/s.

Fixing Sneaky Authentication Middleware

We had an issue where a small set of users were consistently seeing errors whenever accessing their saved lists.

I dug into the issue and discovered there was a middleware sitting between GraphQL and the List service.

Upon further investigation, I learned this middleware was designed to parse the JWT and set some headers before forwarding the request to List service.

What happened was this:

  1. GraphQL would call List service via a middleware proxy.
  2. The middleware proxy would fail to parse the JWT or set the headers.
  3. List service would respond to the proxy with 401 (unauthorized).
  4. The middleware would respond to GraphQL with 503 (service unavailable).

Then of course the 503 would show up on the client’s device as an error.
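The steps above can be sketched as a tiny proxy function. This is an illustration only: the header name `X-User-Id`, the `sub` claim, and the function names are assumptions, and the JWT is decoded without any signature verification, which a real auth middleware must not skip. The point is that an auth failure stays a 401 all the way back to the caller instead of turning into a 503.

```python
import base64
import json
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, str]  # (status code, body)


def parse_jwt_claims(token: str) -> Optional[dict]:
    """Decode a JWT's payload segment. No signature check (sketch only)."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    return claims if isinstance(claims, dict) else None


def proxy(
    headers: Dict[str, str],
    upstream: Callable[[Dict[str, str]], Response],
) -> Response:
    """Parse the JWT, set a user header, and forward the request."""
    auth = headers.get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    claims = parse_jwt_claims(token)
    if claims is None or "sub" not in claims:
        # Fail loudly as an auth error instead of forwarding a bad request.
        return 401, "invalid or missing token"
    forwarded = dict(headers)
    forwarded["X-User-Id"] = str(claims["sub"])  # hypothetical header name
    # Pass the upstream status through unchanged (a 401 stays a 401).
    return upstream(forwarded)
```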

I worked with DevOps to stand up a new auth middleware that would replace the legacy one.

We deployed the new middleware, then I made the changes to point GraphQL to it. Then we deployed GraphQL.

I checked with a customer who had been experiencing the issue, and they confirmed the error had gone away. 🎉

Moreover, we saw the change decreased the average number of ELB 4xx responses from List service by 66%.

Migrating Provider from ElasticSearch to Algolia

Starting in Q2, the team looked to replace ElasticSearch with Algolia in order to cut costs, improve relevance, and improve developer experience (DX).

I was tasked with creating a proof of concept; we wanted to ramp up 20% of our traffic for simple keyword searches to test the idea.
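A proof of concept like this boils down to a routing decision per search request. Here’s a hedged sketch of what that decision could look like. The `SearchRequest` shape, the definition of a “simple keyword search” (keywords only, no filters or sort), and the bucketing scheme are my assumptions for illustration, not the actual experiment code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SearchRequest:
    """Hypothetical request shape."""
    user_id: str
    keywords: str
    filters: Dict[str, str] = field(default_factory=dict)
    sort: Optional[str] = None


def is_simple_keyword_search(req: SearchRequest) -> bool:
    """Assumed definition: keywords present, no filters, no explicit sort."""
    return bool(req.keywords.strip()) and not req.filters and req.sort is None


def choose_provider(req: SearchRequest, ramp_percent: int = 20) -> str:
    """Send a stable slice of simple keyword searches to the new provider."""
    if not is_simple_keyword_search(req):
        return "elasticsearch"  # anything not yet supported stays on the old path
    digest = hashlib.sha256(req.user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "algolia" if bucket < ramp_percent else "elasticsearch"
```

Keeping unsupported request types on the existing provider means the experiment can start small and widen as more use cases are implemented.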

I made the code changes, set up the experiment, and deployed it.

As a result, the team decided to move forward with Algolia to succeed ElasticSearch.

I continued implementing all the remaining use-cases in search: filters, sorts, home-feed browse, category browse, and a handful of other domain-specific features.

We plan to have Algolia ramped up to 100%, fully replacing ElasticSearch in Q3.

OfferUp is Hiring

I believe OfferUp is a great place to learn.

Working at OfferUp has given me a lot of ownership of the services I help build, early in my career.

If you want an opportunity to get your feet wet in a lot of different technologies and work in a fast-paced environment, then OfferUp could be a great place for you.

We’re hiring engineers to work in office or remote (in select US states).

If you’re knowledgeable about one or more of:

  • Java backend microservices
  • GraphQL with Apollo and Node.js
  • React.js and React Native

then I encourage you to check out our engineering career page or reach out to me directly. ✌️