Announcing Our Real-Time Data Engine

Welcome to the biggest update in the history of Pragli. Hopefully, you barely even noticed it.

We've spent the past 6 months developing our own real-time sync engine, and it's finally live!

TL;DR

  • The new backend will improve scale, stability, performance, and capability
  • It will not solve all performance issues on day 1 - more work is being done!
  • Please let us know if you experience any new issues

Background

Previously, we used Firebase Realtime Database (RTDB) as our backend. RTDB worked reasonably well for a long time, but we eventually needed to move off of it because:

  1. Scale - RTDB only supports ~1k writes per second
  2. Stability - RTDB occasionally had blips that disconnected the entire system, and we couldn't be sure whether it was Firebase's fault or ours
  3. Stability Pt 2 - Even when system-wide events weren't happening, there were many cases where individuals would get knocked off of RTDB and we had no way of diagnosing why
  4. Performance - RTDB doesn't have a clean way of passing events to backend systems, other than through functions, which are super slow. As a result, our backend services would effectively clone the entire DB into memory and then subscribe to system-wide events. These services would take 20 minutes to come up, meaning that any issue would knock us out for at least that long. They were also expensive, as Firebase had to send all of that data to them. Finally, they were super clunky and unintuitive to work with.
  5. Capability - For the same reasons as performance, certain backend features were practically impossible to build on top of RTDB.

Options We Considered

Firebase Firestore

Firestore is Firebase's newer alternative to RTDB. It scales 10x higher, to 10k writes per second, and it's a bit cleaner to work with. However, it doesn't have a presence system built in, so we'd have had to engineer that ourselves. Furthermore, 10x only kicks the can so far down the road. Firestore would have been the easier path, but it still would have been an enormous migration, so the convenience wasn't enough to outweigh our other concerns.

Open-source sync engines

We evaluated many open-source projects, but none were mature enough for us to switch.

Build it in house

This was the hardest route, but none of the others were particularly easy. Many great future-of-work companies have had success going down this route (including Asana, Linear, Figma, and many others), and there were many online resources to guide us. As a result, we opted for this route.

How It Works

How our real-time data engine works between Client and Server

Our sync architecture is written entirely in TypeScript and has a few primary components:

  • In-memory client data store
  • Custom client API / Hook layers
  • GraphQL API
  • WebSocket "sync" server
  • Postgres database
  • Task runner on top of Redis

Data flow is initiated from React hooks, which tend to look like this:

const room = useSyncObject(Room, roomId)

This gives the front-end developer a real-time view of the room object.

Behind the scenes, this hook is checking with the data store to see if a live view of the object is already loaded; if so, it just reflects that view. If not, it hydrates the data from the GraphQL API and registers a new subscription on that object in the sync server.
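To make that lookup-or-load behavior concrete, here is a minimal sketch of a store that a hook like this could sit on top of. It is not our actual implementation: SyncStore, watch, and apply are hypothetical names, and the hydrate and subscribe callbacks simply stand in for the GraphQL query and the sync server subscription.

```typescript
// Hypothetical sketch: SyncStore and its callbacks are illustrative names,
// not the real client API.
type Listener<T> = (value: T) => void;

interface Entry<T> {
  value: T | undefined; // undefined until hydration or a sync event arrives
  listeners: Set<Listener<T>>;
}

class SyncStore<T> {
  private entries = new Map<string, Entry<T>>();
  private hydrate: (id: string) => Promise<T>; // stands in for the GraphQL query
  private subscribe: (id: string) => void; // stands in for the sync server subscription

  constructor(hydrate: (id: string) => Promise<T>, subscribe: (id: string) => void) {
    this.hydrate = hydrate;
    this.subscribe = subscribe;
  }

  // Attach a listener to the live view of an object, loading it only once.
  watch(id: string, listener: Listener<T>): () => void {
    const existing = this.entries.get(id);
    const entry: Entry<T> =
      existing ?? { value: undefined, listeners: new Set<Listener<T>>() };
    entry.listeners.add(listener);
    if (existing === undefined) {
      // First request: remember the entry, subscribe, then hydrate.
      this.entries.set(id, entry);
      this.subscribe(id);
      // Error handling and event ordering are omitted in this sketch.
      void this.hydrate(id).then((value) => this.apply(id, value));
    } else if (entry.value !== undefined) {
      // Already loaded: reflect the current view immediately.
      listener(entry.value);
    }
    // The returned function detaches the listener (e.g. on unmount).
    return () => {
      entry.listeners.delete(listener);
    };
  }

  // Called with hydrated data and with each incoming sync event.
  apply(id: string, value: T): void {
    const entry = this.entries.get(id);
    if (entry === undefined) return;
    entry.value = value;
    entry.listeners.forEach((listener) => listener(value));
  }
}
```

A React hook would call watch when a component mounts and call the returned function when it unmounts. A production version would also need to reconcile a sync event that arrives before hydration finishes, which this sketch ignores.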

Writes are done generically via a call that looks like this:

writer.save(new Room({ id: roomId, name: newName }))

The writer eagerly updates the state in the local store, and then fires off a GraphQL mutation. If the mutation fails, the local write is undone, and a message is shown to the user. If it succeeds, the database is updated and a new sync event is passed through the DB to the sync server. That sync event is then routed to all clients who are subscribed on that object. All sync data and GraphQL queries/mutations are permissioned.
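The eager-write-then-undo flow can be sketched roughly as follows. This is an illustration under simplifying assumptions, not our real writer: OptimisticWriter is a hypothetical name, the local store is a plain Map, and the mutate and notifyUser callbacks stand in for the GraphQL mutation and the user-facing error message.

```typescript
// Hypothetical sketch of an optimistic write with rollback.
interface Identified {
  id: string;
}

class OptimisticWriter<T extends Identified> {
  private local: Map<string, T>; // stands in for the in-memory client store
  private mutate: (obj: T) => Promise<void>; // stands in for the GraphQL mutation
  private notifyUser: (message: string) => void;

  constructor(
    local: Map<string, T>,
    mutate: (obj: T) => Promise<void>,
    notifyUser: (message: string) => void,
  ) {
    this.local = local;
    this.mutate = mutate;
    this.notifyUser = notifyUser;
  }

  // Resolves to true if the server accepted the write, false if it was undone.
  async save(obj: T): Promise<boolean> {
    const previous = this.local.get(obj.id);
    this.local.set(obj.id, obj); // eager local update, before the network call
    try {
      await this.mutate(obj);
      return true;
    } catch {
      // Undo the local write by restoring whatever was there before.
      if (previous === undefined) {
        this.local.delete(obj.id);
      } else {
        this.local.set(obj.id, previous);
      }
      this.notifyUser("Your change could not be saved.");
      return false;
    }
  }
}
```

Restoring a single previous snapshot is the simplest possible undo. It does not handle several overlapping writes to the same object, which a real writer would need to account for.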

We also support what we call "collection" subscriptions, e.g., something that looks like this:

const messages = useSyncCollection(Message, 'discussionId', Discussion, id)

Read this as "Give me all of the messages with a foreign key to the discussion whose ID is id." It's essentially a live view of all objects that have a particular foreign key value in the DB. It similarly hydrates via a GraphQL query and is then updated via a subscription that is sent to the sync server.
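On the server side, routing a sync event to both object and collection subscribers could look something like the sketch below. The SyncEvent shape, the SubscriptionRouter class, and the key format are all assumptions made for illustration, and permission checks are left out entirely.

```typescript
// Hypothetical sketch of sync-event routing; not the real sync server.
interface SyncEvent {
  type: string; // e.g. "Message"
  id: string;
  fields: Record<string, unknown>; // row data, including foreign keys
}

type Deliver = (event: SyncEvent) => void;

class SubscriptionRouter {
  private byKey = new Map<string, Set<Deliver>>();

  private add(key: string, deliver: Deliver): void {
    const set = this.byKey.get(key) ?? new Set<Deliver>();
    set.add(deliver);
    this.byKey.set(key, set);
  }

  // Subscribe to a single object, as an object hook would.
  subscribeObject(type: string, id: string, deliver: Deliver): void {
    this.add(`${type}#${id}`, deliver);
  }

  // Subscribe to every object of a type with a given foreign key value.
  subscribeCollection(type: string, foreignKey: string, value: string, deliver: Deliver): void {
    this.add(`${type}.${foreignKey}=${value}`, deliver);
  }

  // Deliver an event to each matching subscriber once; returns how many received it.
  route(event: SyncEvent): number {
    const targets = new Set<Deliver>();
    const collect = (key: string): void => {
      const set = this.byKey.get(key);
      if (set !== undefined) set.forEach((deliver) => targets.add(deliver));
    };
    collect(`${event.type}#${event.id}`);
    // Check every field, since any of them may be a subscribed foreign key.
    Object.keys(event.fields).forEach((field) => {
      collect(`${event.type}.${field}=${String(event.fields[field])}`);
    });
    targets.forEach((deliver) => deliver(event));
    return targets.size;
  }
}
```

With this shape, a new Message row whose discussionId is "d1" reaches anyone subscribed to that discussion's message collection as well as anyone watching that specific message.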

Benefits

Visibility becomes stability: We now have total visibility and control over our entire sync infrastructure, a huge benefit in itself. This will allow us to diagnose customer issues and reach higher levels of stability.

Performance: Certain operations that previously fired off "tasks" by subscribing to the entire Firebase RTDB are much faster now. Try joining a room with a colleague - you'll notice that the meeting loads significantly faster.

Scale: This architecture will scale much better. As is, we'll comfortably operate in the thousands of writes per second, and we expect to easily climb into the tens of thousands with a few tweaks. Longer term, we have architectural plans that will allow us to go much higher.

Let us know what you think

Let us know your thoughts, and please let us know if you find any bugs. We're available on Twitter at @PragliHQ, via our in-app support chat, or you can reach me directly on Twitter at @dougsafreno.
