If you’re thinking about implementing real-time analytics, you’ve probably realized that you’re going to need real-time updates. Real-time updates give you the power to insert, delete and update data in place. To do that, you’ll need something more: a mutable database.
In this post we’ll discuss the three main reasons why a mutable database is required for real-time updates.
1) Late Arriving Data in Time-Based Window Rollups ⌛️
Let’s say you have a rollup that’s counting events for each hour. All of a sudden, an event comes in the next day that was supposed to be processed yesterday. Now, you need to update all the aggregates and rollups that were done yesterday. A mutable database allows you to recompute the results with the late-arriving data.
In some systems, once the time window has passed, the rollups become read-only. You can’t update the rollups because they’re in a non-mutable store. Other systems create new tables for late arriving data. In those cases, your app will have to know which tables have the late-arriving data so you can do the aggregation at query time. In these circumstances, there’s more manual work involved and it’s time-consuming.
2) Data Enrichment 🔠
Analytics involves a lot of enrichment. If you have a data record that you want to tag as spam, it’ll be easy to just insert a field in your data record with that tag. A mutable database allows you to do this.
If your events are read-only, you have to write all the enriched-tags in a different place. Now, your app has to look at two different places to correlate the tags with the right events at query time. This can increase the complexity of the application code.
3) Staying in Sync With Changes From a Transactional Database 🤹
Let’s say you have many users who have updates in the transactional database. Your analytical database needs to be in sync with these changes. If your analytical database is mutable, you can easily update the changes in place.
If you’re not using a mutable analytical database, you’ll probably batch download the whole transactional database into your analytical database once a day. This has a higher operational overhead, and you also lose the real-timeliness.
Using a mutable database makes developers’ lives a lot easier by simplifying the workflow, allowing you to iterate faster. You can easily do rollups on late-arriving data, add fields to your document to enrich the dataset and be in real-time sync with your transactional database.
You might be thinking, well OLTP databases like MongoDB and PostgreSQL are mutable. They are! However, they may get hung up on analytical queries where you need to scale out to large data sizes.
If you’re considering a real-time analytical database to handle complex analytical queries and scale you have a couple of options. Druid and ClickHouse are immutable databases that work for append-only. If you need a mutable real-time analytical database, Rockset is a great choice to handle the situations described.
Rockset is the real-time analytics database in the cloud for modern data teams. Get faster analytics on fresher data, at lower costs, by exploiting indexing over brute-force scanning.