


In the early days of our transition from a monolithic application to a more distributed system, built on technologies like Kafka, gRPC, and Spark, we often felt we were fumbling around in the dark. We did not even know what to call the problems we encountered, much less how to solve them. When solutions were proposed, they were sometimes rejected as non-standard or too complex. In distributed systems, especially ones where data flows throughout and is stored redundantly in multiple databases, the assumptions and instincts we take for granted from working with traditional systems break down.

In response, we have to shift our mindset when designing and debugging these systems.

We must learn to rely on new language, ask new questions, and assemble a new toolbox for architecting resilient and fault-tolerant applications. I found the clarity we needed while reading the book Designing Data-Intensive Applications by Martin Kleppmann. The book precisely describes the problems that kept our team up at night, gives them recognizable names, and discusses various solutions and their trade-offs. This series of articles presents a summary of the topics in the book that had the biggest impact on us, changing our approach to architecture and giving us confidence in the decisions we made.

There are clear advantages to decoupling the initial task of storing data from the task of processing it and storing it in a format optimized for specific queries. As Kleppmann writes:

"Storing data is normally quite straightforward if you don't have to worry about how it is going to be queried and accessed; many of the complexities of schema design, indexing, and storage engines are the result of wanting to support certain query and access patterns (see Chapter 3). For this reason, you gain a lot of flexibility by separating the form in which data is written from the form it is read, and by allowing several different read views. This idea is sometimes known as command query responsibility segregation (CQRS). The traditional approach to database and schema design is based on the fallacy that data must be written in the same form as it will be queried."
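
The CQRS idea of writing data in one form and reading it through several independent views can be illustrated with a minimal Python sketch. It assumes a toy banking domain: the write side is an append-only log of immutable events, and two separate read views are derived from that log. The names `Event`, `EventLog`, `balance_view`, and `large_withdrawals_view` are hypothetical and do not come from the book. This is a sketch of the idea under those assumptions, not a production design; in a real system the views would more likely be updated incrementally by a consumer reading from a log such as a Kafka topic, rather than recomputed from scratch on every query.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact recorded on the write side (hypothetical schema)."""
    account: str
    amount: int  # positive for deposits, negative for withdrawals


class EventLog:
    """Write model: an append-only log that knows nothing about queries."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Writing is simple: just record the fact in arrival order.
        self._events.append(event)

    def replay(self) -> List[Event]:
        # Return a copy so that read views cannot mutate the log.
        return list(self._events)


def balance_view(events: List[Event]) -> Dict[str, int]:
    """Read view 1: current balance per account, derived from the log."""
    balances: Dict[str, int] = defaultdict(int)
    for e in events:
        balances[e.account] += e.amount
    return dict(balances)


def large_withdrawals_view(events: List[Event], threshold: int) -> List[Event]:
    """Read view 2: withdrawals of at least `threshold`, e.g. for auditing."""
    return [e for e in events if e.amount <= -threshold]


# Usage: one write path, two differently shaped read views.
log = EventLog()
log.append(Event("alice", 100))
log.append(Event("bob", 50))
log.append(Event("alice", -70))

balances = balance_view(log.replay())
large = large_withdrawals_view(log.replay(), threshold=60)
print(balances)  # {'alice': 30, 'bob': 50}
print(large)     # [Event(account='alice', amount=-70)]
```

Because neither view influences how events are stored, adding a third view later only requires another function over the same log; the write path stays unchanged.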
