Reliable Scalable Maintainable Applications

"The internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free?" - Alan Kay, Dr Dobb's Journal (2012)

In today's technological landscape, many applications prioritize handling large amounts of data over intensive computation. As a result, raw CPU power is rarely a limiting factor for these applications.

However, the main challenges faced by these data-intensive applications are typically related to:

  • Data Amount: Managing and processing large volumes of data efficiently.
  • Data Complexity: Dealing with the intricacies and diverse formats of the data.
  • Data Velocity: Handling the speed at which data changes.

Data-intensive applications are commonly built using standard building blocks that provide essential functionality. For example, these applications often require:

  • Data storage capabilities through databases to ensure data availability when needed.
  • Caches to store and retrieve frequently accessed data, enhancing read performance.
  • Search indexes for enabling efficient data searching and filtering based on keywords and various criteria.
  • Stream processing systems for sending messages between processes asynchronously.
  • Batch processing systems for periodically processing large volumes of accumulated data.
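To make the cache building block concrete, here is a minimal sketch of the common cache-aside read path, with plain dictionaries standing in for the real components (in production these might be, say, a relational database and Memcached; the names `database`, `cache`, and `get_user` are illustrative, not from any particular library):

```python
# Hypothetical in-memory stand-ins for a primary database and a cache.
database = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}
cache = {}

def get_user(key):
    """Cache-aside read: check the cache first, fall back to the database."""
    if key in cache:
        return cache[key]      # cache hit: fast path
    value = database[key]      # cache miss: slower authoritative read
    cache[key] = value         # populate the cache for next time
    return value

get_user("user:1")  # miss: reads the database and fills the cache
get_user("user:1")  # hit: served from the cache
```

The point is the access pattern, not the storage: the cache holds a redundant copy of data whose system of record is elsewhere, purely to make repeated reads cheaper.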

These data systems have become such successful abstractions that we routinely use them without thinking much about what goes on inside them.

When building an application, it is not necessary to reinvent the wheel by creating a new data storage engine from scratch. Databases serve as excellent tools for data storage and retrieval.

However, in the real world, things are not always so straightforward. There are many database systems with different characteristics, because different applications have different requirements, and the same is true of caching, search indexing, and the other building blocks.

When building an application, it is crucial to select the right tools and approaches that suit the specific requirements at hand. In some cases, it may be challenging to achieve the desired functionality with a single tool alone.

Thinking About Data Systems

We commonly categorize tools like databases, message queues, and caches as different entities. Although databases and message queues both store data, they have different access patterns, resulting in distinct performance characteristics and implementations.

However, recent years have witnessed the emergence of new tools for data storage that blur the boundaries between traditional categories. For example:

  • There are data stores, like Redis, that are also utilized as message queues.
  • Message queues like Kafka provide database-like durability guarantees.

These tools are optimized for a variety of different use cases and no longer fit neatly into the traditional categories.
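The blurring can be illustrated with a toy class whose commands cover both key-value storage and queueing. The method names deliberately mirror Redis commands (SET/GET, LPUSH/RPOP), but this is a self-contained in-memory simulation for illustration, not a Redis client:

```python
from collections import defaultdict, deque

class MiniStore:
    """Toy store offering both key-value and queue operations,
    loosely modeled on Redis's command set."""
    def __init__(self):
        self.kv = {}                      # plain key-value data
        self.lists = defaultdict(deque)   # named lists usable as queues

    def set(self, key, value):            # key-value write (like Redis SET)
        self.kv[key] = value

    def get(self, key):                   # key-value read (like Redis GET)
        return self.kv.get(key)

    def lpush(self, key, value):          # push onto the head of a list
        self.lists[key].appendleft(value)

    def rpop(self, key):                  # pop from the tail: FIFO with lpush
        return self.lists[key].pop() if self.lists[key] else None

store = MiniStore()
store.set("session:42", "alice")     # used as a data store / cache
store.lpush("jobs", "send-email")    # used as a message queue
store.lpush("jobs", "resize-image")
store.rpop("jobs")                   # → "send-email" (first job enqueued)
```

One process can `lpush` work items while another `rpop`s them, so the same store that caches sessions also brokers messages between processes, which is exactly why the database/queue boundary has become fuzzy.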

Many modern applications have extensive and diverse requirements, making it challenging for a single tool to fulfill all data processing and storage needs. Consequently, work is broken down into tasks that can be efficiently performed by individual tools, and these different tools are stitched together using application code.

For instance:

  • An application may incorporate an application-managed caching layer using tools like Memcached or similar systems.
  • It may also involve a separate full-text search server, such as Elasticsearch or Solr, alongside the main database.
  • The responsibility of keeping caches and indexes in sync with the primary database often falls on the application code.
  • When multiple tools are combined to provide a service, the service's interface or API typically hides the implementation details from clients.
  • This integration of various tools creates a new special-purpose data system composed of smaller, general-purpose components.
  • The composite data system may offer certain guarantees, such as correct cache invalidation or update propagation on writes, to ensure consistent results for external clients.
  • Building such a system requires not only application development skills but also expertise in designing data systems.
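The stitching described above can be sketched in a few lines. Here the primary store, the cache, and a naive full-text index are all in-memory dicts (in production they might be a relational database, Memcached, and Elasticsearch); the single `write_document` entry point is what keeps the derived systems from drifting out of sync with the system of record. All names here are illustrative assumptions:

```python
database = {}       # primary system of record
cache = {}          # derived data: recently read values
search_index = {}   # derived data: word -> set of document ids

def write_document(doc_id, text):
    """Single write path: update the database, invalidate the cache,
    and update the search index, so derived systems stay consistent.
    (A real system would also remove old index entries on overwrite.)"""
    database[doc_id] = text
    cache.pop(doc_id, None)             # drop any stale cached copy
    for word in text.lower().split():   # naive tokenizer for illustration
        search_index.setdefault(word, set()).add(doc_id)

def read_document(doc_id):
    if doc_id not in cache:
        cache[doc_id] = database[doc_id]
    return cache[doc_id]

def search(word):
    return search_index.get(word.lower(), set())

write_document("d1", "reliable scalable systems")
write_document("d2", "scalable maintainable applications")
search("scalable")  # → {"d1", "d2"}
```

Clients calling `read_document` and `search` never see the three components individually; the functions are the service's API, and the guarantee that a write is reflected in both the cache and the index is enforced entirely by this application code.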

When designing a data system or service, several challenging questions arise, including:

  • How can data remain correct and complete even when internal failures occur?
  • How can consistent performance be provided to clients, even when certain parts of the system experience degradation?
  • How can the system scale to handle increasing loads?
  • What does an effective API for the service look like?

The design of a data system is influenced by various factors, such as the skills and experience of the people involved, dependencies on legacy systems, the time scale for delivery, an organization's tolerance for different kinds of risks, regulatory constraints, and more.

In most software systems, three primary concerns hold significant importance:

  1. Reliability: The system should continue to function correctly, meeting desired performance levels, even in the face of adversity caused by hardware failures, software errors, or human mistakes.
  2. Scalability: As the system grows in terms of data volume, traffic, or complexity, there should be reasonable ways to handle that growth, ensuring the system's performance and availability are not compromised.
  3. Maintainability: Over time, different individuals will work on the system, and they should be able to work on it productively, understanding and enhancing its functionality without introducing unnecessary complications.

These terms—reliability, scalability, and maintainability—are commonly used, but it is crucial to have a clear understanding of what they truly mean and how to address them in the design and implementation of data-intensive applications.

Credits: