When you start discussing distributed architecture, there is high possibility that you will encounter this CAP theory (or Brewer’s theorem). CAP stands for Consistency, Availability and Partition tolerance. It wants system designers to make a choice between above three competing guarantees in final design. It’s said that achieving all 3 in system is not possible, and you MUST choose at most two out of three guarantees in your system.
OK. Definition seems simple and quick. But wait !! What do you mean by Consistency, Availability and Partition tolerance?? Let’s define these terms in distributed computing environment.
A service that is consistent operates fully or not at all. Consistency is ‘C’ in ACID properties in non-distributed systems, as applied to the ideal properties of database transactions and means that data will never be persisted (rolled back) that breaks certain pre-set constraints.
Consistency, in distributed environment, means all client programs who are reading the data from the cluster see the same data at any given point of time. Two clients fetching data from two nodes should not see different data at all.
Availability means – you should be able to retrieve the data you stored in distributed system, no matter what happens inside cluster. If you make a request, then you must get a response from system; even if a node (or many nodes) in cluster goes down.
Partition Tolerance means that the cluster (as whole) continues to function even if there is a “partition” (communications break) between two nodes (both nodes are up, but can’t communicate).
No set of failures less than total network failure is allowed to cause the system to respond incorrectly.
So the CAP theorem makes you to choose between any two of 3 guarantees, you wish to add into your system. You cannot guarantee all 3 in single distributed system e.g. in order to get both availability and partition tolerance, you have to give up consistency.
CAP Theorem Example
Whatever you read above is mostly definition part, and somehow complex for beginners; so let’s understand in more simple English.
You are asked to design a distributed cluster of 4 data nodes. Replication factor is 2 i.e. any data written in cluster must be written on 2 nodes; so when one goes down – second can serve the data. Now try to apply CAP theorem on this requirement. In distributed system, two things may happen anytime i.e. node failure (hard disk crash) or network failure (connection between two nodes go down) [Fallacy of distributed computing].
We will try to design our system based on these facts/assumptions.
CP [Consistency/Partition Tolerance] Systems
In distributed system, at the time of reading the data, consistency is determined by a voting kind of mechanism, where all nodes who have copy of data mutually agree that they have “same copy” of requested data. Now let’s assume that our requested data is present in two nodes N1 and N2. Client tries to read the data; and our CP system is partition tolerant as well, so an expected network failure occurred and N2 is detected as DOWN. Now system cannot determine that N1’s data copy is latest or not; it may be stale as well. So system decides to send an ERROR event to client. Here system chose to prefer data consistency over data availability.
Similarly, at time of writing the data if replication factor is 2, then system may reject write requests until it finds two healthy nodes to write data fully in consistent manner.
AP [Availability/Partition Tolerance] Systems
What if in above scenario, system instead of sending ERROR (in case N2 is down); it sends the data recieved from N1. Well client got the data, but was it latest data copy stored in system in past?? You cannot decide. You chose availability over consistency. These are AP systems.
CA [Consistency/Availability] Systems
In a distributed environment, we cannot avoid “P” of CAP. So we have to choose between CP or AP systems. If we desire to have a consistent and available system, then we must forget about partition tolerance and it’s possible only in non-distributed systems such as oracle and SQL server.
CAP Theorem is like the old joke about software projects: you can have it on TIME, in BUDGET, or CORRECT. Pick any two 🙂