A database is critical to the performance of an application. Imagine you have developed an e-commerce shop like Amazon. The store initially has a single database to create, read, and modify data. Everything is perfect. After a couple of years, the business grows and you have millions of customers. What happens when, perish the thought, the database goes down and becomes unavailable? It is a nightmare.
A company like Amazon might lose millions of dollars in revenue when the database is unavailable for only a couple of minutes. Having a single database to serve your customers introduces a single point of failure.
This can be mitigated by having secondary databases that take over whenever the primary database is down. Additionally, all the databases can work together to serve requests.
Database replication is the process of copying data from a database on one server to one or more replica databases on other servers, so that the databases stay in sync.
How does database replication work?
Database replication is supported in many database systems, usually through a master/slave relationship. The master (primary) database syncs data to the slave (secondary) databases. In a popular configuration, the master processes data-modifying operations such as inserts, updates, and deletes.
The master then syncs the data to the slaves, which serve read operations. Since most applications perform far more reads than writes, there are usually more slaves than masters, so that read capacity can scale to handle the majority of the requests.
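The read/write split described above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular database implements replication: the Node and ReplicatedStore classes are hypothetical names, the copy to the slaves is done synchronously for simplicity, and reads are rotated round-robin.

```python
import itertools


class Node:
    """An in-memory stand-in for one database server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReplicatedStore:
    """Sends writes to the master and spreads reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next_slave = itertools.cycle(slaves)

    def write(self, key, value):
        # Data-modifying operations go to the master only.
        self.master.rows[key] = value
        # The master then syncs the change to every slave.
        # (Assumption: a synchronous copy, to keep the sketch short.)
        for slave in self.slaves:
            slave.rows[key] = value

    def read(self, key):
        # Reads rotate across the slaves in round-robin order.
        slave = next(self._next_slave)
        return slave.name, slave.rows.get(key)


store = ReplicatedStore(Node("master"), [Node("slave-1"), Node("slave-2")])
store.write("order:1", "paid")
first = store.read("order:1")
second = store.read("order:1")
third = store.read("order:1")
```

Here the three reads are served by slave-1, slave-2, and slave-1 again, while the master only ever handles the write. Real systems usually replicate asynchronously, which is why a slave can lag behind the master.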
In this setup, if the master database goes offline, a slave database is promoted to be the new master; it then processes data-modifying requests and syncs data to the remaining slaves. However, in production, promoting a new master can be complicated because the data in the slave might not be up to date.
The missing data needs to be restored by running recovery scripts. If only a single slave database is available and it goes down, read operations are directed to the master database. As soon as the issue is found, a new slave database replaces the old one and the data is synced to it.
If we have multiple slave databases in our architecture and one of them goes down, read operations are directed to the healthy ones.
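The failover behaviour above can be sketched as follows. This is a simplified illustration under stated assumptions: the Node class and its applied counter (how many of the master's changes a node has applied) are hypothetical, and promoting the most up-to-date healthy slave is one reasonable policy rather than the only one.

```python
class Node:
    """A database server with a health flag (hypothetical)."""

    def __init__(self, name, applied=0, healthy=True):
        self.name = name
        self.applied = applied  # assumed counter of master changes applied so far
        self.healthy = healthy


def promote_new_master(slaves):
    """Pick the healthy slave that has applied the most changes."""
    candidates = [s for s in slaves if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy slave available for promotion")
    return max(candidates, key=lambda s: s.applied)


def read_targets(master, slaves):
    """Reads go to healthy slaves; fall back to the master if none are left."""
    healthy = [s for s in slaves if s.healthy]
    return healthy if healthy else [master]


master = Node("db-1", applied=100)
slaves = [
    Node("db-2", applied=97),
    Node("db-3", applied=100),
    Node("db-4", applied=99, healthy=False),
]

# Normal operation: only the healthy slaves serve reads.
reads_before = [n.name for n in read_targets(master, slaves)]

# The master fails: promote the most up-to-date healthy slave.
master.healthy = False
new_master = promote_new_master(slaves)

# Changes that each remaining healthy slave still has to catch up on.
missing_changes = {
    s.name: new_master.applied - s.applied
    for s in slaves
    if s is not new_master and s.healthy
}
```

In this run db-3 is promoted because it has applied every change, and db-2 still needs to catch up on three changes. Had the promoted slave itself been behind the old master, the gap would have to be recovered separately, which is the complication mentioned above.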
Types of database replication
We just described master-slave replication, where one database is designated as the master while the others are designated as slaves.
The master receives all the write operations, whereas the slaves handle the read operations. Other types of database replication are:
Multi-Master replication: In this setup we have more than one master database and one or more slaves. The masters receive write operations, then the changes get synced to the other databases. A load balancer distributes the traffic to the masters.
Master-Master (Peer-to-peer) replication: All the databases act as both a master and a slave; the data is replicated across all nodes in a peer-to-peer fashion.
One-Way Replication: In this type of replication, data is replicated from a master database to one or more slave databases in one direction only. This is useful for backup and reporting purposes. It is also known as data mirroring, where complete backups are maintained in case the primary database fails. Mirrors act as hot standby databases.
Advantages of database replication
High Availability: Your system will still be available if your primary database fails.
Improved Scalability: The traffic is distributed across the replicas. The system can therefore handle a surge in traffic effectively.
Lower latency: When data is replicated to multiple databases, it can be served from a database that is geographically closer to the user, which reduces the time it takes to access the data.
Disaster recovery: Database replication can be used to create a disaster recovery site in a separate location. In the event of a disaster, the replicated data can be used to quickly restore operations.
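To make the latency point concrete, here is a minimal sketch of choosing the closest replica. The region names and round-trip times are made-up example values, and pick_closest_replica is a hypothetical helper; a real deployment would typically rely on measured latencies or geographic DNS routing.

```python
def pick_closest_replica(latencies_ms):
    """Return the name of the replica with the lowest round-trip time."""
    if not latencies_ms:
        raise ValueError("no replicas to choose from")
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical round-trip times, in milliseconds, seen by a user in Europe.
latencies_ms = {"us-east": 95.0, "eu-west": 18.0, "ap-south": 140.0}
closest = pick_closest_replica(latencies_ms)
```

With these example numbers the user's reads would be sent to the eu-west replica.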
Disadvantages of database replication
Complexity: Database replication can be complex to set up and maintain, especially for multi-master replication scenarios. Configuration, management, and monitoring can require additional resources and expertise.
Consistency: With replication, there is a risk of inconsistent data if updates are made to different copies of the database simultaneously. This can lead to conflicts that need to be resolved.
Performance impact: Replication can have a performance impact on the database, especially during heavy write operations. This is because each update needs to be propagated to all the replicas, which can cause delays and increase system load.
Cost: Replication can require additional hardware and software licenses, which can increase costs. Additionally, maintaining multiple copies of the data can require additional storage and bandwidth.
Security: Replication can increase the risk of data breaches or other security issues. Each copy of the data is a potential target for attacks, so additional security measures may be required to protect the replicated databases.
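The consistency problem is easiest to see with two copies that accept different updates for the same record. The sketch below resolves the conflict with a last-write-wins rule. This is only one possible strategy and is an assumption of this example, not something the article prescribes; the Version tuple and resolve function are hypothetical names. Note that last-write-wins silently discards the losing update.

```python
from collections import namedtuple

# value: the data written; timestamp: time of the write in seconds;
# node: the database that accepted the write.
Version = namedtuple("Version", ["value", "timestamp", "node"])


def resolve(a, b):
    """Last-write-wins: keep the newer version; break ties by node name."""
    return max(a, b, key=lambda v: (v.timestamp, v.node))


# Two masters accepted different updates for the same order.
on_a = Version("shipped", timestamp=10.0, node="db-a")
on_b = Version("cancelled", timestamp=10.5, node="db-b")
winner = resolve(on_a, on_b)
```

Both databases end up with the value written at the later timestamp, so they converge, but the "shipped" update is lost. This also depends on the servers' clocks being reasonably synchronized.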
Final Thoughts
The purpose of database replication is to provide redundancy and fault tolerance in case of hardware or software failure. It also reduces the load on the primary database server by distributing read requests across multiple replicas, which improves your application's performance, scalability, and availability.
Additionally, replication can be used for data mirroring, where complete backups are maintained in case the primary database fails.