Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault tolerance, or accessibility. It is data replication if the same data is stored on multiple storage devices, or computation replication if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, or it can be replicated in time, i.e. executed repeatedly on a single device.
Access to a replicated entity is typically uniform with access to a single, non-replicated entity: the replication itself should be transparent to an external user. Likewise, in a failure scenario, failover between replicas should be hidden as much as possible.
It is common to talk about active and passive replication in systems that replicate data or services. Active replication is performed by processing the same request at every replica. In passive replication, each request is processed on a single replica and the resulting state is then transferred to the other replicas. If at any time one master replica is designated to process all the requests, this is the primary-backup scheme (master-slave scheme) predominant in high-availability clusters. If, on the other hand, any replica may process a request and then distribute a new state, this is a multi-primary scheme (called multi-master in the database field). In the multi-primary scheme, some form of distributed concurrency control must be used, such as a distributed lock manager.
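The difference between active and passive replication can be sketched in a few lines of Python. This is a minimal single-process model, not a networked implementation: the `Replica` class, its `(op, key, value)` request format and the helper functions are hypothetical names chosen for illustration. In the active case every replica executes the request; in the primary-backup case only the primary executes it and the backups install a copy of the resulting state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy key-value replica (hypothetical, for illustration only)."""
    name: str
    store: dict = field(default_factory=dict)

    def process(self, request):
        # A request is an assumed (op, key, value) tuple.
        op, key, value = request
        if op == "set":
            self.store[key] = value
        elif op == "incr":
            self.store[key] = self.store.get(key, 0) + value
        else:
            raise ValueError(f"unknown op {op!r}")

    def install_state(self, state):
        # Used by passive replication: replace local state with a copy of the primary's.
        self.store = dict(state)


def active_replicate(replicas, request):
    """Active replication: every replica executes the same request."""
    for replica in replicas:
        replica.process(request)


def passive_replicate(primary, backups, request):
    """Passive (primary-backup): only the primary executes; backups receive its state."""
    primary.process(request)
    for backup in backups:
        backup.install_state(primary.store)


if __name__ == "__main__":
    group = [Replica("a"), Replica("b"), Replica("c")]
    active_replicate(group, ("set", "x", 1))
    active_replicate(group, ("incr", "x", 4))
    print([r.store for r in group])  # [{'x': 5}, {'x': 5}, {'x': 5}]

    primary, *backups = [Replica("p"), Replica("b1"), Replica("b2")]
    passive_replicate(primary, backups, ("incr", "y", 2))
    print([r.store for r in backups])  # [{'y': 2}, {'y': 2}]
```

Active replication only works if `process` is deterministic, otherwise the replicas drift apart; passive replication tolerates non-deterministic processing at the cost of transferring state after every request.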
Load balancing is different from task replication, since it distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (esp. multi-master) internally, to distribute its data among machines.
Backup is different from replication, since it saves a copy of data unchanged for a long period of time. Replicas, on the other hand, are frequently updated and quickly lose any historical state.
Replication in distributed systems
Replication is one of the oldest and most important topics in the overall area of distributed systems.
Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault-tolerance. For example, a replicated service might be used to control a telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can take over its functions. But the underlying needs are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states and hence any replica can respond to queries.
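The claim that replicas stay consistent when they see the same events in the same order can be checked with a deterministic toy state machine. In this sketch the event format and the `apply_events` function are assumptions for illustration: two replicas fed the same sequence end in the same state, while a replica that receives the same events in a different order diverges, because `add` and `mul` do not commute.

```python
def apply_events(events):
    """Deterministic state machine: fold (op, key, value) events into a dict."""
    state = {}
    for op, key, value in events:
        if op == "set":
            state[key] = value
        elif op == "add":
            state[key] = state.get(key, 0) + value
        elif op == "mul":
            state[key] = state.get(key, 0) * value
        else:
            raise ValueError(f"unknown op {op!r}")
    return state


if __name__ == "__main__":
    events = [("set", "x", 2), ("add", "x", 3), ("mul", "x", 10)]
    print(apply_events(events))        # {'x': 50}  (same order on every replica)
    print(apply_events(list(events)))  # {'x': 50}
    reordered = [events[0], events[2], events[1]]
    print(apply_events(reordered))     # {'x': 23}  (diverged)
```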
Replication models in distributed systems
A number of widely cited models exist for data replication, each having its own properties and performance:
- Transactional replication. This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one-copy serializability model is employed in this case; it defines legal outcomes of a transaction on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee.
- State machine replication. This model assumes that the replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. It is based on a distributed computing problem called distributed consensus and has a great deal in common with the transactional replication model. It is sometimes mistakenly used as a synonym for active replication.
- Virtual synchrony. This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a new distributed entity called a process group. A process can join a group, which is much like opening a file: the process is added to the group and is also provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send events (multicasts) to the group and will see incoming events in the same order, even if events are sent concurrently. Membership changes are handled as a special kind of platform-generated event that delivers a new membership view to the processes in the group.
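A process group can be modelled, very loosely, as below. This is a hypothetical single-process sketch: `ProcessGroup`, `Member` and the event tuples are invented names, and a central sequencer stands in for the total-order multicast protocol that a real virtual synchrony system runs among distributed processes (so concurrency is not actually exercised here). It shows the three ingredients described above: a joiner receives a checkpoint, multicasts reach every member in the same order, and membership changes arrive as `view` events.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    state: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)  # events in delivery order
    view: tuple = ()

    def deliver(self, event):
        self.delivered.append(event)
        if event[0] == "view":
            self.view = event[1]          # new membership view
        elif event[0] == "update":
            _, key, value = event
            self.state[key] = value       # apply replicated update


class ProcessGroup:
    """Hypothetical model: one sequencer delivers every event to all current
    members in a single total order."""

    def __init__(self):
        self.members = []
        self.checkpoint = {}  # replicated state as of the latest event

    def _broadcast(self, event):
        for member in self.members:
            member.deliver(event)

    def join(self, name):
        # The joiner starts from a checkpoint of the current replicated state...
        member = Member(name, state=dict(self.checkpoint))
        self.members.append(member)
        # ...and every member, including the joiner, receives the new view.
        self._broadcast(("view", tuple(m.name for m in self.members)))
        return member

    def leave(self, name):
        self.members = [m for m in self.members if m.name != name]
        self._broadcast(("view", tuple(m.name for m in self.members)))

    def multicast(self, key, value):
        self.checkpoint[key] = value
        self._broadcast(("update", key, value))


if __name__ == "__main__":
    group = ProcessGroup()
    a = group.join("a")
    group.multicast("x", 1)
    b = group.join("b")            # b starts from the checkpoint {'x': 1}
    group.multicast("y", 2)
    print(a.state == b.state)      # True
    print(a.view, b.view)          # ('a', 'b') ('a', 'b')
```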
Levels of performance vary widely depending on the model selected. Transactional replication is slowest, at least when one-copy serializability guarantees are desired (better performance can be obtained when a database uses log-based replication, but at the cost of possible inconsistencies if a failure causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the handling of failures is less rigorous than in the transactional model. State machine replication lies somewhere in between; the model is faster than transactions, but much slower than virtual synchrony.
The virtual synchrony model is popular in part because it allows the developer to use either active or passive replication. In contrast, state machine replication and transactional replication are highly constraining and are often embedded into products at layers where end-users would not be able to access them.
Database replication
Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave sends a message stating that it has received the update successfully, which allows the master to send subsequent updates (and to resend an update until it has been successfully applied).
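Master/slave log shipping with acknowledgements can be sketched as follows. The classes, the `(key, value)` log entries and the simulated `loss_rate` are assumptions for illustration; a real DBMS ships its own log format over the network. Each slave applies entries strictly in log order and acknowledges how far it has got, so the master simply resends from the last acknowledged position.

```python
import random


class Slave:
    """Applies log entries strictly in order (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def receive(self, index, entry):
        # Apply only the next expected entry; a duplicate (index < applied) is ignored.
        if index == self.applied:
            key, value = entry
            self.data[key] = value
            self.applied += 1
        # The acknowledgement reports how far this slave has applied the log.
        return self.applied


class Master:
    """Logs updates and ships them to slaves, resending until acknowledged."""

    def __init__(self, slaves, loss_rate=0.0, seed=0):
        if not 0.0 <= loss_rate < 1.0:
            raise ValueError("loss_rate must be in [0, 1)")
        self.log = []   # ordered update log of (key, value) entries
        self.data = {}
        self.slaves = slaves
        self.acked = {s.name: 0 for s in slaves}  # last acknowledged position
        self.loss_rate = loss_rate  # simulated message-loss probability
        self.rng = random.Random(seed)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def ship(self):
        for slave in self.slaves:
            while self.acked[slave.name] < len(self.log):
                i = self.acked[slave.name]
                if self.rng.random() < self.loss_rate:
                    continue  # update lost in transit: resend it
                ack = slave.receive(i, self.log[i])
                if self.rng.random() < self.loss_rate:
                    continue  # acknowledgement lost: the slave will see a duplicate
                self.acked[slave.name] = ack


if __name__ == "__main__":
    slaves = [Slave("s1"), Slave("s2")]
    master = Master(slaves, loss_rate=0.3, seed=42)
    master.write("a", 1)
    master.write("b", 2)
    master.ship()
    print(all(s.data == master.data for s in slaves))  # True
```

Because the slave reports its applied position rather than acknowledging individual messages, resending is idempotent: a duplicate update is recognised by its log index and skipped.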
Multi-master replication, where updates can be submitted to any database node, and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or resolution. Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently on all nodes.
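The two conflict-handling strategies can be contrasted in a small sketch. The `resolve` function illustrates a lazy resolution rule of the kind mentioned above (latest timestamp wins, with ties broken by a fixed node hierarchy); it assumes roughly synchronized clocks and unique node ranks. `EagerRecord` is a deliberately simplified stand-in for eager conflict prevention: it models the coordinated pre-commit check as a single version counter that every node consults, which a real system would implement with distributed locking or an atomic commit protocol. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: object
    timestamp: float  # commit time at the origin node (assumes synchronized clocks)
    node_rank: int    # origin node's position in a fixed hierarchy; lower wins, assumed unique


def resolve(a: Version, b: Version) -> Version:
    """Lazy replication: choose the winning version deterministically.

    The later timestamp wins; equal timestamps fall back to the node hierarchy,
    so every node that compares the same two versions picks the same winner."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.node_rank < b.node_rank else b


class EagerRecord:
    """Eager replication, heavily simplified: the coordinated pre-commit check is
    modelled as one version counter shared by all nodes."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def commit(self, read_version, new_value):
        # A transaction that read an outdated version conflicts and is aborted.
        if read_version != self.version:
            raise RuntimeError("conflict detected before commit: transaction aborted")
        self.value = new_value
        self.version += 1


if __name__ == "__main__":
    x = Version("blue", timestamp=10.0, node_rank=2)
    y = Version("green", timestamp=10.0, node_rank=1)
    print(resolve(x, y).value, resolve(y, x).value)  # green green

    record = EagerRecord("blue")
    record.commit(0, "green")      # first transaction commits
    try:
        record.commit(0, "red")    # second transaction also read version 0
    except RuntimeError as err:
        print(err)
```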
Database replication becomes difficult as it scales up. Scale-up usually happens along two dimensions, horizontal and vertical: horizontal scale-up adds more data replicas, while vertical scale-up places data replicas farther apart geographically. Problems raised by horizontal scale-up can be alleviated by a multi-layer multi-view access protocol. Vertical scale-up is running into less trouble as internet reliability and performance improve.
Disk storage replication
Active (real-time) storage replication is usually implemented by distributing updates of a block device to several physical hard disks. This way, any file system supported by the operating system can be replicated without modification, as the file system code works on a level above the block device driver layer. It is implemented either in hardware (in a disk array controller) or in software (in a device driver).
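Block-level mirroring can be illustrated with an in-memory model. In this hypothetical sketch, `MirroredBlockDevice` exposes only `read_block` and `write_block`, the kind of interface a file system sees from a block device, while each write is copied to every healthy backing disk (modelled as a `bytearray`). Any surviving mirror can serve reads, which is why the file system above needs no changes. Resynchronizing a repaired disk is left out.

```python
class MirroredBlockDevice:
    """Toy mirror: each block write is copied to every healthy backing disk.

    Disks are in-memory bytearrays; sizes are illustrative defaults."""

    def __init__(self, num_disks=2, num_blocks=8, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.disks = [bytearray(num_blocks * block_size) for _ in range(num_disks)]
        self.failed = set()  # indices of disks treated as failed

    def _span(self, n):
        # Bounds check: slice assignment past the end would silently grow a bytearray.
        if not 0 <= n < self.num_blocks:
            raise IndexError(f"block {n} out of range")
        start = n * self.block_size
        return slice(start, start + self.block_size)

    def write_block(self, n, data: bytes):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        span = self._span(n)
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[span] = data

    def read_block(self, n) -> bytes:
        span = self._span(n)
        for i, disk in enumerate(self.disks):
            if i not in self.failed:  # any healthy mirror can serve the read
                return bytes(disk[span])
        raise OSError("all mirrors have failed")


if __name__ == "__main__":
    dev = MirroredBlockDevice()
    dev.write_block(3, b"A" * 512)
    dev.failed.add(0)                       # first disk fails
    print(dev.read_block(3) == b"A" * 512)  # True
```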
The most basic method is disk mirroring, typical for locally-connected disks.
Notably, the storage industry narrows the definitions: mirroring is a local (short-distance) operation, whereas replication extends across a computer network, so the disks can be located in physically distant locations. The purpose is to avoid damage from local failures or disasters and to improve availability when they occur. Typically the master-slave replication model described above is applied. The main distinguishing characteristic of such solutions is how they handle write operations:
- Synchronous replication - guarantees "zero data loss" by means of an atomic write operation, i.e. a write either completes on both sides or not at all. A write is not considered complete until it has been acknowledged by both the local and the remote storage. Most applications wait for a write transaction to complete before proceeding with further work, hence overall performance decreases considerably. Inherently, write latency grows in proportion to distance, because signals cannot travel faster than light. For a 10 km distance, the fastest possible round trip takes 67 μs, whereas a whole local cached write nowadays completes in about 10-20 μs.
- An often-overlooked aspect of synchronous replication is the fact that a failure of the remote replica, or even just of the interconnection, by definition stops any and all writes (freezing the local storage system). This is the behaviour that guarantees zero data loss. However, at such a potentially dangerous point many commercial systems do not freeze but simply proceed with local writes, losing the desired zero recovery point objective.
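Two points from the synchronous replication discussion can be made concrete. `min_round_trip_us` reproduces the speed-of-light bound quoted above (about 67 μs for 10 km, using the vacuum speed of light). `SyncReplicatedVolume` is a hypothetical model contrasting strict synchronous replication, which freezes writes when the remote copy is unreachable, with the fallback of continuing locally; the `degrade_on_failure` flag is an invented name, not an option of any particular product.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in vacuum; signals in optical fibre are slower


def min_round_trip_us(distance_km: float) -> float:
    """Speed-of-light lower bound on a round trip, in microseconds."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_PER_S * 1e6


class RemoteUnavailable(Exception):
    """Raised when a strictly synchronous write cannot reach the remote copy."""


class SyncReplicatedVolume:
    """Hypothetical model of synchronous replication between two sites.

    degrade_on_failure=False freezes writes when the remote is unreachable
    (zero data loss); True keeps writing locally and records what the remote missed."""

    def __init__(self, degrade_on_failure=False):
        self.local = {}
        self.remote = {}
        self.remote_up = True
        self.degrade_on_failure = degrade_on_failure
        self.unreplicated = []  # blocks written while the remote was unreachable

    def write(self, block, data):
        if self.remote_up:
            # Acknowledge only after both copies are updated. (In this in-memory
            # model both updates trivially succeed; a real system needs a protocol
            # to make the pair atomic.)
            self.remote[block] = data
            self.local[block] = data
            return "ack"
        if not self.degrade_on_failure:
            raise RemoteUnavailable("remote replica unreachable: writes frozen")
        self.local[block] = data
        self.unreplicated.append(block)  # would be lost if the local site failed now
        return "ack (local only)"


if __name__ == "__main__":
    print(round(min_round_trip_us(10)))  # 67

    strict = SyncReplicatedVolume()
    strict.remote_up = False
    try:
        strict.write(0, b"x")
    except RemoteUnavailable as err:
        print(err)

    lenient = SyncReplicatedVolume(degrade_on_failure=True)
    lenient.remote_up = False
    print(lenient.write(0, b"x"), lenient.unreplicated)  # ack (local only) [0]
```

With the figures quoted above, the 10 km propagation bound alone is roughly three to seven times the 10-20 μs of a local cached write, before any protocol or device overhead is added.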