Disaster Recovery Solutions
The disaster recovery solution uses remote standby servers to take over an application when a server or application fails. Data is replicated in real time to the remote site by data replication software or storage devices. LifeKeeper runs on every server in the cluster, monitoring the health of the systems and the active applications. When a failure is detected, recovery procedures start automatically. There are currently two options available for disaster recovery: a 2-node solution and a 3-node solution.
A basic 2-node solution provides offsite disaster recovery and real-time backups; the more sophisticated 3-node solution adds high availability and local failover for those not-quite-disastrous occasions. The 3-node solution is preferable where budget permits, because it allows easier recovery from non-disaster failures and high availability against WAN failure, and is therefore a cleaner overall solution.
2-node disaster recovery
Data replication is used to maintain separate, identical local copies of the application data on the two servers. With the application active on the primary server, all updates to the application data are automatically replicated to the standby server. When a failure occurs, the application is automatically started on the standby server, where it continues its operations using the mirrored copy of the data.
If the primary server is returned to service, the direction of the data replication can be reversed. After an initial resynchronization that brings the primary server up to date with any data changes made while it was unavailable, it can be returned to front-line service. There is no need to copy the entire disk when a server is returned to service: only changed data is replicated, and LifeKeeper Data Replication does not need to replicate white space. This allows multi-gigabyte stores to operate over relatively low-speed connections.
Failover does not affect clients
When the application migrates to the standby server, LifeKeeper also migrates IP addresses and hostnames, so clients normally do not even notice the failover and at worst simply need to reconnect to the server to re-establish their session.
Advantages of the 2-node approach
3-node Disaster Recovery Solution - combining local recovery and disaster recovery
Because data can be replicated to more than one server at a time, it can be replicated to both a local server and a remote server. When the active application or server fails, the local standby server runs the application with minimum disruption to users and clients. The remote standby is available for use when a site-wide disaster occurs: a failure of the whole site causes the server at the remote site to run the application.
The application is active on the primary server and is also configured on a local standby server. Application data is also replicated to the remote system. The result is a 3-node cluster consisting of two local systems and a third, remote system that receives data updates via data replication over a wide area network (WAN).
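The failover order described above can be sketched as a simple priority list. The Python below is a minimal illustration only, not LifeKeeper's API or configuration format: the `Node` class, the `pick_active` and `mark_failed` helpers, the node names, and the numeric priorities are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical model of a cluster member; not a LifeKeeper type.
    name: str
    site: str          # "local" or "remote"
    priority: int      # lower value is tried first
    healthy: bool = True


def pick_active(nodes: List[Node]) -> Optional[Node]:
    """Return the healthy node with the lowest priority value, or None."""
    healthy = [n for n in nodes if n.healthy]
    return min(healthy, key=lambda n: n.priority) if healthy else None


def mark_failed(nodes: List[Node], *names: str) -> List[Node]:
    """Return a copy of the cluster with the named nodes marked unhealthy."""
    return [replace(n, healthy=False) if n.name in names else n for n in nodes]


# Assumed layout: two servers at the local site, one at the remote site.
CLUSTER = [
    Node("primary", "local", 1),
    Node("local-standby", "local", 2),
    Node("remote-standby", "remote", 3),
]
```

With every node healthy, `pick_active(CLUSTER)` selects the primary. Marking the primary as failed selects the local standby, and marking both local servers as failed (a site failure) selects the remote standby.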
When the local application or server fails, the local standby server takes over the application. When the application is unable to run on either local server, it runs on the offsite server.
When a site failure occurs
The application data is already on the remote server, so the application can migrate to the remote site with little or no disruption to users. This migration can be automatic or manual.
Return to service
When the local server is returned to service, the direction of the data replication is reversed automatically, replicating from whichever server is currently active (local or remote). The primary server is switched back to being the active server as soon as the data replication is complete.
Advantages of this approach
Original Source: www.openminds.co.uk