Extending MSCS Cluster High Availability Offsite
Microsoft Cluster Service: Extending Its Protection to Off-Site Replicas
Microsoft Cluster Service (MSCS) shared storage clusters provide excellent protection from server failures and, in many cases, application failures. Because they provide so much protection, it's easy to overlook a key point: MSCS shared storage clusters are only one layer of protection in your overall data protection strategy. This type of clustering cannot protect against data corruption, storage device failure, or site- and subnet-level failures. Examples include virus outbreaks, human error, network failures, power outages, server room overheating, fire, flooding, and other natural disasters.
Extending your MSCS cluster offsite with a cluster-aware high availability solution closes the gap in protection left by MSCS shared storage clusters. Depending on your specific requirements, "offsite" can be 10 miles, 200 miles, 1000 miles, or even further. Additionally, by using a solution with integrated continuous data protection (CDP) technology, you can also protect against data corruption, including damage done by viruses or human error.
The Cost of Downtime — even with cluster technology
At this point you have, or are seriously considering, Microsoft Cluster technology. Because implementing and maintaining Microsoft Clusters involves non-trivial expense, we can safely assume that downtime is costly to your organization. MSCS shared storage clusters are only one layer of protection in your overall high availability strategy. The ability to quickly fail over offsite is simply the next logical layer of protection to prevent crippling downtime.
The most important benefit of extending protection offsite is that you gain true disaster recovery and business continuance capabilities. While the risk of a major disaster may be low relative to the more common failures that clusters typically protect against, the impact is much higher. According to a study by accounting firm McGladrey and Pullen, 43% of companies that experience a disaster never re-open and 29% close within two years. Of those that do manage to survive, the long-term financial impact can be tremendous. For more information on this topic, please refer to our white paper, Extending MSCS Cluster High Availability Offsite.
Closing the gap in protection left vulnerable by traditional MSCS shared storage clusters
By definition, MSCS shared storage clusters are local solutions: because all nodes in the cluster use the same storage device, they must all be in the same location. That is what leaves such clusters unprotected from many types of failures, such as network failures, storage device failure, power outages, server room overheating, fire, flooding, and other natural disasters.
This inherent gap in protection can be overcome by using real-time data replication in addition to, or in some cases instead of, shared storage. Real-time data replication allows two geographically distant storage devices or servers to hold a mirror copy of the data at all times. This makes it possible to add offsite nodes that use replicated data rather than data accessed directly via shared storage. Because the replication is real-time, data loss is reduced to zero or, at worst, a few seconds, even under the most extreme circumstances.
Microsoft alone does not provide a complete end-to-end geographically dispersed cluster solution. Microsoft does, however, work with hardware and software vendors to provide one. This guide presents solutions that help close the gap in protection left open by traditional MSCS shared storage clusters.
Our white paper, Extending MSCS Cluster High Availability Offsite, discusses the three most common technologies used to allow offsite failover for MSCS clusters:
- Cluster-aware high availability software solutions
- MSCS clusters with emulated shared storage
- MSCS majority node set clusters
Cluster-aware high availability: host-based replication
Some solutions extend the actual clusters offsite by adding more nodes. These solutions are well suited for specific situations but are too complex and costly for the majority of organizations to justify. This is especially true for organizations already using shared storage clustering.
As you might expect, software solutions can provide greater flexibility at a more realistic price. Rather than adding more nodes to the cluster, our software allows failover to a separate offsite shared storage cluster or even to standalone machines. Replication is handled from each active node to each backup node or server; this is referred to as cluster-aware host-based replication. With our solution, the offsite machines can be completely different from the local machines, so long as they are powerful enough to host the applications in question.

[Figure: Example Cluster Topology]
For example, you may fail over to offsite shared storage clusters, standalone servers, or even virtual servers running VMware or Microsoft Virtual Server. The complexity of the application failover is handled by our cluster-aware software, and failover can even be fully automatic should all available local cluster nodes fail. The level of automation is generally configurable, with several options: manual failover, push-button failover, or automatically triggered failover.
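To make the three automation levels concrete, here is a minimal sketch of how a failover decision could be modeled. It is not the product's actual logic: the names `FailoverMode`, `next_action`, and the returned action strings are hypothetical, and the only rules encoded are the ones stated above (offsite failover is considered only when every local cluster node is down, and the configured mode decides how much operator involvement is needed).

```python
from enum import Enum


class FailoverMode(Enum):
    """Hypothetical labels for the three automation levels described above."""
    MANUAL = "manual"
    PUSH_BUTTON = "push_button"
    AUTOMATIC = "automatic"


def next_action(mode, local_nodes_up, operator_confirmed=False):
    """Decide what to do given the health of the local cluster nodes.

    local_nodes_up: iterable of booleans, one per local cluster node.
    operator_confirmed: True once an operator has pressed the failover button.
    """
    # As long as any local node is alive, normal local cluster failover applies.
    if any(local_nodes_up):
        return "stay_local"

    # All local nodes are down: the configured mode decides what happens next.
    if mode is FailoverMode.AUTOMATIC:
        return "fail_over"
    if mode is FailoverMode.PUSH_BUTTON:
        return "fail_over" if operator_confirmed else "wait_for_operator"
    # MANUAL: only notify; the operator carries out the failover steps by hand.
    return "notify_operator"
```

For instance, `next_action(FailoverMode.PUSH_BUTTON, [False, False])` returns `"wait_for_operator"` until it is called again with `operator_confirmed=True`.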
Our solution uses asynchronous replication because it imposes no distance limitations and does not affect application performance. With asynchronous replication, disk I/O is not delayed; instead, changes are monitored and sent to the target as fast as bandwidth allows. Our asynchronous replication gracefully handles minor network issues and even temporary disconnections without impacting application performance.
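The general idea of asynchronous replication can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not our product's implementation: the class `AsyncReplicator` and its methods are hypothetical, dictionaries stand in for the primary and replica disks, and an in-memory queue stands in for the network link. It shows only the key property: a write returns as soon as the local copy is updated, while a background worker ships queued changes to the replica.

```python
import queue
import threading


class AsyncReplicator:
    """Toy model: local writes never wait for the replica (hypothetical API)."""

    def __init__(self):
        self.primary = {}              # stands in for the local disk
        self.replica = {}              # stands in for the offsite copy
        self._pending = queue.Queue()  # changes captured but not yet shipped
        worker = threading.Thread(target=self._ship, daemon=True)
        worker.start()

    def write(self, key, value):
        # The application's write completes immediately; the change is only
        # queued for later transfer, so replication adds no I/O delay.
        self.primary[key] = value
        self._pending.put((key, value))

    def _ship(self):
        # Background worker: apply changes to the replica in the order they
        # were written, as fast as the "link" (here, the queue) allows.
        while True:
            key, value = self._pending.get()
            self.replica[key] = value
            self._pending.task_done()

    def lag(self):
        # Approximate number of changes not yet picked up by the worker.
        return self._pending.qsize()

    def drain(self):
        # Block until every queued change has been applied to the replica.
        self._pending.join()


replicator = AsyncReplicator()
replicator.write("orders/1001", "paid")
replicator.write("orders/1002", "shipped")
replicator.drain()
```

After `drain()` returns, the replica holds the same data as the primary. In a real deployment the queue would have to survive disconnections, which this sketch does not attempt to model.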
Testing the recoverability of your cluster/replica environment
To ensure that your disaster recovery solution remains sound and effective in the face of constant change, regular disaster recovery tests are not just sensible but critical. In fact, no disaster recovery solution is complete without continuous disaster recovery testing.
CA XOsoft™ Assured Recovery is a breakthrough technology that makes your disaster recovery investment truly complete. CA XOsoft Assured Recovery allows you to easily conduct a comprehensive yet transparent disaster recovery test of the recoverability of your replicated data. It does this by actually starting up application services on a replica system and performing whatever operations are required, including database updates, to verify the integrity of your data.
Quick Comparison
| Feature | CA XOsoft solution* | Hardware solutions* | Majority node set |
|---|---|---|---|
| Able to support automatic or manual failover | X | X | |
| Does not affect existing local cluster failover behavior | X | X | |
| No distance limitation | X | | X** |
| Does not require cluster reconfiguration to implement | X | | |
| Allows failover to different server hardware or even virtual servers | X | | |
| Able to fail over across subnets | X | | |
| Minimal IT staff training required | X | | |
| Integrated true CDP technology to protect from data corruption | X | | |
| Automated offsite server testing/validation | X | | |
| Able to support supplemental offsite backups/snapshots*** | X | X | X |
* Varies depending on the vendor. Each vendor's solution should be researched individually to confirm that it actually offers these features.
** Latency must be under 500 ms at all times.
*** Depends on replication and/or hardware device functionality.