Crazy HA RAM disk idea

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Feb 21, 2013 5:12 pm

1) Neither vMotion, nor Live Migration, nor XenMotion, etc. works the way you describe. No hypervisor moves VM disk content; only the VM's RAM footprint is transferred.

2) I don't buy ancient active-passive SANs, especially ones based on generic Windows clusters. "We're not in Kansas anymore" (c) ... I mean, we're not in 2001.

3) StarWind with a 3-way replica allows 2 of the 3 nodes to be AWOL with the cluster still up and running. So it's "OK" with double faults.

4) Interesting multi-site cluster design.

Still not sure what you want StarWind to do in the referenced interconnect diagram.
Timothy Mitchell wrote:
anton (staff) wrote:There are many ways to skin a cat (c) ...

You can handle everything at the hypervisor level (diff clones), or you may use deduplicated storage where the SAN will take care of mapping the same data to the same blocks and caching them deduplicated. The result would be pretty much the same.
For any normal file system this would be true. However, when you consider the impact that live migration or vMotion has on any storage system, local or networked, you would always want to use differencing disks.

Why?

During any form of live migration, all disks directly associated with a VM must be transferred between hosts. If the hypervisor is not aware of the deduplication method when it transfers a VM between hosts, it will copy the full contents of the drive and not just the unique section.

For instance, take two systems running a web server: the first uses a single virtual disk for all operating system data, and the second uses a differencing disk for the same information. On the first, the total VHD size is 30 GB; on the second, the parent VHD is 25 GB and the differencing disk is 5 GB. During a live migration of the first web server, the entire 30 GB VHD must be copied between servers for the migration to occur. For the second web server to live migrate, only the differencing disk is transferred between hosts, provided the parent disk is available at the same local or network path. If the parent disk is not at the same path, the system will prompt you to include the parent disk in the migration.
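To make the arithmetic above concrete, here is a minimal Python sketch comparing how much data each layout forces a migration to copy. The 30 GB / 25 GB / 5 GB figures are just the illustrative sizes from the example, and the function is a back-of-the-envelope model, not anything the hypervisor actually runs.

```python
# Rough comparison of data copied during a migration, using the illustrative
# sizes from the example above (not real measurements).

GB = 1024 ** 3

def migration_copy_bytes(disks_gb, parent_already_on_destination):
    """Return bytes that must be copied for the given virtual disk layout.

    disks_gb: dict of disk name -> size in GB
    parent_already_on_destination: if True, the shared parent VHD is skipped
    because it is reachable at the same local/network path on the target host.
    """
    total = 0
    for name, size_gb in disks_gb.items():
        if name == "parent" and parent_already_on_destination:
            continue  # parent is not transferred, only re-referenced
        total += size_gb * GB
    return total

# Web server 1: one monolithic 30 GB VHD.
monolithic = {"os_vhd": 30}
# Web server 2: 25 GB shared parent + 5 GB differencing disk.
layered = {"parent": 25, "diff": 5}

print(migration_copy_bytes(monolithic, False) / GB)  # 30.0 GB moved
print(migration_copy_bytes(layered, True) / GB)      # 5.0 GB moved
print(migration_copy_bytes(layered, False) / GB)     # 30.0 GB if the parent path differs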

The argument can be made that a good deduplicating SAN can detect duplicate information being read by one server and written by another and allow the copy process to happen faster. However, in my experience this places a high and unnecessary load on your network for transferring the information between systems, and on the memory and processors of your SAN for detecting the duplicate information.

Pushing this separation architecture to its most efficient form: use a differencing disk for your operating system drive and have the OS in the virtual machine directly connect an additional drive letter to a separate iSCSI or SMB shared disk (or disks) for any installed applications, stored files, and log files (except iSCSI and SMB logging). This reduces the size of your differencing disk to typically less than half a gigabyte and allows live migrations of typically large systems such as SQL or Exchange to occur in a matter of seconds on a standard gigabit network.
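As a rough sanity check on the "matter of seconds on a standard gigabit network" claim, here is a small sketch estimating transfer times. The ~110 MB/s usable throughput figure is an assumption for a 1 Gbit/s link after protocol overhead, and the disk sizes are the illustrative numbers from this thread, not benchmarks.

```python
# Rough transfer-time estimate over gigabit Ethernet. The ~110 MB/s figure is
# an assumed usable throughput for a 1 Gbit/s link, not a measured value.

USABLE_GIGABIT_MB_S = 110  # MB/s, assumed after protocol overhead

def estimated_seconds(size_gb, throughput_mb_s=USABLE_GIGABIT_MB_S):
    return (size_gb * 1024) / throughput_mb_s

for label, size_gb in [("full 30 GB VHD", 30), ("0.5 GB differencing disk", 0.5)]:
    print(f"{label}: ~{estimated_seconds(size_gb):.0f} s")
# full 30 GB VHD: ~279 s
# 0.5 GB differencing disk: ~5 s
```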

anton (staff) wrote:I still don't buy your approach with non-redundant SANs and cannot figure out how failover happens.
As for the non-redundant SANs: we are running redundancy on our SANs, and our environment requires us to run N+2 with no single point of failure on all systems. This means we must be able to take two failures or configuration errors within any service layer and continue to operate.

Each individual SAN is deployed as a pair running failover clustering. The clustering service is only for iSCSI and SMB VHD access. The active device within the cluster handles VHD access, while the passive device runs the Volume Shadow Copy service for creating backups without impacting production performance. The passive device is also configured as preferred for DFS traffic and offsite archiving.

Each of our sites contains 3 pairs of SAN units, and each member of a cluster runs on a separate SAN pair. SQL is replicated using transactional synchronization. Exchange databases are replicated using DAGs. DNS and Active Directory use their own replication system. Web front ends are configured with secondary SQL servers. UAG load balancers use NLB-based VIPs for accessible IPs and are configured to detect web server errors in case multiple SQL failures leave a web front end orphaned from both of its SQL servers. Finally, DFS is run directly on the SANs for file replication and availability.
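Purely to illustrate the N+2 rule mentioned earlier, here is a tiny sketch that checks whether each service layer can lose two members and still have one left. The 3 SAN pairs per site comes from the description above; the other layer names and member counts are made-up examples rather than an inventory of the real environment.

```python
# Illustration of the N+2 rule: every service layer must survive two
# simultaneous failures. Most layer names/counts here are hypothetical.

FAULTS_TO_TOLERATE = 2

layers = {
    "SAN pairs per site": 3,     # from the description above
    "SQL cluster members": 3,    # assumed for the example
    "Exchange DAG copies": 3,    # assumed for the example
    "Web front ends": 3,         # assumed for the example
}

for layer, members in layers.items():
    survives = members - FAULTS_TO_TOLERATE >= 1
    status = "OK" if survives else "violates N+2"
    print(f"{layer}: {members} members -> {status} after {FAULTS_TO_TOLERATE} failures")
```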

Site-to-site replication is handled through SQL transaction logging, Exchange DAGs, and DFS. Virtualized GTM load balancers on anycast IP addresses redirect CNAMEs to the nearest available site. This design allows our sites to run independently of each other in the case of a complete network outage or area isolation. Any split-brain clustering issues are resolved by the transactional synchronization system once the sites are able to communicate with each other again.
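A hypothetical sketch of the GTM/anycast behaviour described above, directing a service CNAME to the nearest site that is still reachable; the site names, latencies, and health flags are invented example data, not our actual topology.

```python
# Hypothetical illustration of GTM-style site selection: answer a service
# CNAME with a target at the nearest healthy site. All data is made up.

sites = [
    {"name": "site-a", "latency_ms": 12, "healthy": True},
    {"name": "site-b", "latency_ms": 35, "healthy": True},
    {"name": "site-c", "latency_ms": 80, "healthy": True},
]

def resolve_cname(service, sites):
    """Return the target host name for `service` at the nearest healthy site."""
    reachable = [s for s in sites if s["healthy"]]
    if not reachable:
        raise RuntimeError("no site reachable")
    nearest = min(reachable, key=lambda s: s["latency_ms"])
    return f"{service}.{nearest['name']}.example.internal"

print(resolve_cname("owa", sites))   # owa.site-a.example.internal
sites[0]["healthy"] = False          # simulate losing the nearest site
print(resolve_cname("owa", sites))   # owa.site-b.example.internal
```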
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Timothy Mitchell
Posts: 4
Joined: Fri Jan 18, 2013 8:07 am

Fri Feb 22, 2013 12:02 pm

The point of the posting was to let Aitor_Ibarra (the thread starter) know that the idea was not trash and that there is a production use for a completely volatile RAMsan. I am not requesting that any feature be added and have replied only to clarify your questions. The implementation of the RAMsan software is running very well.
anton (staff) wrote:1) Neither vMotion, nor Live Migration, nor XenMotion, etc. works the way you describe. No hypervisor moves VM disk content; only the VM's RAM footprint is transferred.
A live migration uses memory only if it is a migration within the same cluster and the cluster is using a shared SAN resource for all VHD storage. That's a lot of ifs needed to get a system working, and our environment is more than three times the maximum cluster size for any existing hypervisor. For ease of operation and separation of dependencies we chose to stick with standalone Hyper-V boxes.

Here is a well-written article describing this. See the section at the end titled "how this works" for a diagram.
http://workinghardinit.wordpress.com/ta ... migration/

It was not a truly live migration at the time, but the transactional clustering system we have in place inside the guest VMs allowed us to migrate VMs in a similar fashion on Server 2008, using a brief shutdown to move the differencing disk. A single cluster member would be shut down only long enough to move its differencing disk and configuration to a different host. Using Server 2012 and the built-in temporary SMB shares for live migration of storage, no shutdown is needed any more, making this a true live migration.
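For readers following along, here is a runnable outline (with stub classes) of that pre-2012 move as described above. Everything in it is a hypothetical illustration; none of the names correspond to real Hyper-V cmdlets or StarWind APIs.

```python
# Runnable outline of the Server 2008-era move described above: one
# guest-cluster member is briefly shut down, its differencing disk and VM
# configuration are moved to the new host, and it is started there.
# All classes and names are hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    files: set = field(default_factory=set)

    def log(self, msg):
        print(f"[{self.name}] {msg}")

@dataclass
class VM:
    name: str
    differencing_disk: str   # small per-VM diff disk (moves with the VM)
    parent_disk: str         # shared parent VHD (expected on every host)
    config: str

def migrate_cluster_member(vm: VM, source: Host, target: Host):
    # The transactional guest cluster keeps the service up on the other
    # members, so a brief outage of this single VM is acceptable.
    source.log(f"shutting down {vm.name}")

    # Only the small differencing disk and the VM config are copied; the
    # parent VHD must already exist at the same path on the target host.
    assert vm.parent_disk in target.files, "parent VHD missing on target"
    for item in (vm.differencing_disk, vm.config):
        source.files.discard(item)
        target.files.add(item)
        target.log(f"received {item}")

    target.log(f"starting {vm.name}")
    # On Server 2012, a storage live migration over a temporary SMB share
    # removes the shutdown step entirely.

web1 = VM("web1", "web1-diff.vhd", "base-2008r2.vhd", "web1.xml")
host_a = Host("hostA", {"base-2008r2.vhd", "web1-diff.vhd", "web1.xml"})
host_b = Host("hostB", {"base-2008r2.vhd"})
migrate_cluster_member(web1, host_a, host_b)
```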

anton (staff) wrote:2) I don't buy ancient active-passive SANs, especially ones based on generic Windows clusters. "We're not in Kansas anymore" (c) ... I mean, we're not in 2001.
The passive side of the SAN was implemented to provide a faster system for replicating transactional data between sites and to improve response times for specific services by providing dedicated hardware. In essence, we have 3 independent SAN targets and 3 dedicated servers which share some storage with the VM SAN for specific tasks that require higher throughput than the rest of the VM SAN.

The passive side of our SAN pairs is not a requirement for this system to operate stably and safely. With transactional replication keeping the local clusters alive, dumping a SAN only disrupts the section of the clusters that is attached to that SAN. In fact, you could dump every SAN in the environment except one and it would remain alive.

anton (staff) wrote:3) StarWind with a 3-way replica allows 2 of the 3 nodes to be AWOL with the cluster still up and running. So it's "OK" with double faults.
This is good to know. We still have to implement at least three independent configurations within our environment, but how many SAN systems could you get into a single cluster?
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Feb 22, 2013 9:03 pm

1) We'll implement RAM-based HA with one of the next updates. It's not a crazy idea.

2) You're throwing everything into the same basket: Live Migration, Storage Live Migration, Shared Nothing Live Migration and so on.

Very different things implemented to solve very different tasks. Not expected to be used interchangeably.

3) SNLM indeed does not need a SAN but it does not require any SMB share either.

4) SNLM is not an Enterprise feature. It helps with only one thing - patching a Windows cluster (something you need to do on a frequent basis) w/o interrupting VMs. It does not protect from unexpected downtime, and it does not allow you to build guest VM clusters either. It's a PITA to use with a huge number of VMs, as it takes quite a long time to transfer everything from one host to another. It's a very clumsy attempt to build a technology competitive with emulated shared storage (something we specialize in).

5) We're not Nutanix, so we don't ask for the same (equal) number of Hyper-V/ESXi and storage hosts. You can co-run as many StarWinds (native or VM-ed) within your cluster as you want. It's still a 2-way or 3-way replica between hosts, so if you want to convert, say, all 6 hosts of a 6-host Hyper-V cluster you'll end up with a 3+3 config. We'll change this with the upcoming (post-V8) versions to allow building very different network interconnect topologies (it's a true Enterprise feature that ISPs and cloud providers beg for).
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
