Hyper-V server failover problems


elaw
Posts: 14
Joined: Thu Jul 18, 2013 1:59 pm
Location: Bedford, MA

Thu Nov 13, 2014 3:51 pm

Okay, this isn't so much a tech support request as it is me asking the community whether my expectations are realistic... and it's gonna be long!

Our setup is two clustered Hyper-V servers running Windows Server 2012 and StarWind Native SAN for Hyper-V version 6.0.6399. There are two HA devices on each node: a large one (~1.2 TB) for data and a small one (1 GB) for quorum. The Microsoft iSCSI initiator on each node connects to the targets on both nodes, and cluster shared storage is set up in the usual way (as far as I know... I'm pretty new to clustering). Each server has one dedicated 1 GbE NIC connected directly to a dedicated NIC on the other server for StarWind synchronization, and another for the StarWind heartbeat. In addition there's a management NIC connected to our LAN, and 4 other NIC ports that the VMs use to communicate with the LAN.
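For reference, here's roughly how I've been checking that each node sees the StarWind targets and that the cluster disks are online (PowerShell on Server 2012; just my rough notes, so double-check the details):

    # iSCSI sessions from this node to the StarWind targets
    Get-IscsiSession | Select-Object TargetNodeAddress, IsConnected

    # state of the cluster disks and CSVs
    Get-ClusterResource
    Get-ClusterSharedVolume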

When we try to do a failover "cleanly", through the management tools, everything works fine. The VM or VMs stop on one server, start up on the other, birds chirp, the sun shines, and all is well.

But lately one server has been having hardware problems, and when that happens failover does not occur. The other server is still running, but when you open Failover Cluster Manager on it, it pretty much acts as if the cluster doesn't exist. If you try to "Connect to cluster", the cluster shows up in the selection list, but when you actually try to connect it says the cluster service is not running... even though the service is in fact running on that server.
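Next time it happens I plan to check the surviving node along these lines (again, just a sketch from memory, not something I've run through yet):

    # is the cluster service actually running on this node?
    Get-Service ClusSvc

    # what does this node think the cluster membership looks like?
    Get-ClusterNode

    # dump the last hour of the cluster log for later analysis
    Get-ClusterLog -TimeSpan 60 -Destination C:\Temp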

In this condition, StarWind *seems* to be working... I didn't check ultra-thoroughly last time this happened since I was hurrying to get the system back up, but I could drill down from C: into the cluster shared storage folder and everything seemed present and accessible.

Now here's the best part: my coworker, who's as green with clustering as I am, has convinced himself this is normal... if one server's hardware fails or connectivity between the two fails, he thinks the cluster is just supposed to stop.

I think that's crazy... otherwise, what's the point of having a cluster?

So who's right?
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Thu Nov 13, 2014 4:41 pm

Both. You can configure the cluster to treat the loss of its partner and heartbeat as "OK, I'm the one left alone, so I'll keep working because the show must go on", or you can configure it as "OK, I'm alone and I don't have quorum, so I have to die".
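In Microsoft failover clustering that choice comes down to the quorum configuration. A rough PowerShell sketch (the witness disk name below is only an example, use your own quorum disk resource name):

    # show the current quorum model
    Get-ClusterQuorum

    # node and disk majority: the surviving node plus the witness disk keep quorum
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 1"

    # node majority only: with one of two nodes down, the cluster will typically lose quorum and stop
    Set-ClusterQuorum -NodeMajority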
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

elaw
Posts: 14
Joined: Thu Jul 18, 2013 1:59 pm
Location: Bedford, MA

Thu Nov 13, 2014 5:31 pm

Interesting... do you by any chance have any links to info on that?

I've actually done quite a bit of searching but so far haven't really found anything definitive.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Tue Nov 18, 2014 10:14 am

Hi!

Try to configure the cluster as follows:
Connect the witness disk through the loopback interface only, so that each node connects to its own witness HA device via 127.0.0.1 rather than to the other server.
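In PowerShell that would look roughly like this (the IQN is only a placeholder, take the real one for your witness device from Get-IscsiTarget):

    # register the loopback portal and connect only the witness target through it
    New-IscsiTargetPortal -TargetPortalAddress "127.0.0.1"
    Get-IscsiTarget
    Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:servername-witness" -TargetPortalAddress "127.0.0.1" -IsPersistent $true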

Please let us know if that works.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com