We've set up a two-node test cluster at work. We have StarWind set up with a shared quorum and generic image files (as per your cluster document).
I was just testing failover and went through a few failure scenarios.
I tested failure in 3 different ways:
1) Graceful shutdown of the active node: the passive node took over and became the active node - WORKS
2) Selected a resource, set its threshold to 0, and initiated a failure on it: the passive node took over that resource - WORKS
3) Hard shutdown of the active node: the passive node is unable to take over - FAILURE
I'm getting these sorts of errors in my event log:
Event Type: Error
Event Source: ClusDisk
Event Category: None
Event ID: 1209
Date: 6/3/2008
Time: 1:51:00 PM
User: N/A
Computer: NESGTEST299
Description:
Cluster service is requesting a bus reset for device \Device\ClusDisk0.
Data:
0000: 0e 00 00 00 01 00 5a 00 ......Z.
0008: 00 00 00 00 b9 04 00 00 ....¹...
0010: 41 52 73 74 00 00 00 00 ARst....
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 00 00 00 00 00 00 00 ........
Event Type: Error
Event Source: ClusSvc
Event Category: Startup/Shutdown
Event ID: 1073
Date: 6/3/2008
Time: 1:51:00 PM
User: N/A
Computer: NESGTEST299
Description:
Cluster service was halted to prevent an inconsistency within the server cluster. The error code was 5892.
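In case it helps anyone read the Data block of the 1209 event, here is a minimal Python sketch that turns the hex dump back into bytes and pulls out two values. The offsets are my own reading of the dump, not a documented layout: the little-endian 32-bit value at offset 0x0C happens to equal the Event ID (1209), and the four bytes at 0x10 are the ASCII text "ARst". The names `HEX_DUMP` and `parse_dump` are just for this example.

```python
import struct

# Data bytes copied from the Event ID 1209 (ClusDisk) entry above.
HEX_DUMP = """
0000: 0e 00 00 00 01 00 5a 00
0008: 00 00 00 00 b9 04 00 00
0010: 41 52 73 74 00 00 00 00
0018: 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00
"""


def parse_dump(text):
    """Convert 'offset: hex bytes' lines into a single bytes object."""
    data = bytearray()
    for line in text.strip().splitlines():
        # Drop the 'NNNN:' offset prefix and keep only the hex byte pairs.
        _, _, hex_part = line.partition(":")
        data.extend(bytes.fromhex(hex_part.replace(" ", "")))
    return bytes(data)


raw = parse_dump(HEX_DUMP)

# Assumption: offset 0x0C holds a little-endian 32-bit value.
event_id_field = struct.unpack_from("<I", raw, 0x0C)[0]

# Assumption: offset 0x10 holds a 4-character ASCII tag.
tag = raw[0x10:0x14].decode("ascii")

print(len(raw), event_id_field, tag)
```

This only confirms that the data block repeats the event ID and carries the "ARst" tag. It does not by itself explain why the bus reset was requested.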
It is as if StarWind does not know that the primary node no longer needs to lock the resources (because it is offline). The StarWind log does not show anything that sheds light on this. Can anyone point me in the right direction on what I am missing here? Please and thank you. -Scott