That's my theory anyway: when I cancelled it, everything came back.
I have two Nodes hosting several Roles. I have seen (more than once, until I learned how to deal with driver updates better) the Roles go to FAILED because they've lost their storage. I resolve the problem by connecting to Starwind and turning the Role off, then starting it again, and everything is fine. Luckily I'm not in a position to lose data when it happens.
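(As a side note, the Turn Off / Start step I do by hand in Failover Cluster Manager can be scripted. Below is a minimal sketch of my own, not anything shipped with Starwind, assuming the standard FailoverClusters PowerShell cmdlets are available on the node; the Role name "SomeRole" is just a placeholder.)

import subprocess

def bounce_cluster_role(role_name: str) -> None:
    # Turn the failed Role off, then start it again, by shelling out to the
    # built-in FailoverClusters cmdlets (Stop-ClusterGroup / Start-ClusterGroup).
    for cmdlet in ("Stop-ClusterGroup", "Start-ClusterGroup"):
        subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command",
             f"{cmdlet} -Name '{role_name}'"],
            check=True,
        )

bounce_cluster_role("SomeRole")  # placeholder Role name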
This morning I ran a RAID consistency check on the drive holding storage for several Roles, and they went to FAILED because they had lost their storage. Since it was the only thing I'd done, I cancelled the check and everything came back. I'll try it again at the weekend, when I can stop Starwind and fail the Roles over the way I do when I update drivers and firmware.
This is my first live Cluster and my first Starwind, so I'm learning as I go and I'm quite ready to accept that I'm missing the point somewhere. If a Role owned by Node 2 has its storage on CSV2 and there's a problem with one copy of CSV2, why is it not simply switching to the other copy and letting me know? What am I doing wrong?
The issue seems to have been with Starwind. I've spent most of today going through my iSCSI/MPIO configuration, and it all appears to be exactly as described in the documentation, as if it should work the way it's supposed to. The iSCSI settings for one Node, for example:
Disk0 CLS01-CSV2
Starwind HAImage2 replicates over 172.16.110.11/172.16.111.11 & 172.16.110.22/172.16.111.22
Name                  From           To
CSV2 local            Default        127.0.0.1
CSV2 partner cable 0  172.16.210.11  172.16.210.22
CSV2 partner cable 1  172.16.211.11  172.16.211.22
iqn.2008-08.com.starwindsoftware:cls01-csv2
Disk 2 Port 1: Bus 0: Target 1: LUN 0
MPIO (Fail Over Only)
• 0x770100001 Active ffffe00031167010-4000013700000002*
• 0x770100005 Standby ffffe00031167010-4000013700000006
• 0x770100006 Standby ffffe00031167010-4000013700000007
iqn.2008-08.com.starwindsoftware:cls02-csv2
Disk 2 Port 1: Bus 0: Target 5: LUN 0
MPIO (Fail Over Only)
• 0x770100001 Active ffffe00031167010-4000013700000002
• 0x770100005 Standby ffffe00031167010-4000013700000006*
• 0x770100006 Standby ffffe00031167010-4000013700000007
Disk 2 Port 1: Bus 0: Target 6: LUN 0
MPIO (Fail Over Only)
• 0x770100001 Active ffffe00031167010-4000013700000002
• 0x770100005 Standby ffffe00031167010-4000013700000006
• 0x770100006 Standby ffffe00031167010-4000013700000007*
Disk1 CLS01-CSV1
Starwind HAImage1 replicates over 172.16.110.11/172.16.111.11 & 172.16.110.22/172.16.111.22
Name                  From           To
CSV1 local            Default        127.0.0.1
CSV1 partner cable 0  172.16.210.11  172.16.210.22
CSV1 partner cable 1  172.16.211.11  172.16.211.22
iqn.2008-08.com.starwindsoftware:cls01-csv1
Disk 3 Port 1: Bus 0: Target 0: LUN 0
MPIO (Fail Over Only)
• 0x770100000 Active ffffe00031167010-4000013700000001*
• 0x770100003 Standby ffffe00031167010-4000013700000004
• 0x770100004 Standby ffffe00031167010-4000013700000005
iqn.2008-08.com.starwindsoftware:cls02-csv1
Disk 3 Port 1: Bus 0: Target 3: LUN 0
MPIO (Fail Over Only)
• 0x770100000 Active ffffe00031167010-4000013700000001
• 0x770100003 Standby ffffe00031167010-4000013700000004*
• 0x770100004 Standby ffffe00031167010-4000013700000005
Disk 3 Port 1: Bus 0: Target 4: LUN 0
MPIO (Fail Over Only)
• 0x770100000 Active ffffe00031167010-4000013700000001
• 0x770100003 Standby ffffe00031167010-4000013700000004
• 0x770100004 Standby ffffe00031167010-4000013700000005*
Disk2
Starwind HAImage3 replicates over 172.16.110.11/172.16.111.11 & 172.16.110.22/172.16.111.22
Name           From     To
Witness local  Default  127.0.0.1
iqn.2008-08.com.starwindsoftware:cls01-witness1
Disk 5 Port 1: Bus 0: Target 2: LUN 0
MPIO (Round Robin)
• 0x770100002 Active ffffe00031167010-4000013700000003*
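To spell out what that path layout is meant to achieve: with Fail Over Only, each clustered disk should have exactly one Active path (the loopback session to the local Starwind target on 127.0.0.1), and the sessions to the partner node should sit at Standby. The following is a small sketch of my own, not Starwind or Microsoft tooling, that just restates the CSV2 (Disk 2) paths above as data and checks that rule:

from dataclasses import dataclass

@dataclass
class IscsiPath:
    path_id: str   # MPIO path identifier as shown above, e.g. 0x770100001
    target: str    # target IQN the session is logged on to
    local: bool    # True = loopback session to the local Starwind target
    state: str     # "Active" or "Standby" as shown in the path list

# Transcribed from the CSV2 (Disk 2) listing above.
csv2_paths = [
    IscsiPath("0x770100001", "iqn.2008-08.com.starwindsoftware:cls01-csv2", True,  "Active"),
    IscsiPath("0x770100005", "iqn.2008-08.com.starwindsoftware:cls02-csv2", False, "Standby"),
    IscsiPath("0x770100006", "iqn.2008-08.com.starwindsoftware:cls02-csv2", False, "Standby"),
]

def failover_only_ok(paths):
    # Exactly one Active path, it is the local loopback path, and every
    # partner-node path is Standby.
    active = [p for p in paths if p.state == "Active"]
    partner = [p for p in paths if not p.local]
    return len(active) == 1 and active[0].local and all(p.state == "Standby" for p in partner)

print(failover_only_ok(csv2_paths))  # expected: True for the layout shown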