On Monday, we installed iDRAC and Windows updates on VHOST3 and were in the process of doing the same on VHOST4. Before working on VHOST4, we live-migrated its VMs to VHOST3, confirmed that they were running successfully, and proceeded with the updates. After one of the reboots in VHOST4's update cycle, its boot drive failed. Troubleshooting confirmed that the drive itself was the problem. No big deal, we thought: we have the VMs running on VHOST3, and individual backups of them on our NAS.
On Tuesday morning, we arrived at the office to find all of the VMs that had been moved to VHOST3 in a "Saved-Critical" state in Hyper-V Manager. The Clustered Disk was offline. StarWind Management Console was giving (and continues to give) "Partner Node is not ready, see Replication Manager for details" warnings.
Since then, the "Saved-Critical" VMs have disappeared from Hyper-V Manager on VHOST3. Additionally, VHOST3 cannot see the virtual disk in Server Manager. If you navigate to the directory, the disk image file is there alongside the .swdsk files. This is surely related to the loss of the VMs, but I'm not sure how. Admittedly, we should have done more investigation into VHOST3 being down earlier in the week.
Following advice seen on the forum, the steps we have taken so far to rebuild the node on VHOST4 are:
1. Install WS2022 on new boot drive for VHOST4 (done)
2. Configure server as per StarWind docs (issues)
3. Run RemoveHAPartner script on VHOST3 (unsuccessful)
For step 2, VHOST4 can only see the VHOST3 target in iSCSI Initiator. It does not see itself on localhost. Should it be able to? Or does that only happen once the partner/target is created by StarWind?
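One rough way to tell "nothing is listening locally" apart from "the target just has not been created yet" would be a plain TCP check against the iSCSI port on each host. The sketch below is only that: it assumes the default iSCSI port 3260, the function names are my own, and 192.0.2.10 is a placeholder for VHOST3's iSCSI address, not our real one.

```python
import socket

# Assumption: StarWind is serving iSCSI on the default port 3260.
ISCSI_PORT = 3260


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_targets(hosts, port=ISCSI_PORT):
    """Map each host to whether its iSCSI port accepts TCP connections."""
    return {host: port_is_open(host, port) for host in hosts}


if __name__ == "__main__":
    # 127.0.0.1 is VHOST4 itself; 192.0.2.10 is a placeholder for VHOST3.
    for host, ok in check_targets(["127.0.0.1", "192.0.2.10"]).items():
        state = "listening" if ok else "no response"
        print(f"{host}:{ISCSI_PORT} -> {state}")
```

A successful connection only shows that something is accepting connections on that port. It does not show that a StarWind target has actually been defined on that node.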
For step 3, I have been unable to remove the old VHOST4 partner info from VHOST3. The RemoveHAPartner.ps1 script fails with the following error:
$deviceName cannot be retrieved
<acltable>
I am sorry for the lengthy post, but I wanted to include as much information as I could. Even then, I'm sure I left out needed details. Thanks in advance for any advice.