Issues Recovering failed vSAN Free Node

Software-based VM-centric and flash-friendly VM storage + free version

basetron
Posts: 5
Joined: Thu Feb 20, 2025 4:56 pm

Thu Feb 20, 2025 6:05 pm

We have a two-node setup on two servers running Windows Server 2022 (VHOST3 and VHOST4). The virtual disks run in RAID through Dell H730 Mini cards. The two servers are identical.

On Monday, we installed iDRAC and Windows updates on VHOST3, and were in the process of doing so on VHOST4. Prior to working on VHOST4, we live-migrated the VMs on VHOST4 to VHOST3. We ensured that they were running successfully, and proceeded with updating. After one of the reboots of VHOST4 in its update cycle, the boot drive failed. Through troubleshooting, we confirmed that it was the drive. No big deal, we thought. We have the VMs running on VHOST3, and individual backups of them on our NAS. On Tuesday morning, we arrived at the office to find all of the VMs that had been moved to VHOST3 in a "Saved-Critical" state in Hyper-V Manager. The Clustered Disk was offline. StarWind Management Console was giving (and continues to give) "Partner Node is not ready, see Replication Manager for details" warnings.

Since then, the "Saved-Critical" VMs have disappeared from Hyper-V Manager on VHOST3. Additionally, VHOST3 cannot see the virtual disk in Server Manager. If you navigate to the directory, the disk image file is there with the .swdsk files. This is surely related to the loss of VMs, but I'm not sure how. Admittedly, we should have done more investigation into VHOST3 being down earlier in the week.

Following advice seen on the forum, our steps taken to rebuild the node on VHOST4 have been:

1. Install WS2022 on new boot drive for VHOST4 (done)
2. Configure server as per StarWind docs (issues)
3. Run RemoveHAPartner script on VHOST3 (unsuccessful)

For step 2, VHOST4 can only see the VHOST3 target in iSCSI Initiator. It does not see itself on localhost. Should it be able to? Or does that only happen once the partner/target is created by StarWind?

For step 3, I have been unable to remove the old VHOST4 partner info from VHOST3. The RemoveHAPartner.ps1 script fails with a "$deviceName cannot be retrieved" error. I tried removing the <acltable> entry for VHOST4 in StarWind.cfg, but it is added back when the StarWind service starts. As best I can understand it, this step is necessary before I can re-add VHOST4 with the partner add script.

I am sorry for the lengthy post, but I wanted to include as much information as I could. Even then, I'm sure I left out needed details. Thanks in advance for any advice.
yaroslav (staff)
Staff
Posts: 3597
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 20, 2025 6:59 pm

Is StarWind VSAN installed as a CVM or as a Windows-native service?
Do you see the targets in the iSCSI Initiator from the live node?
p.s. The fastest way forward is to reach out to StarWind Support (https://www.starwindsoftware.com/support-form) to get a quote for support.
p.p.s. Share the logs from both nodes collected with StarWind Log Collector: https://knowledgebase.starwindsoftware. ... collector/.
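
For the iSCSI question above, the same information can also be read from PowerShell instead of the iSCSI Initiator dialog. A minimal sketch, assuming the built-in Windows iSCSI cmdlets are available on the node; nothing here is StarWind-specific:

```powershell
# List every target the local initiator has discovered and whether it is connected.
Get-IscsiTarget | Format-Table NodeAddress, IsConnected -AutoSize

# List the discovery portals configured on this node.
Get-IscsiTargetPortal | Format-Table TargetPortalAddress, TargetPortalPortNumber -AutoSize

# List active sessions; an empty result means no target is currently connected.
Get-IscsiSession | Format-Table TargetNodeAddress, IsConnected, IsPersistent -AutoSize
```

Running this on the live node shows at a glance whether the local and partner targets are discovered and connected.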
basetron
Posts: 5
Joined: Thu Feb 20, 2025 4:56 pm

Thu Feb 20, 2025 7:39 pm

StarWind vSAN is installed as a Windows-native service.

Only one target is listed in iSCSI Initiator from VHOST3, which is itself. It has a status of "inactive".

Logs: https://drive.google.com/file/d/1d3Ddrt ... drive_link
yaroslav (staff)
Staff
Posts: 3597
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 20, 2025 7:44 pm

Is VHOST3 synchronized in StarWind Management Console? If so, connect it and bring up the VMs. There might be a misconfiguration in the iSCSI Initiator.
basetron
Posts: 5
Joined: Thu Feb 20, 2025 4:56 pm

Thu Feb 20, 2025 7:48 pm

It is not synchronized. It is the most up-to-date version of the cluster, but I wasn't sure how to mark it as synced without access to the Replication Manager.
yaroslav (staff)
Staff
Posts: 3597
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 20, 2025 7:52 pm

From what I understand, you've already removed the other replication partner (VHOST4). If so, mark VHOST3 as synchronized.
Otherwise, please stop StarWindService on VHOST4 and then mark VHOST3 as synchronized.
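
Stopping the service can be done from an elevated PowerShell prompt on VHOST4. A minimal sketch, assuming the Windows service is registered under the name StarWindService as written above:

```powershell
# Stop the StarWind service on this node (run elevated on VHOST4).
Stop-Service -Name StarWindService

# Confirm it is no longer running; Status should read "Stopped".
Get-Service -Name StarWindService | Select-Object Name, Status
```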
basetron
Posts: 5
Joined: Thu Feb 20, 2025 4:56 pm

Thu Feb 20, 2025 7:54 pm

How can I go about marking VHOST3 as synchronized with the free version? I don't believe I have access to Replication Manager.

Edit: VHOST4 has not been properly removed. The node itself is down, but VHOST3 is still searching for it based on the logs I have seen.
yaroslav (staff)
Staff
Posts: 3597
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 20, 2025 8:03 pm

The UI is locked in the Free version. To unlock the UI and get a support contract (we will review your configuration too), log a call with us at https://www.starwindsoftware.com/support-form.

You can mark the device as synchronized with the script.
1. Open SyncHADevice.ps1.
2. Comment out $device.Synchronize([SwHaSyncType]::SW_HA_SYNC_FULL, "")
3. Uncomment #$device.MarkAsSynchronized()
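
After those edits, the two relevant lines in SyncHADevice.ps1 would look roughly as follows. This is only the fragment named in the steps above; the rest of the script is left untouched and is not shown here:

```powershell
# Full synchronization is disabled by commenting the line out.
#$device.Synchronize([SwHaSyncType]::SW_HA_SYNC_FULL, "")

# Mark the device on this node as synchronized instead.
$device.MarkAsSynchronized()
```

This only makes sense on the node that holds the most recent data, which in this thread is VHOST3.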
basetron
Posts: 5
Joined: Thu Feb 20, 2025 4:56 pm

Thu Feb 20, 2025 8:18 pm

I am not sure how I missed this script. Oops! VHOST3 is now synchronized, and the localhost connection for iSCSI was restored.

Should I now proceed with running the remove partner script? Or can I run the add partner script to re-add VHOST4?
yaroslav (staff)
Staff
Posts: 3597
Joined: Mon Nov 18, 2019 11:11 am

Thu Feb 20, 2025 9:29 pm

Hi,

Once VHOST4 starts up, and provided that StarWind VSAN is still running there, synchronization starts automatically.
lznzdam
Posts: 1
Joined: Tue May 13, 2025 3:31 am

Tue May 13, 2025 3:33 am

Have you tried using the StarWind Management Console to recreate the replication connection from VHOST3 to the new VHOST4? I also wonder whether, when you reconfigured VHOST4, you kept the same device name and storage path as before, as this may affect the ability to resynchronize.
yaroslav (staff)
Staff
Posts: 3597
Joined: Mon Nov 18, 2019 11:11 am

Tue May 13, 2025 7:23 am

Good point, but, sadly, the Free version does not allow using the console for this, as the UI is locked.