We have the same problem and we are running VMware.
http://www.starwindsoftware.com/forums/ ... t2319.html
Because data becomes completely inaccessible whenever the HA nodes are resyncing, for whatever reason, we have to take StarWind HA out of our production environment until StarWind acknowledges and fixes this deadly problem.
This issue should be pretty easy to reproduce in their lab. Just set up two nodes in HA and one ESX or XenServer host. Once HA is set up and a few VMs are running on the HA target, pull the power on all three boxes and power them back up. The issue should appear when the HA needs to resync.
The issue is that the initiator can connect to and see the LUN on the HA target that is being synced, but you can't read or write to it until the full sync is done. If you have a couple hundred gigabytes of data, you just wait a few hours. If you have over 10 TB of data, as a data center might, you are dead in the water in this scenario.
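To put rough numbers on that wait, here is a back-of-the-envelope sketch. The 25 MB/s sync throughput is purely an assumption picked to match the "few hours for a couple hundred gigabytes" we saw; it is not a measured or documented StarWind figure, and your link and disks will give a different rate.

```python
# Rough estimate of how long a full HA resync blocks access to the LUN.
# ASSUMED_THROUGHPUT is an illustrative assumption, not a StarWind figure.

GB = 10**9
TB = 10**12
ASSUMED_THROUGHPUT = 25 * 10**6  # bytes per second (25 MB/s, assumed)


def resync_hours(data_bytes, throughput_bytes_per_sec):
    """Hours needed to copy data_bytes at a constant throughput."""
    return data_bytes / throughput_bytes_per_sec / 3600


small = resync_hours(200 * GB, ASSUMED_THROUGHPUT)  # a couple hundred GB
large = resync_hours(10 * TB, ASSUMED_THROUGHPUT)   # data-center sized

print(f"200 GB: about {small:.1f} hours")
print(f"10 TB:  about {large:.1f} hours ({large / 24:.1f} days)")
```

Whatever the real throughput is, the wait scales linearly with the image size, so 10 TB takes 50 times as long as 200 GB at the same rate.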
The other thing we discovered is even more deadly than the wait to restore data access. If your HA nodes all lose power at the same time, you probably don't care which node has the most up-to-date data. In our last total power failure, though, the HA nodes did not go down together, because one of the nodes is on a bigger UPS unit. That node ran for another 10 minutes or so, which we did not know at the time, until its UPS ran out of juice and cut the power to it as well. An hour later, when power was restored, we powered everything back up and found that the VMs would not start because the iSCSI targets were inaccessible. We did pretty much the same thing you did: resynced the HA images. The software did not tell us that the second node had more recent data. We ended up syncing the image from the first node (the one that lost power first) onto the second node (the one that lost power 10 minutes later). Ten minutes of data loss across 50 VMs, including Exchange and SQL servers, is a disaster. I just wish the software had alerted us: the last read/write on the image file on the first node was at 21:23 and the last read/write on the image file on the second node was at 21:35, are you sure you want to sync the second node's image from the first node's image? Or the software could simply tell me that the second node has 132 thousand more transactions than the first node. Either way would have prevented me from making such a mistake. We only figured out the error by looking at the UPS event log a day later. Pretty sad.
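The kind of check I am asking for could be as simple as the sketch below. Everything in it is hypothetical: `NodeImage`, `check_sync_direction`, the node names and the placeholder date are made up for illustration and are not part of any StarWind API. It also assumes both nodes keep reasonably synchronized clocks; a per-image write counter, like the transaction count mentioned above, would avoid that dependency.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NodeImage:
    """Hypothetical record of one HA node's image file."""
    name: str
    last_write: datetime  # time of the last write to the image


def check_sync_direction(source, destination):
    """Return a warning if the sync source is older than the destination.

    Returns None when the chosen direction looks safe.
    """
    if source.last_write < destination.last_write:
        gap = destination.last_write - source.last_write
        return (
            f"WARNING: {destination.name} was last written {gap} after "
            f"{source.name}. Syncing {source.name} -> {destination.name} "
            f"would overwrite the newer data. Are you sure?"
        )
    return None


# Times from the example above; the date itself is a placeholder.
node1 = NodeImage("node1", datetime(2010, 1, 1, 21, 23))
node2 = NodeImage("node2", datetime(2010, 1, 1, 21, 35))

warning = check_sync_direction(source=node1, destination=node2)
if warning:
    print(warning)
```

With a prompt like that in front of us, we would have synced from the second node instead and lost nothing.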
The StarWind software needs a more reliable way to handle failure recovery in an HA setup. We are just crossing our fingers and waiting to see when this will happen.
