High Availability: If both nodes were down and then full sync has been initiated, device becomes available for client connections immediately. Requests are processed by the node which is selected as synchronization source.
According to this release note, when the HA target is in the full sync stage, the data should be accessible from the primary node. We carefully tested this in our test environment and found that this feature doesn't work.
Here is what we have done so far.
1. Created a new HA target.
2. Started a full sync of the partner target.
3. While the partner was being fully synced, we attempted to create a new datastore on the HA target and got an error. The datastore cannot be created while the HA target is in full sync. We tried this many times on different ESX 4.1 servers and got the same result.
4. After the HA target was fully synced, we could create the new datastore as normal.
5. After we created the new datastore in ESX, we went back into the StarWind GUI and forced the HA target to do a full sync again, to simulate both nodes going down and coming back up in the full sync stage.
6. After we forced the full sync, we could browse the contents of the HA target datastore from any ESX server, so it seemed like the HA function was working. However, when we attempted to vMotion a 4GB virtual machine into the HA datastore, the process appeared to hang and eventually failed with a timeout error.
7. After the HA target was fully synced again, we could successfully vMotion a virtual machine into the datastore.
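Based on our test results, the HA target is not fully operational while it is in the syncing stage: reads (browsing the datastore) work, but operations that write to it (creating a datastore, vMotion into it) fail.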
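To make the expectation concrete, here is a minimal Python sketch of how we read the release note. It is only a model of the expected behaviour, not StarWind's implementation; the names `NodeState`, `Node`, `nodes_serving_io` and `device_available` are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    # Hypothetical states, named only for this illustration.
    SYNCHRONIZED = "synchronized"
    SYNC_SOURCE = "sync_source"   # holds the good copy during a full sync
    SYNC_TARGET = "sync_target"   # being overwritten during a full sync
    DOWN = "down"


@dataclass
class Node:
    name: str
    state: NodeState


def nodes_serving_io(nodes):
    """Names of the nodes that should accept client I/O under our reading
    of the release note: synchronized nodes and the synchronization source."""
    serving = (NodeState.SYNCHRONIZED, NodeState.SYNC_SOURCE)
    return [n.name for n in nodes if n.state in serving]


def device_available(nodes):
    """The device should be usable as long as at least one node serves I/O."""
    return len(nodes_serving_io(nodes)) > 0


# Both nodes came back up and a full sync was started from node1 to node2.
pair = [Node("node1", NodeState.SYNC_SOURCE), Node("node2", NodeState.SYNC_TARGET)]
print(nodes_serving_io(pair))   # ['node1']
print(device_available(pair))   # True
```

Under this reading, reads and writes during a full sync should be handled by node1 alone. What we observed in steps 3 and 6 is that writes are not being served at all.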
This is going to be a deadly threat in a production environment. Just imagine you have production data on an HA target and one node goes down because of multiple disk failures. When you rebuild the failed node, bring it back online and perform a full sync with the active node, the data on the active node becomes inaccessible. You have to wait until the target is fully in sync again.
I think we can accept slower data access during the full sync stage, but inaccessible data is not acceptable in an HA setup.
StarWind's software engineers need to fix this limitation ASAP in order to make the HA feature truly highly available.
