I have a two-node SOFS running Windows Server 2012 R2 with Storage Spaces and StarWind 8.0.8730. It has been in production for a few months and has worked well so far.
Over the weekend, though, I ran into a problem. I noticed node1 was stuck in a fast synchronization loop: it gets to 100% synchronized and then starts over. After digging around a bit and checking some logs, I think the issue is with the iSCSI target on node1. Looking at the connection in iSCSI Initiator from either node, I see:
iqn.2008-08.com.starwindsoftware:node1.domain.local-storage – Inactive
When I hit the Connect button on either node, I get an error: Service Unavailable
For troubleshooting purposes, I tried to add a target using the Add a Target Wizard in the StarWind console. The wizard just hangs on “Creating Target” until I hit the Cancel button.
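One check that helps separate a plain connectivity problem from a target that is listening but refusing logins is whether node1 still accepts TCP connections on the iSCSI port. Below is a minimal sketch of that check. It assumes the target uses the standard iSCSI port 3260 and that the host name matches the one embedded in the IQN above (node1.domain.local); both are assumptions, and the function name is just illustrative.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    HOST = "node1.domain.local"  # assumption: host name taken from the IQN
    ISCSI_PORT = 3260            # assumption: default iSCSI port is in use
    state = "open" if tcp_port_open(HOST, ISCSI_PORT) else "closed or unreachable"
    print(f"{HOST}:{ISCSI_PORT} is {state}")
```

If the port is open while the login still fails with Service Unavailable, that would suggest the listener is up and the refusal comes from the target itself rather than from the network path.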
Here are the relevant administrative events from both nodes that kicked off this issue:
Events on Node 1:
4:04:53 StarWindService 802 High Availability Device iqn.2008-08.com.starwindsoftware:node1.domain.local-storage, current Node passed to "Not synchronized" State, Reason is remote and local Write Requests have been completed with different Statuses
4:04:56 StarWindService 775 High Availability Device iqn.2008-08.com.starwindsoftware:node1.domain.local-storage, current Node State has changed to "Not synchronized"
4:04:56 iScsiPrt 10 Login request failed. The login response packet is given in the dump data.
4:04:57 StarWindService 774 High Availability Device iqn.2008-08.com.starwindsoftware:node1.domain.local-storage, current Node State has changed to "Synchronizing"
Events on Node 2:
4:04:53 Disk 153 The IO operation at logical block address 0x146e45b0 for Disk 8 (PDO name: \Device\MPIODisk1) was retried.
4:04:56 StarWindService 771 High Availability Device iqn.2008-08.com.starwindsoftware:node2-storage, Partner Node iqn.2008-08.com.starwindsoftware:node1.domain.local-storage State has changed to "Not synchronized"
4:04:56 iScsiPrt 10 Login request failed. The login response packet is given in the dump data.
4:04:57 StarWindService 770 High Availability Device iqn.2008-08.com.starwindsoftware:node2-storage, Partner Node iqn.2008-08.com.starwindsoftware:node1.domain.local-storage State has changed to "Synchronizing"
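To see the order of events across both nodes more easily, the two lists can be merged into one timeline. This is a minimal sketch: the event lines are the ones quoted above with the messages shortened by me, and the helper names (`parse`, `merged_timeline`) are illustrative, not part of any StarWind or Windows tooling.

```python
from collections import namedtuple

Event = namedtuple("Event", "time node source event_id message")

# Event lines from the post; messages are abbreviated for readability.
NODE1 = """\
4:04:53 StarWindService 802 current Node passed to "Not synchronized" State
4:04:56 StarWindService 775 current Node State has changed to "Not synchronized"
4:04:56 iScsiPrt 10 Login request failed
4:04:57 StarWindService 774 current Node State has changed to "Synchronizing"
"""

NODE2 = """\
4:04:53 Disk 153 IO operation for Disk 8 was retried
4:04:56 StarWindService 771 Partner Node State has changed to "Not synchronized"
4:04:56 iScsiPrt 10 Login request failed
4:04:57 StarWindService 770 Partner Node State has changed to "Synchronizing"
"""


def parse(node, text):
    """Split each 'time source id message' line into an Event."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        time, source, event_id, message = line.split(None, 3)
        h, m, s = (int(part) for part in time.split(":"))
        events.append(Event((h, m, s), node, source, int(event_id), message))
    return events


def merged_timeline(*event_lists):
    """Merge per-node event lists, ordered by time.

    sorted() is stable, so events with the same timestamp keep the
    order in which the lists were passed in.
    """
    combined = [event for events in event_lists for event in events]
    return sorted(combined, key=lambda event: event.time)


timeline = merged_timeline(parse("node1", NODE1), parse("node2", NODE2))

for e in timeline:
    h, m, s = e.time
    print(f"{h}:{m:02d}:{s:02d}  {e.node:<5}  {e.source:<15} {e.event_id:>4}  {e.message}")
```

Laid out this way, the write-status mismatch on node1 (event 802) and the retried IO on node2 (event 153) both land at 4:04:53, three seconds before either node logs the failed iSCSI login.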
At this point I have node1 in a paused state, and node2 is handling the production load just fine. I am not sure how to fix this issue, though, and I don’t want to do anything that might affect node2 while it is carrying production. Any advice would be greatly appreciated.
Thanks!