HA Cluster - Clean Reboot of one node caused disconnect

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

jtmroczek
Posts: 20
Joined: Thu Aug 22, 2013 10:11 pm

Fri Dec 20, 2013 12:37 am

Hello:

While performing maintenance on our HA cluster, we rebooted one of the two nodes. Historically this has been handled gracefully. This time BOTH hosts sent logout messages to the initiators, and it took 6 minutes (!) until the initiators could log back in. This affected some (possibly all) HA LUNs in the cluster.

Where do I start looking for the cause, and how do I prevent this in the future?

Additional info:
The change necessitating the reboot was an upgrade to the driver for the RAID controller.
StarWind Host OS: Windows 2008 R2 SP1
Initiator OS: Seen from both Windows 2008 R2 SP1 and Windows 2003 R2 SP2.
StarWind Version: v6.0.0 (Build 20120927, [SwSAN], Win64)

Thank you for any assistance you can provide to avoid this in the future.

~joe
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Dec 20, 2013 12:35 pm

Thank you for using StarWind.

I've seen such issues before, and as far as I understand the situation, you need to update your SAN software to the latest build.
Also, please make sure there are no hardware errors on the StarWind box by reviewing the Windows Application and System event logs.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
jtmroczek
Posts: 20
Joined: Thu Aug 22, 2013 10:11 pm

Fri Dec 20, 2013 8:24 pm

Wow! I missed that we were on such an old version. The installers must have gotten mixed up. I think it is a testament to the quality of StarWind that the original 6.0.0 release has worked so well.

I have confirmed that no errors were reported in the Windows or IPMI event logs.

~joe
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Tue Dec 24, 2013 2:12 pm

OK. I think the best approach here is to update the SAN software and see if it helps.
jtmroczek
Posts: 20
Joined: Thu Aug 22, 2013 10:11 pm

Tue Dec 24, 2013 7:05 pm

Anatoly:

We have already performed the update. It is hard to know if the issue is resolved. Under the old code we rebooted over a dozen times without issue. I feel comfortable that there is nothing more to be done at this time.

~joe
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Dec 30, 2013 10:46 am

Great to know!

Let us know if you have any updates!