
After the time change, sync channel dropped and iscsi connec

Posted: Mon Mar 14, 2011 6:14 pm
by dmansfield
Good afternoon. Moments after the time change on Sunday, our 10 Gigabit sync channel dropped and our iSCSI connections were lost. Errors compounded from there, and both servers initiated a reboot on their own. Because both servers rebooted at approximately the same time, our HA datastores were both thrown out of sync, and we were forced into an 8+ hour recovery. We have two StarWind HA Enterprise servers running primarily HA datastores. The servers are Dell T710s running Windows Server 2008 R2. We have contacted Dell; they have scoured our log files and found no hardware errors. Please help us determine the cause of Sunday's issues so that we can prevent a recurrence. Thank you.

Re: After the time change, sync channel dropped and iscsi connec

Posted: Mon Mar 14, 2011 7:11 pm
by anton (staff)
1) There should be a reason for the connection to drop. Either the Windows Event Log or the StarWind log should contain entries about this event. Could you please check both?

2) Who instructed both servers to reboot? StarWind doesn't do this, so who did?

Once we find out why the connection was lost and what instructed the machines to reboot, we'll know who's to blame and what to do so you never see this issue again. Thanks!
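To follow up on point 1, one way to narrow the search is to export the System log from Event Viewer to CSV on each server and keep only the error and critical entries near the time change. The Python sketch below assumes a CSV with `Level`, `Date and Time`, `Source` and `Event ID` columns and a US-style timestamp format; both are assumptions, so adjust them to match your actual export. It also assumes a US site, where clocks jumped from 2:00 to 3:00 a.m. on Sunday, March 13, 2011. The sample rows, including the `HypotheticalNicDriver` source, are made up for illustration.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed format of the "Date and Time" column (US locale export).
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def events_near(csv_text, center, minutes=30, levels=("Error", "Critical")):
    """Return (time, source, event_id) tuples for events whose Level is in
    `levels` and whose timestamp is within +/- `minutes` of `center`,
    sorted oldest first."""
    window = timedelta(minutes=minutes)
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Level"] not in levels:
            continue
        when = datetime.strptime(row["Date and Time"], TIME_FORMAT)
        if abs(when - center) <= window:
            hits.append((when, row["Source"], int(row["Event ID"])))
    return sorted(hits)


# Illustrative rows only; not taken from the servers in this thread.
SAMPLE = """Level,Date and Time,Source,Event ID
Information,3/13/2011 3:01:10 AM,Kernel-General,1
Error,3/13/2011 3:02:45 AM,HypotheticalNicDriver,27
Critical,3/13/2011 3:05:12 AM,Kernel-Power,41
Error,3/13/2011 9:00:00 AM,Service Control Manager,7000
"""

# Local clocks read 3:00 a.m. right after the spring-forward jump.
DST_CHANGE = datetime(2011, 3, 13, 3, 0)

for when, source, event_id in events_near(SAMPLE, DST_CHANGE):
    print(when, source, event_id)
```

Running it against the export from each server and comparing the two lists may show which component failed first.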

Re: After the time change, sync channel dropped and iscsi connec

Posted: Thu Mar 17, 2011 7:04 pm
by dmansfield
Dell is indicating that the issue is most likely caused by TOE (TCP Offload Engine) being enabled on the Intel 10Gb NICs. Before following their recommendations, I want to make sure nothing will affect StarWind, especially the commands run from the elevated command prompt. Dell recommends the following:

Step 1. Install the latest tcpip.sys for Windows Server 2008 R2: http://support.microsoft.com/kb/2386184

Step 2. Run the following commands from elevated command prompt:

netsh int tcp set global rss=disabled

netsh int tcp set global netdma=disabled

netsh int tcp set global chimney=disabled

netsh int tcp set global autotuninglevel=disabled

netsh int tcp set global congestionprovider=none
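Since these commands change global TCP settings, it may be worth saving the current values first so they can be restored if something goes wrong. The sketch below assumes the field labels that `netsh int tcp show global` prints on Windows Server 2008 R2 (check them against your own output) and turns output saved with `netsh int tcp show global > before.txt` into the matching `set global` commands. The sample text is illustrative, not from these servers.

```python
# Labels printed by "netsh int tcp show global" (assumed 2008 R2 wording),
# mapped to the parameter names accepted by "netsh int tcp set global".
LABEL_TO_PARAM = {
    "Receive-Side Scaling State": "rss",
    "Chimney Offload State": "chimney",
    "NetDMA State": "netdma",
    "Receive Window Auto-Tuning Level": "autotuninglevel",
    "Add-On Congestion Control Provider": "congestionprovider",
}


def rollback_commands(show_output):
    """Build 'set global' commands that restore the five settings changed
    in Step 2, using previously saved 'show global' output."""
    commands = []
    for line in show_output.splitlines():
        # Lines look like "Label     : value"; others have no colon.
        label, sep, value = line.partition(":")
        param = LABEL_TO_PARAM.get(label.strip())
        if sep and param:
            commands.append(f"netsh int tcp set global {param}={value.strip()}")
    return commands


# Illustrative saved output; real values will differ per server.
BEFORE = """Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
NetDMA State                        : enabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : none
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
"""

for cmd in rollback_commands(BEFORE):
    print(cmd)
```

Settings that Step 2 does not touch (ECN, timestamps) are deliberately ignored.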

Step 3. Disable TOE - Intel Configuration
a. Open Device Manager
b. On each Intel NIC in Device Manager, disable the following:
c. NOTE: Not every option is available on the Advanced tab of every adapter.
i. Offload Receive IP Checksum
ii. Offload Receive TCP Checksum
iii. Offload TCP Segmentation
iv. Offload Transmit IP Checksum
v. Offload Transmit TCP Checksum
vi. IPV4 Checksum Offload
vii. Large Send Offload v2 (IPV4)
viii. Large Send Offload v2 (IPV6)
ix. Receive-Side Scaling
x. TCP Checksum Offload (IPV4)
xi. TCP Checksum Offload (IPV6)
xii. UDP Checksum Offload (IPV4)
xiii. UDP Checksum Offload (IPV6)
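Because option names vary between Intel driver versions (and, as the note says, some are missing entirely), a quick checklist can help confirm that each NIC ends up with all of the listed offloads turned off. This Python sketch is hypothetical: it assumes you have recorded, by hand or from some export, each Advanced-tab property and its current value for one NIC. It matches names loosely, so that "Receive-Side Scaling" and "Receive Side Scaling" count as the same option.

```python
# Option names from Dell's Step 3 list, exactly as quoted in this thread.
DELL_OFFLOADS = [
    "Offload Receive IP Checksum",
    "Offload Receive TCP Checksum",
    "Offload TCP Segmentation",
    "Offload Transmit IP Checksum",
    "Offload Transmit TCP Checksum",
    "IPV4 Checksum Offload",
    "Large Send Offload v2 (IPV4)",
    "Large Send Offload v2 (IPV6)",
    "Receive-Side Scaling",
    "TCP Checksum Offload (IPV4)",
    "TCP Checksum Offload (IPV6)",
    "UDP Checksum Offload (IPV4)",
    "UDP Checksum Offload (IPV6)",
]


def normalize(name):
    """Lower-case and keep only letters/digits, so spacing, hyphens and
    IPV4/IPv4 capitalization differences don't matter."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def offload_report(advanced):
    """advanced: hypothetical mapping of Advanced-tab name -> current value
    for one NIC. Returns (still_enabled, not_present) as lists of names
    from DELL_OFFLOADS."""
    current = {normalize(k): v for k, v in advanced.items()}
    still_enabled, not_present = [], []
    for name in DELL_OFFLOADS:
        value = current.get(normalize(name))
        if value is None:
            not_present.append(name)
        elif value.strip().lower() != "disabled":
            still_enabled.append(name)
    return still_enabled, not_present


# Hypothetical values as they might be read off one adapter.
nic = {
    "IPv4 Checksum Offload": "Rx & Tx Enabled",
    "Large Send Offload V2 (IPv4)": "Disabled",
    "Receive Side Scaling": "Enabled",
}
enabled, missing = offload_report(nic)
print("Still enabled:", enabled)
print("Not on this adapter:", len(missing))
```

Any value other than "Disabled" is treated as still on, which errs on the side of flagging an option for a second look.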

Step 4. Install SP1 for Windows Server 2008 R2

Step 5. Have a StarWind engineer review the iSCSI configuration. Dell commented that, as currently set up, it is an unsupported configuration.

I have sent this information to StarWind tech support, but I also wanted to post it here for other forum readers.

Thank you

Re: After the time change, sync channel dropped and iscsi connec

Posted: Thu Mar 17, 2011 7:21 pm
by anton (staff)
Good. One question so far.

"Dell commented that it is an unsupported configuration the way it is setup."

What exactly does this mean? Could you please clarify?

Thank you very much!

Anton

Re: After the time change, sync channel dropped and iscsi connec

Posted: Thu Mar 17, 2011 8:38 pm
by dmansfield
By "Good" are you saying the changes proposed by Dell are okay to implement and won't conflict with StarWind? I'm not sure what Dell is referring to regarding the iSCSI settings. They did say it may be fine, but it is not a setup they are used to seeing. Can we do a remote session with one of your engineers so they can take a quick look?

Re: After the time change, sync channel dropped and iscsi connec

Posted: Fri Mar 18, 2011 8:17 am
by anton (staff)
OK to try. Performance will probably be affected either way, and we'd see that pretty soon.

Re: After the time change, sync channel dropped and iscsi connec

Posted: Wed Jul 06, 2011 2:32 pm
by dmansfield
Update for anyone running Intel 10Gb NICs for the sync channel: we were running two direct connections in a team of type "Adapter Fault Tolerance", with one NIC active and the other in standby. This teaming is what was causing the servers to crash. We removed the team and are now running with just one NIC enabled on each side, and everything works.

Re: After the time change, sync channel dropped and iscsi connec

Posted: Wed Jul 06, 2011 9:35 pm
by anton (staff)
Thank you very much for your update!

P.S. With V5.8 it will no longer be necessary to team NICs. Quite the opposite :))

Re: After the time change, sync channel dropped and iscsi connec

Posted: Thu Jul 07, 2011 9:16 pm
by clayton@mcc911.org
You still never said why the servers rebooted??

Re: After the time change, sync channel dropped and iscsi connec

Posted: Fri Jul 08, 2011 6:27 am
by anton (staff)
StarWind never forces machines to reboot, and it's a user-land component, so even if StarWind crashes it can't bring down the whole system. That's why the answer is "I don't know".

It's definitely not StarWind but something else (software or hardware), so I'd suggest starting with the System event log and looking for "red light" (error) entries from other components.
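A few System log event IDs are the usual places to look when a server restarts: 1074 (USER32) is logged when a process or user requests the restart and names the process, 41 (Kernel-Power) and 6008 (EventLog) indicate the previous shutdown was not clean, and 1001 (BugCheck) indicates a restart after a stop error. The Python sketch below simply picks those entries out of a list of (timestamp, event ID) pairs; how you extract the pairs from the log is left open, and the sample entries are made up.

```python
from datetime import datetime

# System-log event IDs that commonly accompany a restart.
REBOOT_EVENTS = {
    1074: "USER32: a process or user requested the restart (see message)",
    41: "Kernel-Power: rebooted without a clean shutdown",
    6008: "EventLog: previous shutdown was unexpected",
    1001: "BugCheck: restarted after a stop error (blue screen)",
}


def reboot_clues(system_events):
    """system_events: iterable of (timestamp, event_id) pairs from the
    System log. Returns (timestamp, explanation) for restart-related
    entries, oldest first; all other IDs are ignored."""
    return sorted(
        (ts, REBOOT_EVENTS[event_id])
        for ts, event_id in system_events
        if event_id in REBOOT_EVENTS
    )


# Made-up entries for illustration only.
events = [
    (datetime(2011, 3, 13, 3, 9, 40), 6008),
    (datetime(2011, 3, 13, 3, 9, 5), 41),
    (datetime(2011, 3, 13, 3, 9, 50), 1001),
    (datetime(2011, 3, 13, 3, 9, 55), 7036),
]
for ts, why in reboot_clues(events):
    print(ts, why)
```

If neither server shows a 1074 entry, that points away from a requested restart and toward a crash or power event, which would be consistent with the later finding that the NIC team was crashing the servers.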