
CentOS causing disk to lose synchronization - urgent

Posted: Sat Oct 29, 2011 1:12 am
by lvp1138
Hi guys,

Over the last two days I have noticed a problem when shutting down a CentOS 5.7 server that is connected to eight iSCSI devices served by StarWind (two StarWind servers in HA, each exporting four disks over iSCSI): while the iSCSI sessions are being disconnected, StarWind suddenly loses synchronization on one of the disks.

It is always the same disk, and it is triggered by a brand-new, recently set up server. Actually, by two of them.

Now, however, synchronization has degraded to the point that StarWind is rejecting iSCSI connections on BOTH servers.

I see a lot of errors in the logs like these:



10/28 17:29:57.119 940 HA: CMSInitiatorDevice::SendCDB2Device: EXITing with failure, DeviceIoControl( IOCTL_SCSI_PASS_THROUGH_DIRECT ) failed, ERROR = 55, SCSI STAUTUS = 0!
10/28 17:29:57.119 940 HA: CMSInitiatorDevice::SendCustomControlScsiCommand: EXITing with failure, SendCDB2Device(...) failed!

10/28 17:30:09.271 564 PR: Set Unit attention 0x29/0x3 for session 0xe33 fr4om iqn.1994-05.com.redhat:8d501fc5f049,00023D010000.

10/28 17:45:21.358 920 HA: CSynchBarrier::EnterSynchBarrier: EXITing with failure, Max threads count(128) reached!
10/28 17:45:21.436 934 HA: CSynchBarrier::LeaveSynchBarrier: WARNING: Barrier block with p_ulptrBlockID 0x0000000476AFDCF0 is not found!
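
When a log is full of lines like the ones above, it can help to tally them before sending the file to support. This is only a rough sketch: it assumes every line follows the layout seen in the excerpt (date, time, thread id, component tag, function name), and `summarize_log` is a made-up helper name, not anything shipped with StarWind.

```shell
#!/bin/sh
# Count failure/warning lines per component and function.
# Assumed field layout (taken from the excerpt above):
#   $1=date  $2=time  $3=thread id  $4=component tag ("HA:")  $5=function name
summarize_log() {
    awk '/failure|WARNING/ { count[$4 " " $5]++ }
         END { for (k in count) print count[k], k }' "$1" | sort -rn
}

# Usage (the file name is whatever you exported, nothing specific is implied):
#   summarize_log /path/to/exported-starwind.log
```

The most frequent function ends up at the top, which gives a quick idea of whether the pass-through failures or the barrier errors dominate.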

Theories? I have 100 virtual machines offline due to this at this moment.

Starwind version 5.7.1733.

Re: CentOS causing disk to lose synchronization - urgent

Posted: Sat Oct 29, 2011 2:58 am
by lvp1138
Well, now another disk lost synchronization...

Re: CentOS causing disk to lose synchronization - urgent

Posted: Sat Oct 29, 2011 4:11 pm
by anton (staff)
Please grab the complete log, zip it and send it to support@starwindsoftware.com for analysis. A member of the support staff should contact you within a couple of hours.

Re: CentOS causing disk to lose synchronization - urgent

Posted: Sat Oct 29, 2011 4:24 pm
by Max (staff)
Delete the HA targets from the StarWind Management Console, then mount the HA images as basic targets, choosing “Custom header” and setting it to 1024 (for StarWind 5.4 and earlier) or 65536 (for StarWind 5.5 or later). Check which node has the most recent data, then remove the target and recreate the HA device, running the synchronization in the appropriate direction.

Now, there is a set of special settings for Linux clients.

Using Notepad, edit the starwind.cfg file (located in the StarWind installation folder).
Change <!--<iScsiDiscoveryListInterfaces value="1"/>--> to
<iScsiDiscoveryListInterfaces value="1"/>
Save the changes and exit Notepad.
The changes should already be there.

StarWind servers' registry (HKEY_LOCAL_MACHINE\System\ControlSet001\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\0000\Parameters):
MaxBurstLength - 0x00040000 (262144)
MaxRecvDataSegmentLength - 0x00040000 (262144)
MaxTransferLength - 0x00080000 (524288)
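
Since regedit shows these DWORDs in hex while the Linux side takes decimal, a quick conversion helps when double-checking that both ends match. A minimal sketch (`hex_to_dec` is just an illustrative helper name):

```shell
#!/bin/sh
# Print a hex value next to its decimal equivalent.
hex_to_dec() {
    printf '%s = %d\n' "$1" "$1"
}

hex_to_dec 0x00040000   # 0x00040000 = 262144
hex_to_dec 0x00080000   # 0x00080000 = 524288
```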

-> Restart the server.
It is highly recommended to double-check these values.

On the CentOS servers that will be using HA:
Enable multipathing.

Edit /etc/iscsi/iscsid.conf and set:
node.session.iscsi.MaxBurstLength = 262144
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072

Edit /etc/multipath.conf and add these lines to the uncommented defaults section:
user_friendly_names no
polling_interval 10
path_grouping_policy group_by_prio (or multibus)

After the files have been edited, run: service open-iscsi restart (on CentOS the init script is usually named iscsi rather than open-iscsi).
To connect the targets:
1. iscsiadm -m discovery -t st -p <SAN1 IP>:3260
2. iscsiadm -m discovery -t st -p <SAN2 IP>:3260
3. iscsiadm -m node -T <SAN1 target IQN> -p <SAN1 IP>:3260 -l
4. iscsiadm -m node -T <SAN2 target IQN> -p <SAN2 IP>:3260 -l
After this, go to /etc/iscsi/nodes/<SAN# IQN>/<SAN# IP>/ and, in the default file, change the login and startup settings to automatic instead of manual, lower tcp.window.size to 262144, and save the changes. This must be done for every login interface of both logged-in targets.
When finished, run: service open-iscsi restart

Run iscsiadm -m session to view the active connections.
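
The connection steps above could be scripted roughly like this. Treat it as a sketch under assumptions, not a tested procedure: the portal addresses and IQNs are made-up placeholders, the node records are changed with `iscsiadm -o update` instead of hand-editing the files under /etc/iscsi/nodes, and `node.conn[0].tcp.window_size` is my assumption for the record that "tcp.window.size" refers to. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# connect_target <portal ip:port> <target IQN>
connect_target() {
    portal="$1"
    iqn="$2"
    # Discover the targets exported by this portal.
    run iscsiadm -m discovery -t st -p "$portal"
    # Log in automatically at boot instead of manually.
    run iscsiadm -m node -T "$iqn" -p "$portal" -o update -n node.startup -v automatic
    # Assumed record name for the TCP window size setting.
    run iscsiadm -m node -T "$iqn" -p "$portal" -o update -n 'node.conn[0].tcp.window_size' -v 262144
    # Log in to the target.
    run iscsiadm -m node -T "$iqn" -p "$portal" -l
}

# Hypothetical placeholders: replace with your own portals and IQNs.
connect_target "192.0.2.11:3260" "iqn.2008-08.com.example:san1-target"
connect_target "192.0.2.12:3260" "iqn.2008-08.com.example:san2-target"

# Show the active sessions.
run iscsiadm -m session
```

Updating the records before logging in means the settings apply to the first login, so no extra service restart is needed just to pick them up; if you edit the node files by hand after logging in, the restart mentioned above still applies.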

Please make sure that the transfer sizes and the TCP window size are set on all CentOS servers.