Software-based VM-centric and flash-friendly VM storage + free version
-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Mon Sep 12, 2011 11:04 am
Hi, I just installed StarWind 5.7 as the SAN target for my 4-host ESXi 4.1u1 cluster.
However, I've been experiencing frequent iSCSI connection drops on the ESXi side. As you can see in the log, the target quickly reconnects itself, the device latency in the ESXi datastore monitoring is around ~500 ms, which is terrible, and the queue depth often drops from 128 to 80.
info     9/12/2011 3:49:11 AM   10.0.0.20
  Successfully restored access to volume 4e6dafd4-109d8400-e082-842b2b4dfe7e (VMSTORE(S)) following connectivity issues.
warning  9/12/2011 3:49:09 AM   10.0.0.20
  Issue detected on 10.0.0.20: ScsiDeviceIO: 2365: Failed write command to write-quiesced partition eui.79207204787fc07c:1 (0:03:40:39.629 cpu3:5282)
On the StarWind side I've applied all the recommended TCP settings, and the hosts are all using MPIO over 6 Gigabit ports. I enabled write-through caching with a 1000 MB cache and a 5000 ms expiration time, and I also raised the max queue depth from 64 to 128 after I saw it dropping.
Please advise what I did wrong and how to prevent these periodic drops.
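(A minimal sketch of how those latency and queue-depth numbers can be captured for a support case, assuming shell access on the ESXi host; the output path and sampling values are arbitrary. Run esxtop in batch mode and check the DAVG/cmd and DQLEN columns for the StarWind device in the resulting CSV.)
# 20 samples at 5-second intervals, saved for later analysis in perfmon or esxplot
esxtop -b -d 5 -n 20 > /tmp/esxtop-vmstore.csv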
-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Tue Sep 13, 2011 6:34 am
Here's some relevant log output from the ESXi side. Please help! It's barely usable now.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.383 cpu1:4097)NMP: nmp_CompleteCommandForPath: Command 0x16 (0x41027f3dea40) to NMP device "eui.79207204787fc07c" failed on physical path "vmhba35:C0:T0:L0" H:0x2 D:0x0 P:0x0 Possible sense data: 0x3 0x0 0x0.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.383 cpu1:4097)NMP: nmp_PathDetermineFailure: SCSI cmd RESERVE failed on path vmhba35:C0:T0:L0, reservation state on device eui.79207204787fc07c is unknown.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.383 cpu1:4097)ScsiDeviceIO: 1685: Command 0x16 to device "eui.79207204787fc07c" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x3 0x0 0x0.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.435 cpu1:4097)NMP: nmp_CompleteCommandForPath: Command 0x16 (0x41027f3dea40) to NMP device "eui.79207204787fc07c" failed on physical path "vmhba35:C0:T0:L0" H:0x2 D:0x0 P:0x0 Possible sense data: 0x3 0x0 0x0.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.435 cpu1:4097)NMP: nmp_PathDetermineFailure: SCSI cmd RESERVE failed on path vmhba35:C0:T0:L0, reservation state on device eui.79207204787fc07c is unknown.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.435 cpu1:4097)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "eui.79207204787fc07c" state in doubt; requested fast path state update...
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.435 cpu1:4097)ScsiDeviceIO: 1685: Command 0x16 to device "eui.79207204787fc07c" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x3 0x0 0x0.
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.450 cpu5:4982)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba35:CH:0 T:0 CN:0: iSCSI connection is being marked "ONLINE"
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.450 cpu5:4982)WARNING: iscsi_vmk: iscsivmk_StartConnection: Sess [ISID: 00023d000001 TARGET: iqn.2009-10.com.outofwall:axis-vmstore TPGT: 1 TSIH: 0]
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.450 cpu5:4982)WARNING: iscsi_vmk: iscsivmk_StartConnection: Conn [CID: 0 L: 10.0.0.20:54840 R: 10.0.2.5:3260]
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.488 cpu5:4117)FS3: 398: Reclaimed heartbeat for volume 4e6dafd4-109d8400-e082-842b2b4dfe7e (VMSTORE(S)): [Timeout] [HB state abcdef02 offset 3264512 gen 127 stamp 83774487523 uuid 4e6db024-225c14bf-2264-a4badb3f8ae8 jrnl <FB 69176> drv 8.46]
Sep 13 06:24:43 10.0.0.20 vobd: Sep 13 06:24:43.547: 83775340822us: [esx.problem.vmfs.heartbeat.recovered] 4e6dafd4-109d8400-e082-842b2b4dfe7e VMSTORE(S).
Sep 13 06:24:43 10.0.0.20 vmkernel: 0:23:16:14.488 cpu6:4117)FS3: 8562: Long VMFS3 rsv time on 'VMSTORE(S)' (held for 2883 msecs). # R: 1, # W: 1 bytesXfer: 2 sectors
Sep 13 06:24:43 10.0.0.20 Hostd: [2011-09-13 06:24:43.548 20A9CB90 info 'ha-eventmgr'] Event 609 : Successfully restored access to volume 4e6dafd4-109d8400-e082-842b2b4dfe7e (VMSTORE(S)) following connectivity issues.
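(As a hedged cross-check of the path failures above, the per-path state of the affected device can be listed from the ESXi shell; this assumes ESXi 5.0's esxcli namespaces and reuses the device identifier from the log.)
# Show every path behind the StarWind device and whether it is active or dead
esxcli storage core path list --device eui.79207204787fc07c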
-
Constantin (staff)
Wed Sep 14, 2011 8:41 am
It looks like you have some hardware issues: according to the vmkernel log, ESX(i) doesn't get a reply within the required time period and then reconnects to the SAN. I recommend you check your configuration and then try to reproduce the issue.
A few more things: does it happen under high load or not?
Which version of vSphere are you using?
Which version of StarWind are you using? (You can check in the StarWind log.)
-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Thu Sep 15, 2011 4:25 am
It was ESXi 5.0 + StarWind 5.7.1733, the latest one, running on 2008 R2 SP1. The ESXi hosts are using MPIO, switching paths every 1 IOPS, across the 6 adapters on the server side.
At first I created a shared iSCSI target with an image file on a RAID 10 NTFS volume, with 1000 MB of write-through caching in async mode; that turned out to be pretty unusable. Then I did a bunch of TCP tuning, setting GlobalMaxTcpWindowSize and TcpWindowSize, which brought some improvement.
Then I went ahead and changed the target to synchronous mode, turned off SIOC in vCenter, and turned off write-through caching, which further improved things; it has been somewhat more responsive, but small I/O is still pretty slow. The hardware is not really the problem, since I just migrated from the Microsoft iSCSI target on the exact same hardware, which worked perfectly.
Now I'm stumped: things are somewhat working, but I can only push ~60 MB/s of disk I/O, and when that happens every other I/O, and I mean every other I/O, gets stuck, as if StarWind doesn't do any I/O scheduling at all.
On the other hand, I tested another client reading from an SMB share on the same server; it can easily push over 120 MB/s of disk I/O and saturate one adapter.
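(For reference, "MPIO per 1 IOPS" presumably means the Round Robin path selection policy with the path-switching threshold lowered from the default 1000 I/Os to 1. A sketch of how that is typically configured on ESXi 5.x, reusing the device identifier from the logs above:)
# Assign Round Robin to the device
esxcli storage nmp device set --device eui.79207204787fc07c --psp VMW_PSP_RR
# Switch to the next path after every single I/O instead of the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device eui.79207204787fc07c --type iops --iops 1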
-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Thu Sep 15, 2011 4:28 am
Can someone suggest whether I should turn these settings on or off on my adapters:
Checksum offload
Large Send Offload
Receive Side Scaling
TCP Connection offload
-
anton (staff)
- Site Admin
- Posts: 4021
- Joined: Fri Jun 18, 2004 12:03 am
- Location: British Virgin Islands
-
Contact:
Thu Sep 15, 2011 6:47 am
All should be turned "ON".
sunyucong wrote: Can someone suggest whether I should turn these settings on or off on my adapters:
Checksum offload
Large Send Offload
Receive Side Scaling
TCP Connection offload
Regards,
Anton Kolomyeytsev
Chief Technology Officer & Chief Architect, StarWind Software
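(As a rough sketch of how the "ON" recommendation maps to Windows Server 2008 R2: RSS and TCP connection offload (chimney) are global TCP-stack options that can be toggled with netsh, while checksum offload and Large Send Offload are per-NIC advanced properties set in the adapter driver via Device Manager.)
:: Enable Receive Side Scaling and TCP chimney (connection) offload globally
netsh int tcp set global rss=enabled chimney=enabled
:: Verify the resulting global TCP settings
netsh int tcp show global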

-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Fri Sep 16, 2011 5:05 am
My StarWind server is running on 2008 R2 SP1.
Do I still have to set these?
For each NIC:
TcpAckFrequency = 1
Global parameters:
SackOpts = 1
Tcp1323Opts = 3
TcpWindowSize = 20971520
GlobalMaxTcpWindowSize = 20971520
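(A minimal sketch of how those values are usually written on 2008 R2 from an elevated command prompt; {interface-GUID} is a placeholder for the GUID of each iSCSI NIC under Tcpip\Parameters\Interfaces, and a reboot is generally required for the changes to take effect.)
:: Per-NIC setting, repeated for each iSCSI interface GUID
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" /v TcpAckFrequency /t REG_DWORD /d 1 /f
:: Global parameters
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v SackOpts /t REG_DWORD /d 1 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v Tcp1323Opts /t REG_DWORD /d 3 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v TcpWindowSize /t REG_DWORD /d 20971520 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v GlobalMaxTcpWindowSize /t REG_DWORD /d 20971520 /f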
-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Sat Sep 17, 2011 10:34 pm
OK, I finally figured it out: for some reason, MPIO was the reason the whole thing was flaky for me. After I switched to LACP teaming, everything was solved!
-
anton (staff)
- Site Admin
- Posts: 4021
- Joined: Fri Jun 18, 2004 12:03 am
- Location: British Virgin Islands
-
Contact:
Mon Sep 19, 2011 9:14 pm
Well... Of course it's up to you, but from what I understand you've found a workaround rather than a solution to your issue.
sunyucong wrote: OK, I finally figured it out: for some reason, MPIO was the reason the whole thing was flaky for me. After I switched to LACP teaming, everything was solved!
Regards,
Anton Kolomyeytsev
Chief Technology Officer & Chief Architect, StarWind Software

-
sunyucong
- Posts: 43
- Joined: Mon Sep 12, 2011 8:21 am
Tue Sep 20, 2011 5:15 am
Teaming does work well enough, but I would really like to understand why MPIO fails for me.
An interesting observation I made on the server: the TCP segments-retransmitted counter is very high in MPIO mode. After observing that, I think the problem is in the network layer; I switched to teaming, the retransmit rate dropped to a bare minimum, and no wonder the latency decreased to 10~30 ms.
And then I once again created two ports on the ESXi side and did multipathing there, so each ESXi host has two identical paths to one 4-port bonded team on the server side, and that works fine as well.
So my conclusion is: the problem is on the StarWind side, which doesn't handle simultaneous connections to multiple IPs well.
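(For anyone trying to reproduce the retransmit observation, a sketch of how the counter can be watched on the Windows side; the sampling interval is arbitrary.)
:: Cumulative retransmit count since boot
netstat -s -p tcp | findstr /i retrans
:: Live retransmit rate, sampled every 5 seconds
typeperf "\TCPv4\Segments Retransmitted/sec" -si 5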
-
anton (staff)
- Site Admin
- Posts: 4021
- Joined: Fri Jun 18, 2004 12:03 am
- Location: British Virgin Islands
-
Contact:
Tue Sep 20, 2011 10:14 am
If that were always true, we would not see proper scaling coefficients with other configurations. So I'm not so sure... But we're working on a new async kernel in any case.
sunyucong wrote: Teaming does work well enough, but I would really like to understand why MPIO fails for me.
An interesting observation I made on the server: the TCP segments-retransmitted counter is very high in MPIO mode. After observing that, I think the problem is in the network layer; I switched to teaming, the retransmit rate dropped to a bare minimum, and no wonder the latency decreased to 10~30 ms.
And then I once again created two ports on the ESXi side and did multipathing there, so each ESXi host has two identical paths to one 4-port bonded team on the server side, and that works fine as well.
So my conclusion is: the problem is on the StarWind side, which doesn't handle simultaneous connections to multiple IPs well.
Regards,
Anton Kolomyeytsev
Chief Technology Officer & Chief Architect, StarWind Software

-
alexey (staff)
- Staff
- Posts: 5
- Joined: Thu Apr 07, 2011 9:25 am
Tue Sep 27, 2011 7:55 am
Hi sunyucong,
Could you please tell me what load balancing policy was used for your image device (Round Robin, Fixed, or Most Recently Used)?
Regards,
Alexey Skotsyk
StarWind Software Inc.
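(For reference, the currently assigned policy can be read straight off the host; a sketch assuming ESXi 5.x and the device identifier from the earlier logs. The "Path Selection Policy" line will show VMW_PSP_RR, VMW_PSP_FIXED, or VMW_PSP_MRU.)
# Show the NMP configuration for the StarWind device, including the current
# path selection policy and its Round Robin settings
esxcli storage nmp device list --device eui.79207204787fc07c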
-
Anatoly (staff)
- Staff
- Posts: 1675
- Joined: Tue Mar 01, 2011 8:28 am
-
Contact:
Wed Sep 28, 2011 10:27 am
Also, we'd like to ask you to provide us with a detailed step-by-step description of what actions were performed before you faced the issue. Thanks!
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com