10Gb iSCSI has very high latency

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

jtwaddle
Posts: 2
Joined: Sat Oct 25, 2014 9:55 am

Sat Oct 25, 2014 10:19 am

I have an iSCSI server set up with the following configuration:

Dell R510
Perc H700 Raid controller 1GB cache
raid 10
Windows Server 2012 R2
Intel Ethernet X520 10Gb
12 near-line SAS drives
I am currently running the latest version of StarWind's free edition, 8.0.7145.

I have connected it to a 10Gb port on an HP 8212 switch, which is also connected via 10Gb to our VMware servers. I have a dedicated VLAN just for iSCSI and have enabled jumbo frames on that VLAN.

I frequently see very high latency on my iSCSI storage, severe enough that it can time out or hang VMware.

There are only three VMs on this iSCSI datastore: two running VMware Data Protection Manager, and one running Windows that has backups pushed to it throughout the day. I don't need amazing performance for this, but I expect better than what I am getting.

I am trying to determine why I see such high latency (around 100 ms). During the high-latency episodes, Perfmon's Avg. Disk sec/Transfer counter for the physical disks sits around 0.01 to 0.02 seconds (10 to 20 ms).
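To put those two numbers in the same units (a quick back-of-the-envelope check; the 100 ms figure is what vSphere reports for the datastore, while the Perfmon counter is in seconds):

```python
# Perfmon's "Avg. Disk sec/Transfer" is reported in seconds.
perfmon_range_s = (0.01, 0.02)

# Convert to milliseconds for comparison with vSphere's latency numbers.
disk_latency_ms = tuple(s * 1000 for s in perfmon_range_s)
print(disk_latency_ms)  # (10.0, 20.0) -> the physical disks themselves

observed_latency_ms = 100  # roughly what the VMs see on the datastore

# The gap is latency added somewhere above the disks: the StarWind target,
# the iSCSI initiator/network path, or queuing in between.
unaccounted_ms = observed_latency_ms - max(disk_latency_ms)
print(unaccounted_ms)  # 80.0 ms not explained by the disks
```

So the disks themselves only account for 10 to 20 ms; the other ~80 ms is being added somewhere between the array and the guests.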

Any thoughts about configuration changes I could make to my VMware environment or network card settings, or any ideas on where to troubleshoot? I have not been able to find the cause. I referenced this document for the changes to my iSCSI settings:

http://en.community.dell.com/techcenter ... 03565.aspx

Here are some IOMeter results run directly on the physical machine that is running StarWind:
iometerPhysical.PNG

Here are some from a VM on VMware ESXi 5.5, connected over iSCSI:
iometerVM.PNG

Why is there such a massive difference in the latency of the second test?

I have been doing some reading about using Storage Spaces instead of the PERC H700's RAID: presenting the 12 drives through the PERC as RAID 0 with the PERC's cache disabled, then creating a mirror across the 12 drives in Storage Spaces. Does anyone have experience with that? Would it help with performance?

Thank you for your time.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Thu Oct 30, 2014 11:42 am

There is obviously something wrong with the network part here. Have you tested it with something like iperf? If the network turns out to be the bottleneck, then storage is not the problem.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
jtwaddle
Posts: 2
Joined: Sat Oct 25, 2014 9:55 am

Fri Nov 07, 2014 9:44 pm

I have not. How do I run iperf between my VMware host and my Server 2012 R2 iSCSI server?

Looking over the StarWind logs, I see this:

11/7 15:10:03.971 810 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:10:17.268 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:15:03.445 210 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:15:03.461 210 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:15:22.477 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:20:03.810 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:20:03.825 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:20:21.669 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:25:03.674 238 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:25:03.690 238 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:25:21.862 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:27:33.474 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:30:03.289 238 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:30:03.305 238 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:30:19.024 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:35:03.732 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:35:03.763 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:35:21.201 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:40:03.956 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:40:03.987 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:40:21.409 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:44:09.663 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:03.633 210 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:45:03.648 238 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:45:05.664 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:05.680 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:11.430 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:13.242 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:13.977 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:14.102 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:21.618 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:45:34.446 818 T[1]: alloc a new task (active 128, free 0).
11/7 15:45:34.446 818 T[1]: alloc a new task (active 129, free 0).
11/7 15:45:53.618 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:50:03.701 810 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:50:03.716 810 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:50:12.795 20c IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:53:30.392 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:55:04.097 20c IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:55:04.112 20c IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:55:19.987 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:00:04.055 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:00:04.086 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:00:13.180 354 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:05:03.669 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:05:03.669 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:05:18.389 210 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:10:03.706 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:10:03.722 204 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:10:17.581 20c IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 16:11:03.566 b5c C[1], LIN: Send: semaphore is not signalled!


Not sure what it means, but it seems to happen a lot.
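For what it's worth, opcode 0x4D in the SCSI command set is LOG SENSE (a statistics/monitoring query rather than a data-path command), so the image-file device rejecting it is likely just noise. A quick way to get a feel for a dump like this is to tally the distinct messages; a small sketch, assuming the date/time/thread-id/message layout shown above:

```python
import re
from collections import Counter

# A few representative lines from the dump above.
log = """\
11/7 15:10:03.971 810 IMG: *** ImageFile_ScsiExec: SCSIOP (0x4D) is not supported.
11/7 15:27:33.474 b5c C[1], LIN: Send: semaphore is not signalled!
11/7 15:45:34.446 818 T[1]: alloc a new task (active 128, free 0).
11/7 15:45:53.618 b5c C[1], LIN: Send: semaphore is not signalled!
"""

# Each line: date, time, hex thread id, then the message body.
line_re = re.compile(r"^\S+ \S+ [0-9a-f]+ (.*)$")

counts = Counter()
for line in log.splitlines():
    m = line_re.match(line)
    if m:
        counts[m.group(1)] += 1

# Most frequent messages first.
for message, n in counts.most_common():
    print(n, message)
```

Run over the full log, this makes it easy to see whether the "semaphore is not signalled" sends cluster around the latency spikes, which is the more interesting signal here.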
thefinkster
Posts: 46
Joined: Thu Sep 18, 2014 7:15 pm

Sat Nov 08, 2014 2:37 pm

As far as testing VMware hosts goes, this is what VMware says: http://kb.vmware.com/selfservice/micros ... 03#Network

In other words, you can't really test the host itself, but you can test a VM on the host. Then isolate the network portion using physical machines only (workstation/server, for example); then isolate the host resources (VM to VM); and compare results.

I can get 10 Gbps (9.93 to 10.03 Gbps) with iperf between physical systems (no switches, directly connected). That tells me the network side is working great, at least for that particular combination of packet size, TCP window size, thread count, and so on.

If you REALLY need to show that the physical side (host) is working fine without VMware involved, build a Windows PE disk with iperf on it, boot to it, and run your tests there to verify the network card is working properly. VMware testing and tuning is a matter of isolating down to the smallest set of involved hardware/software, testing various settings in a controlled fashion (one change at a time), and building out from there. Time consuming, but well worth the experience and the results.

None of this really tests drive speeds, locally or over the network (iSCSI/SMB); it is JUST about making sure the network infrastructure has been tuned (frame size, etc.).

As a side note, I don't have those kinds of errors in my Starwind logs.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Mon Nov 10, 2014 1:50 pm

Thank you for the reply!
How do I run iperf between my vmware host and my Server 2012 R2 iscsi server?
From a Windows-based VM. iperf was just one option; if you know some other tool that is reliable and you find it more comfortable to work with, feel free to use it.


I need the following information from you in order to suggest a solution:
1. Operating system on the servers participating in the iSCSI SAN, and the client server OS
2. RAID stripe size used, caching mode of the array used to store the HA images, and the Windows volume allocation unit size
3. NIC models, driver versions (driver manufacturer/release date) and NIC advanced settings (Jumbo Frames, iSCSI offload, etc.)
4. Detailed network diagram with all the IP addresses, network and storage components, and their bandwidth/throughput and purpose
5. StarWind build and version; if you are not running the latest one, please update it.
6. Which StarWind device types you used in testing, and their caching configuration.
Also, there is a document for pre-production SAN benchmarking:
http://www.starwindsoftware.com/starwin ... ice-manual
And a list of advanced settings which should be implemented in order to gain higher performance in iSCSI environments:
http://www.starwindsoftware.com/forums/ ... t2293.html
http://www.starwindsoftware.com/forums/ ... t2296.html
 
Please provide the requested information and I will be able to assist you further with the troubleshooting.

Also, can I ask you to disable VAAI on the ESXi hosts and see if that helps?

Thank you
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com