Some VMware VMs' disk performance slow, but others fast

jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Tue Apr 30, 2013 5:13 pm

Got an odd one here, hopefully it's something simple!

I've been able to achieve 300-400 MB/s disk performance using ATTO to benchmark my servers. I recently discovered that some systems are not performing this way; oddly enough, it's all of the Windows 2008 and 2008 R2 VMs that have slow disk access.

I have virtual machines running on the same host, with storage on the same SAN; one will have really fast disk access and use almost 90-95% of the bandwidth on my SANs when running ATTO, while another VM barely hits 20%.

I have my VMware iSCSI network set up per the VMware document here (page 89):
http://pubs.vmware.com/vsphere-51/topic ... -guide.pdf
I have my IOPS set to 1, round robin turned on, and jumbo frames enabled - all of that is in place. I'm just not sure why some VMs run poorly while others don't.
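
For reference, a minimal sketch of how those two settings are typically applied per LUN from the ESXi shell (which includes a Python interpreter); the naa.* device ID below is a placeholder for your StarWind LUN, and the same esxcli commands can just as well be typed by hand:

    import subprocess

    DEVICE = "naa.xxxxxxxxxxxxxxxx"  # placeholder - substitute your StarWind LUN's device ID

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # Set the path selection policy for the LUN to Round Robin
    run(["esxcli", "storage", "nmp", "device", "set",
         "--device", DEVICE, "--psp", "VMW_PSP_RR"])

    # Make Round Robin rotate paths after every I/O (the IOPS=1 tweak)
    run(["esxcli", "storage", "nmp", "psp", "roundrobin", "deviceconfig", "set",
         "--device", DEVICE, "--type", "iops", "--iops", "1"])

    # Confirm the policy and Round Robin settings took effect
    run(["esxcli", "storage", "nmp", "device", "list", "--device", DEVICE])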

I was thinking it was the SCSI controller type, since they don't all match, but I switched one VM's controller type from LSI Logic SAS to LSI Logic Parallel and it didn't improve the disk speed.

Is there something else I should be checking?

I'm running StarWind 6, 2TB HA Enterprise, configured following this white paper (with the exception of the iSCSI networking setup, which follows page 89 of the VMware guide above):
http://www.starwindsoftware.com/images/ ... ere_v6.pdf


Any help is appreciated, thanks!
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Apr 30, 2013 9:51 pm

Mapped as VMDK or iSCSI from inside a Windows VM? VMDK fixed or growable? OS aligned?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Wed May 01, 2013 11:50 am

It's a VM on an iSCSI datastore on ESXi 5 with the latest updates/patches. Both hard disks are Thick Provision Lazy Zeroed.

I do not know if they are OS aligned; this is the first time I've seen that term used. How can I verify this?
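
For reference, a rough way to check both things from inside a Windows guest, assuming Python is installed there (the same numbers can be read by running wmic and fsutil by hand):

    import subprocess

    # Partition starting offsets: each StartingOffset should be a multiple of
    # 4096 bytes. Windows 2008 aligns new partitions at 1 MB by default;
    # Windows 2003 defaults to 32256 bytes, which is NOT aligned.
    subprocess.call(["wmic", "partition", "get", "Name,StartingOffset"])

    # NTFS allocation unit size: look for "Bytes Per Cluster" in this output.
    subprocess.call(["fsutil", "fsinfo", "ntfsinfo", "C:"])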
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Tue May 14, 2013 4:14 pm

I need to know the following information from you in order to provide a solution:
1. Operating system on servers participating in iSCSI SAN and client server OS
2. RAID array model, RAID level and stripe size used, caching mode of the array used to store HA images, Windows volume allocation unit size
3. NIC models, driver versions (driver manufacturer/release date) and NIC advanced settings (Jumbo Frames, iSCSI offload etc.)
4. Network scheme.
Also, there is a document for pre-production SAN benchmarking:
http://www.starwindsoftware.com/starwin ... ice-manual
And a list of advanced settings which should be implemented in order to gain higher performance in iSCSI environments:
http://www.starwindsoftware.com/forums/ ... t2293.html
http://www.starwindsoftware.com/forums/ ... t2296.html

Please provide me with the requested information and I will be able to assist you further with the troubleshooting.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Fri May 24, 2013 8:42 pm

I'm currently working with VMware on this to troubleshoot the issue as well. They had me build a new VM with a different SCSI controller, but that did not make it any better.

1. Operating system on servers participating in iSCSI SAN and client server OS
The SANs are running Windows 2008 R2; the guest VMs are running Windows 2008 x64 and Windows 2003 x86 (32-bit).
2. RAID array model, RAID level and stripe size used, caching mode of the array used to store HA images, Windows volume allocation unit size
PERC H700, RAID 10, 64 KB stripe size, 2 GB write-back cache; the Windows volume allocation unit size on both hosts is the default 4096 bytes.
3. NIC models, driver versions (driver manufacturer/release date) and NIC advanced settings (Jumbo Frames, iSCSI offload etc.)
SAN NICs are Intel Gigabit ET Quad Port, driver 11.17.27.0, jumbo frames enabled at 9014, no iSCSI offload.
4. Network scheme.

I have actually posted here in the past with similar problems and eventually discovered the issue. I have the VMware servers and SAN configured per the best practices guides, using jumbo frames, IOPS=1, and round robin.
In the network diagram, I only have 2 paths from each ESXi host going to the SANs; i.e., from one of my hosts I have only added 10.0.0.32, .33, 10.0.0.41, and .43 to dynamic discovery, or else I would overload the SANs.

It's odd to me that I can achieve such fast speeds from the older Windows 2003 server and such slow speeds from the newer Windows 2008 x64 VM when they are hitting the same SAN from the same host.
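
As a sanity check on that discovery setup, a small sketch run from the ESXi shell (vmhba33 is a placeholder for the software iSCSI adapter name) that lists the configured send targets, the sessions actually open, and all storage paths:

    import subprocess

    ADAPTER = "vmhba33"   # placeholder - check the adapter name under Storage Adapters

    # Send-target (dynamic discovery) addresses configured on the software initiator
    subprocess.call(["esxcli", "iscsi", "adapter", "discovery", "sendtarget", "list",
                     "--adapter", ADAPTER])

    # iSCSI sessions the host actually has open right now
    subprocess.call(["esxcli", "iscsi", "session", "list", "--adapter", ADAPTER])

    # Every path and its state (active/standby/dead) for every device
    subprocess.call(["esxcli", "storage", "core", "path", "list"])
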
Attachments
Network scheme: Hellam ESXi Topology.jpg (200.25 KiB)
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Mon May 27, 2013 9:02 am

Quick question - what StarWind version are you running? Is it up to date?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Tue May 28, 2013 2:36 pm

I am running StarWind version 6.0.4768
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Wed May 29, 2013 2:42 pm

Just an update - I've tried a couple of things, with no luck:

• Manually changed the SCSI controller of the Windows 2008 VM from LSI Logic Parallel to LSI Logic SAS (also tried using VMware Standalone Converter to change it)
• Moved the VM to a new host and a different SAN
• Moved the VM to a host running ESXi 5.0 build 469512 and tested it on build 914586. Each host has a different network configuration due to a known issue when a link goes down and comes back online (see attached).
• Ensured Round Robin, IOPS=1, and jumbo frames are all enabled (which is proven by my 2003 VM running so quickly); a quick end-to-end check is sketched right after this list. I already have a lengthy forum post here covering what I went through to achieve optimal settings: http://www.starwindsoftware.com/forums/ ... t2910.html
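
The end-to-end jumbo frame check mentioned above - a rough sketch using the SAN portal IPs from my earlier post; vmkping is run from the ESXi shell, and the equivalent test from the StarWind boxes back to a vmkernel IP is ping -f -l 8972:

    import subprocess

    # 8972 bytes = 9000-byte MTU minus 28 bytes of IP/ICMP headers; -d sets the
    # "don't fragment" bit, so the ping only succeeds if jumbo frames make it
    # through the vSwitch, the physical switch, and the SAN NIC unfragmented.
    for target in ["10.0.0.32", "10.0.0.33", "10.0.0.41", "10.0.0.43"]:
        subprocess.call(["vmkping", "-d", "-s", "8972", target])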

I can't attach more than 3 pictures, or anything larger than 256KB (what's that about, it's 2013!), so I'll just add the screenshot of my SANs' network utilization when running ATTO from the Windows 2008 VM - it's using less than a quarter of each NIC it's assigned to. However, when I run the ATTO benchmark from a Windows 2003 VM on the same host, accessing the same SANs over the same NICs, utilization is between 95% and 99%.

I only have one ESXi host running on that old 5.0 build because it runs our ERP system locally, and I haven't been able to schedule downtime yet to stage the patches and remediate.

I am currently building a new ESXi host, on newer hardware, and will try it from there (fully patched) to see if the outcome is any different. I have to order some hardware for it to get the gigabit pathways to the SAN to test, so it may be a little while until I can post what happened.
Attachments
Windows 2008 VM SAN iSCSI network utilization: SAN netutiliz win2008 vm.JPG (158.87 KiB)
VMware build 469512 network config: build 4695152 iscsi network.JPG (40.62 KiB)
VMware build 914586 network config: build 914586 iscsi network.JPG (28.47 KiB)
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Fri May 31, 2013 4:41 pm

Based on the fact that only one VM is running slowly while the others are fast, I'll allow myself to assume that the problem is not related to the StarWind software. Nevertheless, we are interested in getting the issue solved. May I ask whether you have had any success working with VMware support on this case?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Tue Jun 04, 2013 7:31 pm

I have not had much luck working with VMware thus far; they keep pointing at it being a SAN issue. The only things they have tested so far are moving the VM to local storage, which immediately improved performance; changing the SCSI controller, with no luck; and building a new Windows 2008 R2 VM with VMware's default settings - it still performed the same way.

I'm sorry if I gave you the impression that only one virtual machine is performing poorly; it's all of the Windows 2008 and 2008 R2 servers. I have 4 separate VMs, 3 of which are on 3 separate ESXi hosts and 3 different SAN HA targets.

All of my 2003 servers are performing great. I'd love to do a remote session with someone at StarWind to take a closer look.
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Thu Jun 06, 2013 2:38 pm

Attached are examples: one is a 2003 VM, the other a 2008 VM, both on the same host and the same SAN. I have tried changing hosts and SANs with no luck.

I have 7 Windows 2008/2008 R2 VMs and all of them behave the same way.

VMware has dismissed it as being an issue with ESXi and told me to contact my SAN vendor.
Attachments
2008 VM.JPG (61.76 KiB)
2003 VM.JPG (65.41 KiB)
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Mon Jun 10, 2013 11:36 pm

Ok, so to update...

I removed the 2 IP addresses of my partner SAN in the VMware iSCSI software initiator (removed from dynamic and static tabs) and rescanned.
I ran ATTO and the WRITE results were identical; however, the READ results dropped to about 50% at the 2048, 4096, and 8192 test sizes, which is what I'd expect. Wouldn't the WRITE results also go down by 50% if it were truly using those paths?

I figured I'd try the benchmarks on my 2003 VMs as well, and when I did, the WRITE results were not what I expected. I thought they would be 50% of what I get with all 4 paths enabled, but they weren't: they went no higher than 150 MB/s, and at one test size (4096) they were around 20. When I added all 4 paths back, the 2003 VM performed at 450 MB/s+ for all 4 test sizes I'm running (1024, 2048, 4096, 8192). The only one I really care about is 4096, since that's the allocation unit size all my VMs are formatted with.

I'm at a loss - any recommendations? Reboot my partner server? Reboot my ESXi hosts? What's going on?
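
Before rebooting anything, it may be worth confirming what the host thinks the path layout for that LUN actually is. A minimal sketch from the ESXi shell (the naa.* ID is a placeholder for the LUN being benchmarked); esxtop's network view (press 'n') also shows per-vmnic traffic live while ATTO runs:

    import subprocess

    DEVICE = "naa.xxxxxxxxxxxxxxxx"   # placeholder - the StarWind LUN being benchmarked

    # Round Robin settings currently in force for this device (policy, iops=1, etc.)
    subprocess.call(["esxcli", "storage", "nmp", "device", "list", "--device", DEVICE])

    # Every path to the device and its state; after removing the partner SAN's
    # portals there should be exactly two, both reported as active.
    subprocess.call(["esxcli", "storage", "core", "path", "list", "--device", DEVICE])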
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Tue Jun 11, 2013 10:17 am

Hi Jmchristy,
Is there a chance we could do a remote session?
I'd like to have a closer look at the configuration and do a few more tests.
I've emailed you my contact details, please send me an e-mail once you have time to jump on a remote session.
Max Kolomyeytsev
StarWind Software
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Tue Jun 11, 2013 4:52 pm

Just an update.

I had a remote session with Max, and he created a RAM disk target that performed at gigabit wire speed (a little over 120 MB/s on ATTO).
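
For context, ~120 MB/s is roughly what a single saturated gigabit link can deliver, so the RAM-disk test suggests the wire, not the storage, was the ceiling there. A back-of-the-envelope sketch (the ~6% protocol overhead figure is an assumption):

    # Rough single-path ceiling for 1 GbE iSCSI, and what 4 round-robin paths give.
    line_rate_MBps = 10**9 / 8 / 10**6        # 125 MB/s raw line rate
    overhead = 0.06                           # assumed Ethernet/IP/TCP/iSCSI header overhead
    usable = line_rate_MBps * (1 - overhead)  # ~117 MB/s per path
    print("one path  : ~%.0f MB/s" % usable)
    print("four paths: ~%.0f MB/s" % (4 * usable))  # ~470 MB/s, close to the 450+ seen on the 2003 VMs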

The next step I'll take in troubleshooting is to rebuild the network per the best practices guide found here (page 6):
http://www.starwindsoftware.com/starwin ... ces-manual

Currently, everything communicates over 10.0.0.x on my network adapters. The best practices guide recommends a different subnet for each path.

So... I have 2 spare network cards on one host and 2 on the SAN. I will create a path from the host to the SAN, with a different subnet for each path. I'll create a new .IMG file, transfer a VM to that new target, and see the results. It should be between 225-250 MB/s if all is well.

Max - if there is anything else you wanted to add, or if I missed something I was supposed to try, please post. I appreciate the help!

P.S. - I should be able to perform this test this week; if not, I'll be out of the country next week and won't be back until the week of the 23rd to post the results.
jmchristy
Posts: 37
Joined: Thu Mar 15, 2012 2:55 pm

Wed Jun 12, 2013 4:25 pm

Another update.
• On my ESXi host, I added 2 more VMkernel ports and enabled jumbo frames. 100.0.1.1 and 100.0.2.1 are the IPs.
• On my SAN, I added an image file and enabled jumbo frames on the network cards. I also added an Access Rights rule to prevent my main iSCSI network on 10.0.0.x from reaching this image file when I add it to VMware. 100.0.1.2 and 100.0.2.2 are the IPs.
• I added the 2 VMkernels to my iSCSI port bindings, and under Dynamic Discovery I added 100.0.1.2 and 100.0.2.2 (the esxcli equivalent of these binding/discovery steps is sketched below). I verified that VMware is only connected to those addresses in that subnet.
• I turned on Round Robin, and under Manage Paths it lists 2 paths, to 100.0.1.2 and 100.0.2.2 (both say Active). I set IOPS to 1 on the storage after it was added.
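
For completeness, a rough sketch of the esxcli equivalent of those binding and discovery steps (vmhba33, vmk2, and vmk3 are placeholders for the software iSCSI adapter and the two new VMkernel ports):

    import subprocess

    ADAPTER = "vmhba33"            # placeholder - the software iSCSI adapter name
    BINDINGS = ["vmk2", "vmk3"]    # placeholders - the two new VMkernel ports
    PORTALS = ["100.0.1.2", "100.0.2.2"]

    # Bind each new VMkernel port to the software iSCSI initiator
    for vmk in BINDINGS:
        subprocess.call(["esxcli", "iscsi", "networkportal", "add",
                         "--adapter", ADAPTER, "--nic", vmk])

    # Add the StarWind portals on the new subnets to dynamic discovery
    for ip in PORTALS:
        subprocess.call(["esxcli", "iscsi", "adapter", "discovery", "sendtarget", "add",
                         "--adapter", ADAPTER, "--address", ip])

    # Rescan so the new target and its paths show up
    subprocess.call(["esxcli", "storage", "core", "adapter", "rescan", "--adapter", ADAPTER])
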
I transferred my Windows 2008 VM to this 60GB storage device and ran an ATTO test. I ran it on my existing switch stack and on a spare HP ProCurve switch and received the same results on both. You can see in the screenshot that I'm hitting 50 MB/s, and the 2 NICs I added (on the right) are only at around 24% utilization, while my main iSCSI network is not active.

OK - well, maybe my Windows 2003 VM will behave the same. So I shut down the 2008 VM, moved it back to its regular host/SAN, transferred a 2003 VM over, and ran the ATTO test. Those results are also attached. You can see that I'm achieving gigabit speed, and my 2 NICs are fully utilized, between 96% and 99%, during the tests. I was able to achieve this on my Avaya switch stack and on the spare HP ProCurve switch.

So I believe I now have it set up using the best practices method; however, I am still seeing the same behavior. I also tested this on an entirely different host and SAN than the ones I previously tested on - they are the same hardware, I just thought I'd throw that out there. I think I'll open another trouble ticket with VMware and see what they think.
Attachments
Windows 2003 VM ATTO performance results: perftest results - 2003 VM.JPG (148.92 KiB)
Windows 2008 VM ATTO performance results: perftest results - 2008 VM.JPG (139.04 KiB)