Baffling Performance Issues

Software-based VM-centric and flash-friendly VM storage + free version

EinsteinTaylor
Posts: 22
Joined: Thu Aug 30, 2012 12:28 am

Wed Mar 13, 2013 10:56 pm

I am in need of help and not sure what to do at this point; I have been banging my head against the wall all day with performance issues. Right now I have 2 nodes in HA on StarWind, each with 4 x 1 GbE NICs: 1 for management, 1 for sync, and 2 carrying the actual iSCSI traffic. I currently have Write-Back caching turned on for the iSCSI target and jumbo frames enabled all the way through (the ESX-side MTU settings are sketched below the diagram). The physical layout looks like:
[Attachment: starwindLayout.png - physical layout diagram]
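For reference, this is roughly how the jumbo frame MTU mentioned above would be set on the ESX side, assuming ESXi 5.x-style esxcli; the vSwitch and vmkernel names below (vSwitch1, vmk1) are placeholders for whatever your iSCSI ports are actually called:

esxcli network vswitch standard set -v vSwitch1 -m 9000 - sets the vSwitch MTU to 9000
esxcli network ip interface set -i vmk1 -m 9000 - sets the vmkernel iSCSI port MTU to 9000

The same 9000-byte MTU also has to be set on the StarWind NICs and the switch ports for jumbo frames to work end to end.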
If I use the Fixed path selection policy I get basically wire speed for gigabit, as seen below.
[Attachment: fixedPath.png - benchmark with Fixed path policy]
But if I use MPIO and set IOPS = 1 I get:
[Attachment: iops1.png - benchmark with MPIO, IOPS = 1]
As you can see, some multipathing is happening, since I do manage to hit 200 MB/s once, but the results are all over the place and I don't know why.
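For anyone reproducing this, the IOPS = 1 setting refers to the Round Robin path policy on the ESX side. A rough sketch of how it is typically applied per device, assuming ESXi 5.x-style esxcli; the device ID naa.xxxxxxxx is a placeholder for the actual StarWind LUN:

esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_RR - switches the device to Round Robin
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxx --type iops --iops 1 - switches paths after every I/O instead of the default 1000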

The forum will only allow me to attach 3 screenshots, but the benchmark on the physical servers comes in at around 3.4 GB/s, so the bottleneck is certainly not there.

Both StarWind boxes are Windows Server 2008 with the following set, based on the StarWind forums:
netsh int tcp set heuristics disabled - disables Windows scaling heuristics
netsh int tcp set global autotuninglevel=normal - turns on TCP auto-tuning
netsh int tcp set global congestionprovider=ctcp - turns on Compound TCP
netsh int tcp set global ecncapability=enabled - enables ECN
netsh int tcp set global rss=enabled - enables Receive-Side Scaling (RSS). Note that you should enable it only if your NIC supports it!
netsh int tcp set global chimney=enabled - enables TCP Chimney offload
netsh int tcp set global dca=enabled - enables Direct Cache Access (must be supported by the CPU, chipset and NIC)
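A quick way to confirm these took effect, assuming the same netsh on Windows Server 2008/2008 R2:

netsh int tcp show global - lists the current values of the global TCP parameters above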

I am running RAID 50 with a 64K stripe size, and the Windows disk on the host is formatted with a 64K allocation unit size (the largest NTFS offers).
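In case it helps anyone, this is roughly how a volume ends up with a 64K allocation unit and how to verify it afterwards; the drive letter X: is just a placeholder:

format X: /FS:NTFS /A:65536 /Q - quick-formats the volume with a 64K cluster size
fsutil fsinfo ntfsinfo X: - the "Bytes Per Cluster" line should read 65536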

No tweaks other than jumbo frames have been applied to VMware, and I have even tried swapping the Cisco switch stack for an HP ProCurve; the results don't change, so I'm pretty confident it's not the switch.

Please help.
EinsteinTaylor
Posts: 22
Joined: Thu Aug 30, 2012 12:28 am

Wed Mar 13, 2013 11:07 pm

A few more notes. First, just in case there is any doubt, below is the performance when run right on the StarWind server:
[Attachment: physical.png - benchmark run locally on the StarWind server]
I'm starting to think the issue might be jumbo frames. I set the MTU in ESX back to 1500 and disabled jumbo frames on the StarWind server. They are still enabled on the switch, since changing that requires a reload I'm not willing to do right now, but the switch should just forward the smaller frames along. Anyway, the performance with jumbo frames off is somewhat better, but still not stellar (an end-to-end MTU check is sketched after the screenshot).
[Attachment: noJumboFrames.png - benchmark with jumbo frames disabled]
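If anyone wants to rule jumbo frames in or out before reverting MTUs, a simple check is to ping with the don't-fragment bit set and a payload just under 9000 bytes; the IPs below are placeholders, and the vmkping syntax assumes ESXi 5.x:

vmkping -d -s 8972 192.168.100.10 - from the ESX host to a StarWind iSCSI IP; -d sets the don't-fragment bit, 8972 = 9000 minus IP/ICMP headers
ping -f -l 8972 192.168.100.20 - the equivalent test from the Windows/StarWind side back to the ESX vmkernel port

If these fail while normal pings work, something in the path is not actually passing 9000-byte frames.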
EinsteinTaylor
Posts: 22
Joined: Thu Aug 30, 2012 12:28 am

Thu Mar 14, 2013 9:38 pm

I believe I finally figured it out, but thought I would document it here for others who could use the help. I had been using one /27 subnet for all network adapters in the topology. When I went to look at the multipath options, I noticed it was detecting something like 72 paths to one target, which seemed weird. I went back and gave each NIC on the ESX server its own VLAN and /29 subnet. I then matched up the NICs from each node (remember, 2 NICs per StarWind host for a total of 4) in a 1:1 relationship with the ESX server. After retesting I'm now seeing:
[Attachment: subnetsNoJumbo.png - benchmark with per-path subnets, jumbo frames off]
The writes aren't fantastic, but I figure that's probably either part of the RAID 50 write penalty, or maybe the writes don't multipath the same way the reads do. Regardless, it's much better now.
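For anyone wanting to pin the 1:1 layout down explicitly, the vmkernel ports can also be bound to the software iSCSI adapter so each path really does use its intended NIC. A rough sketch, assuming ESXi 5.x-style esxcli; vmhba33, vmk1 and vmk2 are placeholders for the actual adapter and port names:

esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1 - binds the first iSCSI vmkernel port to the software iSCSI adapter
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2 - binds the second iSCSI vmkernel port
esxcli iscsi networkportal list --adapter vmhba33 - confirms the bindings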
EinsteinTaylor
Posts: 22
Joined: Thu Aug 30, 2012 12:28 am

Fri Mar 15, 2013 6:19 pm

Now we're where I want to be. One of the support techs mentioned something in an email about sync channels, and it clicked why my writes were so much slower than my reads. I added more sync channels, and now you see exactly what you would expect from a 4 x NIC RAID 50 solution.

I hope to write up a complete set of notes on this in the near future as a single point of reference, because while there is a good amount of information in this forum and from the staff, it doesn't seem to be gathered in one place.

[Attachment: syncChannels.png - benchmark after adding sync channels]
eickst
Posts: 21
Joined: Sat Mar 09, 2013 5:25 am

Sun Mar 17, 2013 12:49 am

By sync channels, do you mean just dedicating more NICs to sync? Or is there some magical setting somewhere?

I'm getting turned off by the performance I am getting out of this software. If I run raw benchmarks straight to disk, my arrays fly, but StarWind cuts my performance to 25%. I don't mean a 25% hit; I mean performance becomes 25% of what it was raw.

That's just with a plain disk image; an HA device is even worse. And forget about deduplication.

Basically, all the features on the website that make you want to buy the product are the ones that don't seem to work too well (or are experimental).
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Mon Mar 18, 2013 2:23 pm

By sync channels, do you mean just dedicating more NICs to sync? Or is there some magical setting somewhere?
>>> This is more a rule for an HA SAN: the total sync channel throughput shouldn't be less than the total throughput of the client iSCSI connections.
E.g., if your ESX servers are splitting load between 4 NICs (each) connected to the HA SAN, the sync channel has to be at least 4 GbE.
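To make the arithmetic concrete (a rough estimate that ignores protocol overhead): 4 x 1 GbE of client iSCSI traffic is about 4 Gbit/s, or roughly 450-500 MB/s, and every write landing on one HA node also has to be mirrored to its partner over the sync channel. So the sync link needs at least the same ~4 Gbit/s of capacity, e.g. 4 x 1 GbE sync NICs or a single 10 GbE link. With only a single 1 GbE sync link, writes top out around 1 Gbit/s no matter how many client paths there are, which lines up with the write numbers being much lower than the reads earlier in this thread.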
Max Kolomyeytsev
StarWind Software