Just can't figure this out

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

craggy
Posts: 55
Joined: Tue Oct 30, 2012 3:33 pm

Sun Jul 28, 2013 11:42 pm

We built a brand new 10Gb SAN to replace an older 1Gb one and just can't figure out what is going on with the performance.

Setup is like this:
- HP C7000 chassis with BL460c blades, each with 2x quad-core CPUs and 32GB RAM, running ESXi 5.1 on VC Flex-10 10GbE NICs.
- The blades connect back to the latest StarWind version running on an HP DL360 G5 with an HP P812 controller attached to an HP MSA70 holding 25x 300GB 15k SAS disks.
- The StarWind server has an HP NC522SFP dual-port 10GbE NIC connecting back to the Flex-10 modules via HP 0.5m SFP+ DAC cables.
- StarWind is running on Server 2008 R2 with all the latest updates installed.

In theory, the Server 2008 R2 test VM running on one of the blades should be able to push 800-900+ MB/s back to the StarWind SAN, but this is not the case. Best case is about 550 MB/s write and 450 MB/s read.
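(Rough math behind that expectation: 10 Gbit/s is about 1250 MB/s on the wire, and after TCP/IP and iSCSI framing overhead something around 1.0-1.1 GB/s is the practical ceiling, so 800-900 MB/s seemed like a fair target.)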

Local performance tests directly on the SAN yield about 1.2 GB/s write and 1.6 GB/s read using both HD Tune 5.0 and ATTO.

So I then created an 8GB ramdrive on the SAN, ran the same benchmarks, and got around 3.2 GB/s write and 2.4 GB/s read.
Then I shared out this ramdrive to the ESXi hosts using StarWind and guess what, tests only get about 600 MB/s write and 500 MB/s read.
I switched to an RDM in VMware instead of a datastore, but it yielded no improvement.

Thinking there was something wrong with my network config, I went back to the StarWind server and used the Microsoft iSCSI initiator to connect to the 8GB ramdrive being shared out from StarWind. Assuming this would benchmark at pretty much full 10GbE speed, I was disappointed to see only about 400 MB/s write and 350 MB/s read (please note this is the MS initiator running on the same server as StarWind and the ramdrive).
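In case it helps anyone reproduce that local test: connecting the Microsoft initiator to the local target can be done from the GUI or from a command prompt, roughly like this (the target IQN is whatever ListTargets reports, shown here only as a placeholder):

iscsicli QAddTargetPortal 127.0.0.1
iscsicli ListTargets
iscsicli QLoginTarget <target IQN reported by ListTargets>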

So then I thought the HP NC522SFP adapter or its driver was causing the problem, so I disabled the NIC and installed the MS loopback adapter. Surely this would give me max performance, but it was worse than the 10GbE NIC, only about 350 MB/s write and 300 MB/s read. Weird.

So what on earth could be going on here?

I followed all the performance tuning recommendations like enabling/disabling RSS, Chimney, TCP offloading, Delayed ACK, etc., but no major difference overall.
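For reference, the toggles I was flipping were along these lines (from memory, so double-check the exact parameter names and values before copying):

netsh int tcp set global rss=disabled        (and back to enabled)
netsh int tcp set global chimney=disabled
netsh int tcp set global autotuninglevel=normal

Delayed ACK is a per-interface registry value, e.g. TcpAckFrequency=1 (DWORD) under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<interface GUID>, plus a reboot.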

Where do I go from here?
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Mon Jul 29, 2013 10:59 am

Run NTttcp and iperf to check how far your brand new 10 GbE gear can go with raw TCP. If you cannot get more than 600 MB/sec with acceptable latency, then iSCSI, which runs on top of TCP, has no way to go faster.
So TCP and the network gear would be the things to optimize / reinstall / rebuild, etc. Please run the tests and share results with us. Thanks!
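Something like the following is enough to get a raw TCP number (exact flags vary a bit between iperf/NTttcp versions, and the IPs are just placeholders, so treat this as a sketch):

iperf -s -w 256k                                 (on the StarWind box)
iperf -c <StarWind IP> -w 256k -P 4 -t 30        (on the test VM)

ntttcp.exe -r -m 8,*,<receiver IP> -t 30         (receiver side)
ntttcp.exe -s -m 8,*,<receiver IP> -t 30         (sender side)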
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

craggy
Posts: 55
Joined: Tue Oct 30, 2012 3:33 pm

Mon Jul 29, 2013 11:33 am

Thanks, I'll run some TCP tests now.

In the meantime, any idea why even using the loopback adapter directly on the SW SAN, connecting to its own ramdrive, would be so slow?
craggy
Posts: 55
Joined: Tue Oct 30, 2012 3:33 pm

Mon Jul 29, 2013 2:32 pm

OK, so I tested iperf from the VM to the StarWind server and am seeing about 2.3 Gbit/s with only moderate CPU load on either server.

I then tested iperf on the loopback IP 127.0.0.1 directly on the StarWind server and am getting exactly the same results.

I would have expected to see 30-40 Gbit/s in the results, as I'm only really bound by CPU and memory bandwidth on a loopback test.

Edit: I've run iperf on my own PC for comparison; it's a Core i7 with 16GB DDR3-1333 running Win 7, and results on a loopback test are around 4.2 Gbit/s. Am I doing something wrong?
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Aug 05, 2013 10:43 am

Tricky question, actually. When running network benchmarking tests, the goal is to achieve ~95% of your rated throughput. You need to change the input parameters of iperf (or NTttcp) until you get the numbers that you need; otherwise you can consider the low performance rates a malfunction of some network component (it can be the NIC, switch, cable, or sometimes even the motherboard).
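For example, vary the number of parallel streams and the window size until the link is saturated, something along these lines (the numbers are just a starting point, not a recommendation):

iperf -c <target IP> -P 8 -w 512k -t 60 -i 5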
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
dwright1542
Posts: 19
Joined: Thu May 07, 2015 1:58 am

Mon Jan 11, 2016 2:00 am

Did you ever fix this? We are running into the same exact situation.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jan 11, 2016 12:35 pm

Hi!
Mostly it's not about a fix, but about proper benchmarking. As far as I can see from our records you have a case open and my colleague Oles is assisting you, so it looks like you are in good hands :)
I will check with Oles regarding your issue though.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Trevbot
Posts: 12
Joined: Sun Mar 08, 2015 8:59 pm

Tue Jul 26, 2016 4:48 pm

Did you guys ever figure this out? I have a similar issue. Testing the SAN array locally pushes 2 GB/s+ on both read and write, but connecting from an ESXi 6.0u2 host over 10GbE I can only get about 250-300 MB/s read and write.
dwright1542
Posts: 19
Joined: Thu May 07, 2015 1:58 am

Tue Aug 09, 2016 1:37 pm

Trevbot wrote: Did you guys ever figure this out? I have a similar issue. Testing the SAN array locally pushes 2 GB/s+ on both read and write, but connecting from an ESXi 6.0u2 host over 10GbE I can only get about 250-300 MB/s read and write.
Not as of recently. Higher bandwidth and IOPS are both an issue. Apparently there is a core problem / incompatibility with Windows that is the limiting factor. There's another thread about this issue:

https://forums.starwindsoftware.com/vie ... f=5&t=4514

However, all three of the third-party vSAN vendors have speed issues. The only solutions that really worked were shutting off all cache and using Pernix at the front end, or actually using VMware vSAN, which is a whole different price level.
Al (staff)
Staff
Posts: 43
Joined: Tue Jul 26, 2016 2:26 pm

Thu Sep 08, 2016 10:43 pm

Hello dwright1542,

Thank you for your response here :D

We are working hard on this. We are going to have significant improvements in upcoming builds :)