Fixed path faster than round robin



paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Fri May 11, 2012 3:11 pm

Hi,

I have just started testing StarWind out and I am getting quite strange results at the moment.

I have ESXi 5.0 installed on a Dell R710; it has multiple NICs configured with jumbo frames, going through an Avaya/Nortel switch stack on a separate VLAN from all normal data traffic.

ESXi is configured with two vmkernel ports linked to one vSwitch for iSCSI, and I have done the port binding to link each vmk to a separate NIC as per best practice. I have got an EqualLogic using round robin successfully and it saturates these two connections at around the 200 MB/s mark.
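(For reference, the port binding was done from the ESXi 5.0 shell with commands along these lines; vmhba33 and the vmk names are just examples for my software iSCSI adapter and vmkernel ports:)

    # Bind each iSCSI vmkernel port to the software iSCSI adapter
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2
    # Confirm the bindings
    esxcli iscsi networkportal list --adapter vmhba33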

My StarWind server runs Windows 2008 R2 and has a quad-port Intel NIC configured for jumbo frames. I have configured a LUN and served it up to my ESXi server. At this point I had not set the round robin path policy, and I created a new hard disk on this LUN on a test VM. My IOmeter results showed it was saturating the connection at around 117 MB/s (so all good). I then went to the VMware command line, set my round robin policy to switch paths every 8800 bytes, and retested. My IOmeter results were around the 40 MB/s mark. Strange. So I thought I would look up StarWind's recommendations on round robin and saw you recommend an IOPS value of 1 (or 2 or 5). I changed my path policy for the LUN on ESXi and retested. Again, poor performance.
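(Again for reference, the round robin changes were made from the ESXi shell with something like the following; the naa device identifier is a placeholder for my StarWind LUN:)

    # Set the path selection policy for the LUN to round robin
    esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
    # Switch paths every 8800 bytes...
    esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type bytes --bytes 8800
    # ...or every single I/O, as per the StarWind recommendation
    esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1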

I started to wonder at this point if this is due to the delayed ACK issue or a setting on my Intel NICs.

please help!

many thanks,

Paul
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Mon May 14, 2012 8:13 am

Hi Paul,
Could you please tell me which tests you ran in IOmeter: writes, reads, or both?
Which of them shows the degradation when MPIO is enabled?
Also, have you already disabled the delayed ACKs?
Max Kolomyeytsev
StarWind Software
paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Mon May 14, 2012 8:18 am

Sorry, I just posted at the same time as you! Will answer your questions in a sec.


Hi,

Is there any more information I can give to get some help with this?

Some more info if it helps:

The NICs I am using on the StarWind server are Intel and I have updated to the latest drivers. I have 12 x 2 TB SATA drives in RAID 10 on the StarWind server, so it should be OK. The fact that an IOmeter test on a single 1 Gbps link gets close to line speed, but not when using round robin, indicates that there is definitely something amiss. I know the ESXi config is OK as I am using the same iSCSI virtual networking for my EqualLogic, and that produces 200 MB/s on round robin. I have set the delayed ACK on ESXi and in the registry on the StarWind server and rebooted. I have reconfirmed that all my links are using jumbo frames and the switch is configured for jumbo frames.
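(In case it is useful, the registry change on the Windows side was the usual per-interface TcpAckFrequency tweak, roughly as below, with the interface GUID being a placeholder for the iSCSI NIC; the ESXi side was done through the software iSCSI initiator's advanced settings. Jumbo frames were re-verified with don't-fragment pings:)

    rem On the StarWind (Windows 2008 R2) server: disable delayed ACK on the iSCSI interface, then reboot
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" /v TcpAckFrequency /t REG_DWORD /d 1 /f

    rem From Windows: 8972-byte don't-fragment ping to check jumbo frames end to end
    ping -f -l 8972 <ESXi vmkernel IP>

    # From the ESXi shell: the equivalent check back towards the StarWind server
    vmkping -d -s 8972 <StarWind iSCSI IP>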

I don't know where else to look to try and diagnose the issue.

any help would be greatly appreciated!

many thanks,

Paul
paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Mon May 14, 2012 8:40 am

Max (staff) wrote:Hi Paul,
Could you please tell me which tests you ran in IOmeter: writes, reads, or both?
Which of them shows the degradation when MPIO is enabled?
Also, have you already disabled the delayed ACKs?

Hi Max,

The IOmeter tests I used were from the unofficial storage performance thread on the VMware forums. There are 4 tests:

1.) 100% sequential read
2.) 65% read, 35% write (40% sequential, 60% random)
3.) 50% read, 50% write (100% sequential)
4.) 30% write, 70% read (100% random)

They all show degradation. Here are my results for a single 1 Gbps link (fixed path) first, then round robin:

Fixed path (single 1 Gbps link):

Test   Total IOPS   Read IOPS   Write IOPS   Total MB/s   Read MB/s   Write MB/s
1      3836.69      3836.69     0            119.90       119.90      0
2      1901.24      1233.60     667.65       14.85        9.64        5.22
3      1899.55      949.89      949.66       59.36        29.68       29.68
4      2886.19      2019.93     866.26       22.55        15.78       6.77

Round robin:

Test   Total IOPS   Read IOPS   Write IOPS   Total MB/s   Read MB/s   Write MB/s
1      1260.34      1260.34     0            39.39        39.39       0
2      2290.22      1492.06     798.16       17.89        11.66       6.24
3      1329.75      665.62      664.13       41.55        20.80       20.75
4      2874.90      2013.09     861.81       22.46        15.73       6.73

Yes, I have done the delayed ACK change.

thanks!
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon May 14, 2012 3:13 pm

May I ask whether you have tested all the paths to the target separately? I mean: you have, let's say, two paths; when running tests with the Fixed policy it is using path 1. Can you try to benchmark using only path 2?
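Something along these lines should do it (the device and path names below are just examples, substitute your own):

    # List the available paths for the StarWind LUN
    esxcli storage nmp path list --device naa.xxxxxxxxxxxxxxxx
    # Switch the LUN to the Fixed policy and pin it to one path, then repeat for the other path
    esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_FIXED
    esxcli storage nmp psp fixed deviceconfig set --device naa.xxxxxxxxxxxxxxxx --path vmhba33:C0:T0:L0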
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Mon May 14, 2012 3:41 pm

Good call, Anatoly!

I will try that this evening....
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Tue May 15, 2012 3:58 pm

Any updates please?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Wed May 16, 2012 8:21 am

Hi again,

I tested out both links separately on fixed and they are both fine.

It's very strange. When I use round robin with the default VMware IOPS setting of 1000 I see around 108 MB/s across both links. If I change it to 1, 2, 5, or 10 IOPS it drops to around 40 MB/s.
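(Each time I changed it I double-checked the active policy with something like this, the device ID again being a placeholder:)

    # Show the current PSP and the round robin settings for the LUN
    esxcli storage nmp device list --device naa.xxxxxxxxxxxxxxxx
    esxcli storage nmp psp roundrobin deviceconfig get --device naa.xxxxxxxxxxxxxxxx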

I have followed the thread on TCP performance tuning and ESXi initiator configuration. I am starting to wonder if it is something to do with the Intel dual-port gigabit NIC on the StarWind server. I have the latest drivers on there. When I first installed the NICs the server was repeatedly freezing up for a second and then unfreezing; changing the drivers fixed that. I wonder if these Intel NIC drivers are dodgy!

I might try going back to a previous version of the drivers and retesting. Apart from that I am stumped. I have been looking at this issue for three days now, and I am normally pretty good when it comes to VMware (I am a VCP) and iSCSI.

If I find an answer I will post back!

thanks,

Paul
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Wed May 16, 2012 10:48 am

I cannot disagree with you - looks strange.
Do you have the same NICs on the StarWind box?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Wed May 16, 2012 1:01 pm

Well, one thing that was an issue: I was using a Broadcom NIC on the ESXi host for one of my vmkernel NICs on the iSCSI vSwitch, doh! (This may or may not have been a problem.)

I have now rebuilt it correctly so that all the NICs on both the StarWind server and ESXi are Intel NICs. I am still getting the same performance issues.

Since I last posted I have also used Wireshark to look at the network traffic on one path compared to MPIO. I don't fully understand what I am looking at, but on MPIO there are a lot of messages around PDU reassembly. On just one link with fixed, the capture is all very clean.
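(For anyone curious, I was just capturing on the iSCSI NICs and narrowing things down with Wireshark display filters along these lines; the reassembly and retransmission noise only shows up in the MPIO capture:)

    # Show only the iSCSI conversation
    iscsi
    # Highlight TCP-level trouble (retransmissions, out-of-order and lost segments)
    tcp.analysis.retransmission || tcp.analysis.out_of_order || tcp.analysis.lost_segment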

Not really sure where to go from here. I might try some slightly older drivers from Intel and see how that goes...
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri May 18, 2012 2:31 pm

In one of your posts you mentioned the following:
paulow1978 wrote: When I use round robin with the default VMware IOPS setting of 1000 I see around 108 MB/s across both links.
Can you confirm that I understood this correctly: with that setting you are getting ~94% network utilization when using the Round Robin policy to StarWind?

OK, let's also wait for your test results with the older drivers.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Fri May 18, 2012 3:55 pm

Hiya,

What I mean is:

With round robin I see about 10-30% network utilization on each NIC on the StarWind server (IOmeter shows 40-60 MB/s on sequential read).
With fixed path using one link I see 90%+ (110 MB/s on sequential read in IOmeter).

I have spent a lot of time on this, messing around with the Intel NIC settings. After banging my head against a brick wall and running circa 40 IOmeter tests, I eventually disabled jumbo frames on the Intel NICs, and the round robin performance has gone up to 110 MB/s and sometimes 135 MB/s, so it is looking much better. My switches are Avaya/Nortel 4500GT and have jumbo frames enabled; in fact I am using jumbo frames on my iSCSI VLAN successfully with my EqualLogic and VMware, so this issue seems localised to the Windows StarWind server (though it now looks very much like a bad implementation of jumbo frames in this set of Intel drivers).
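(For the record, the jumbo setting was toggled in the Intel driver's advanced properties; after changing it I sanity-checked the effective MTU on both ends before re-running the tests, roughly like this:)

    rem On the Windows/StarWind side: list the MTU per interface
    netsh interface ipv4 show subinterfaces

    # On the ESXi side: confirm the vmkernel port MTUs
    esxcli network ip interface list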

Sorry to hassle you guys. It's not a StarWind issue, I am pretty certain of that. I am only posting here in the hope that someone else might know what is causing it. Hope that's OK!

thanks,

Paul
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri May 18, 2012 3:57 pm

Is there any chance you can connect bypassing the switches, with direct cables? Do you have jumbo frames enabled for the EQL config?
paulow1978 wrote:Hiya,

What I mean is:

With round robin I see about 10-30% network utilization on each NIC on the StarWind server (IOmeter shows 40-60 MB/s on sequential read).
With fixed path using one link I see 90%+ (110 MB/s on sequential read in IOmeter).

I have spent a lot of time on this, messing around with the Intel NIC settings. After banging my head against a brick wall and running circa 40 IOmeter tests, I eventually disabled jumbo frames on the Intel NICs, and the round robin performance has gone up to 110 MB/s and sometimes 135 MB/s, so it is looking much better. My switches are Avaya/Nortel 4500GT and have jumbo frames enabled; in fact I am using jumbo frames on my iSCSI VLAN successfully with my EqualLogic and VMware, so this issue seems localised to the Windows StarWind server (though it now looks very much like a bad implementation of jumbo frames in this set of Intel drivers).

Sorry to hassle you guys. It's not a StarWind issue, I am pretty certain of that. I am only posting here in the hope that someone else might know what is causing it. Hope that's OK!

thanks,

Paul
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

paulow1978
Posts: 16
Joined: Fri May 11, 2012 3:04 pm

Fri May 18, 2012 4:10 pm

Hiya,

Yes, the EQL has jumbo frames enabled. All good on that front.

I will try the direct connection. Just need to find some straight-throughs...
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri May 18, 2012 7:42 pm

Kind of strange... OK, please check other connections (no switches) and let us know. Thank you!
paulow1978 wrote:Hiya,

Yes, the EQL has jumbo frames enabled. All good on that front.

I will try the direct connection. Just need to find some straight-throughs...
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
