Round robin IOPS policy - peculiar effect

starczek

Thu Jul 19, 2012 11:37 am

Hi

I have two ESXi 5.0 hosts and one storage server. Each ESXi host has 2 interfaces dedicated to iSCSI and the storage server has 4. Every interface is configured in a separate network segment. All connections go through two separate HP V1910 series switches (to get real failover), without jumbo frames (the results are the same with direct connections). My goal is to have both failover and load balancing, but... well... it works in a somewhat unexpected way. To clarify my issue I present two tests.
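
For reference, path and policy state can be double-checked from the ESXi 5.x shell with the standard esxcli commands below (generic commands, not output taken from this particular setup):

 # list every storage path the host sees, with its state
 esxcli storage core path list

 # show which path selection policy (PSP) is configured for each device
 esxcli storage nmp device list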

TEST 1
I log in to ESXi host A (over SSH) and run the following command:

 dd if=/dev/zero of=10GiB_v1.dat bs=1M count=10240

The above command creates a 10 GiB file filled with zeros on the selected datastore. Creation speed depends purely on the connection speed (and on the RAID device on which StarWind's *.img file is placed, but that is not a factor here, as the RAID subsystem easily outperforms 2x1GbE wire speed), so I can test the maximum throughput and whether load balancing works.

My expectation was somewhere around 200 MB/s, but I get only ~90 MB/s, as can be seen in the following screenshots.
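
A rough sanity check of that expectation (assuming roughly 10% protocol overhead on the wire):

 1 GbE = 125 MB/s raw, or roughly 110-115 MB/s usable after TCP/iSCSI overhead
 2 paths x ~110 MB/s = ~220 MB/s aggregate, hence the ~200 MB/s expectation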

The first one shows the 5-second average write speed from esxtop. It ranges from 85 to 90 MB/s.
[Attachment: RR single 2.png]
This one presents what is happening with the network cards - the transfers complement each other! When one is maxed out, the second one is at zero...
[Attachment: RR single.png]
The third screenshot shows what happens when the same command is issued with the multipathing policy set to Fixed and then changed to Round Robin during the transfer. In the beginning one channel is used, which is expected. But when the policy changes, the transfer is split equally between both paths while the sum remains the same as for a single path (1xGbE). I admit I expected the transfer speed to double...
[Attachment: Change from FIXED to RR - one host.png]
TEST 2
Finally, I ran the above test on both hosts simultaneously (starting with the Fixed policy and switching to Round Robin on both hosts during the transfer). [Unfortunately I cannot upload more than three attachments, so the screenshot is in the next post.] This test showed that two parallel transfers are possible without interfering with each other (neither dd command slowed down because of the second transfer). But the effect known from the single-host experiment is exactly the same - instead of 4x1GbE I get 4x0.5GbE. After that I'm pretty sure that the limit is not the network but the ESXi round robin algorithm, or the way the StarWind target handles it.

SUMMARY
In all tests the round robin policy was set to "IOPS" mode, with the I/O operation limit for switching paths set to 1.
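
For anyone reproducing this, the combination above corresponds roughly to the following esxcli commands on ESXi 5.x (the naa identifier is a placeholder, not the actual device ID from this setup):

 # set the path selection policy of the LUN to Round Robin
 esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR

 # switch paths after every single I/O instead of the default 1000
 esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1

 # verify the current Round Robin settings for the device
 esxcli storage nmp psp roundrobin deviceconfig get --device naa.xxxxxxxxxxxxxxxx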

What the heck is going on? In theory round robin works, but why do I get only half of the wire speed? ESXi is the most recent build (5.0.0 b768111), and so is StarWind (5.8.2013). Is it a networking problem? (I doubt it - the switch logs do not show any conflicts, dropped packets, etc.) Why can't I get the aggregated wire speed? And I'm not sure where the problem actually lies - the StarWind iSCSI target or ESXi. Any help, comments, questions and discussion are appreciated.
starczek

Thu Jul 19, 2012 11:38 am

Here is the fourth (missing) screenshot, where two ESXi hosts are involved.
[Attachment: Change from FIXED to RR - two hosts.png]
Anatoly (staff)

Sat Jul 21, 2012 11:32 am

OK, and can we see a diagram with the network load on the SAN servers? Also, when using RR NLB, what value do you have for IOPS?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
Anatoly (staff)

Mon Jul 23, 2012 12:57 pm

To be honest, it looks to me like you have specified only one NIC as the destination on your ESX host.
Would you please double-check that both NICs on the ESX host are connected to both StarWind NICs and configured to use RR?

Thank you
starczek

Mon Aug 06, 2012 2:20 pm

Anatoly (staff) wrote: To be honest, it looks to me like you have specified only one NIC as the destination on your ESX host. Would you please double-check that both NICs on the ESX host are connected to both StarWind NICs and configured to use RR?

Sorry for the late answer, but I was on holiday.

I've checked the connections once more as you suggested, but everything seems to be fine. My topology:
[Attachment: Topology.png]
The next two screenshots present the path configuration for both hosts. You can easily see that the target IP is different for each HBA.
[Attachment: Paths - host A.png]
[Attachment: Paths - host B.png]
Anatoly (staff) wrote: [...] and can we see a diagram with the network load on the SAN servers? [...]
Anatoly, what exactly do you mean? Which diagrams do you want to see? Those provided in my first post are from System Monitor (part of Windows Server 2008 R2) and show all the network traffic during my tests. Or perhaps you want the ones from the iSCSI target?
Anatoly (staff) wrote: [...] Also, when using RR NLB, what value do you have for IOPS?
The ones from the iSCSI target, I guess?
Anatoly (staff)

Tue Aug 07, 2012 12:38 pm

Hi. Let's go through this one by one first.
starczek wrote: Anatoly, what exactly do you mean? Which diagrams do you want to see?
The ones that I have now are enough for now.
starczek wrote: The ones from the iSCSI target, I guess?
Yep.

And now about the issue - well, to be honest, it looks really weird to me. Please tell me, is this a production environment? Do you have the possibility to reproduce this with, let's say, the Microsoft iSCSI target software?
starczek

Fri Aug 10, 2012 4:46 pm

Two screenshots presenting what is happening on the iSCSI target during my tests.

The first one shows that starting dd on ESXi host B while ESXi host A is already running dd doubles the number of IOPS. Both hosts are using the RR policy.
[Attachment: iSCSI_target_IOPS_RR.png]
The second one shows what happens when I change the policy from MRU to RR during dd. This slightly increases the number of IOPS.
[Attachment: iSCSI_target_IOPS_change_from_MRU_to_RR.png]
I've also opened a ticket with VMware because I feel that the problem is on the VMware side rather than StarWind's.
Anatoly (staff)

Mon Aug 13, 2012 8:20 am

OK, I've got two questions for now:
1. Would you be so kind as to keep us updated on the progress of solving the issue from the VMware side?
2. Are you able to replace StarWind with the MS iSCSI target for testing purposes? Or maybe just add it, if you have an additional server identical to the one where StarWind is running?
starczek

Fri Aug 24, 2012 1:29 pm

Anatoly (staff) wrote: OK, I've got two questions for now:
1. Would you be so kind as to keep us updated on the progress of solving the issue from the VMware side?
2. Are you able to replace StarWind with the MS iSCSI target for testing purposes? Or maybe just add it, if you have an additional server identical to the one where StarWind is running?

1) Sure. Currently VMware has closed my ticket due to my holidays, and my reopen requests have been ignored for over 2 weeks...
2) I'll try to, but this is my production environment, so all tests must be done very carefully.

I've also noticed one more thing while cloning a VM. The machine was cloned from datastore "LUN-5" to datastore "LUN-6". As one can see, the data written to the target is double what is read from the source datastore. A miracle? Where did the extra data come from? The same is observed on the disk:
[Attachment: DISK_during_cloning.png]
and on the iSCSI-dedicated network interfaces:
[Attachment: NETWORK_during_cloning.png]

There is no other activity and there are no other VMs on those two datastores. Frankly, I'm completely out of ideas... :?
Anatoly (staff)

Mon Aug 27, 2012 8:49 am

Well, to be honest, I have no ideas either.

I think the best thing for now is to wait for VMware's response.
starczek

Fri Feb 22, 2013 4:49 pm

Sorry, folks, for the late update, but I forgot about it :) The problem has been (almost) solved with VMware support.

Firstly, I was told that tests performed via the SSH console on the ESXi hosts using the dd command are far from the proper way of testing storage performance. The one and only valid method is to test from inside a VM.
Secondly, if one wants to achieve reliable results, one must use the thick provisioned, eager zeroed virtual disk type. Any other provisioning type has a significant negative impact on virtual disk performance.
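
For reference, such a disk can be created (or an existing thin disk inflated in place) from the ESXi shell with vmkfstools; the paths below are placeholders only:

 # create a 10 GB thick provisioned, eager zeroed test disk on the datastore
 vmkfstools -c 10G -d eagerzeroedthick /vmfs/volumes/LUN-5/perftest/perftest.vmdk

 # inflate an existing thin disk to eager zeroed thick
 vmkfstools --inflatedisk /vmfs/volumes/LUN-5/somevm/somevm.vmdk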

Taking the above two remarks into account, performance is finally as expected :D For Windows VMs, read and write are a bit below 200 MB/s (sometimes exceeding that value). The same goes for Linux machines (CentOS 6.3), with one exception: read performance still caps at ~108-110 MB/s, while, strangely, write throughput reaches ~220 MB/s. In both cases the RR policy works perfectly and the load is evenly split between both interfaces, but for reads each is used at only ~50%.
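
For anyone repeating the in-guest measurements, a minimal sketch (assuming a Linux guest and a test file placed on the virtual disk under test; the direct I/O flags bypass the guest page cache, in line with the cache exclusion mentioned in the PS below):

 # sequential write test inside the Linux VM, bypassing the guest page cache
 dd if=/dev/zero of=/data/testfile bs=1M count=10240 oflag=direct

 # sequential read test of the same file, also bypassing the page cache
 dd if=/data/testfile of=/dev/null bs=1M iflag=direct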

The last issue is still being investigated by VMware.

I'll try to update you as soon as there is something new worth publishing here.

PS: During the comprehensive tests, any impact from caches of any kind (storage and VM guest OS) was excluded.
anton (staff)

Fri Feb 22, 2013 9:04 pm

Thank you very much for keeping us updated! Indeed, very valuable feedback.

P.S. We've also hit a "no thin-provisioned VMs" thing during our recent LSFS tests. Lightning strikes twice :)
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
