Slow iSCSI Performance ESXi 5.1


jeffhoward001
Posts: 4
Joined: Fri Mar 29, 2013 12:21 am

Fri Mar 29, 2013 1:25 am

Hello -

I'm experiencing pretty poor performance when mounting VMFS5 volumes in ESXi 5.1 from a Windows 2008 R2 StarWind iSCSI SAN (running on physical hardware). I'm starting with the simplest test: copying an ISO image from one datastore to another. The source datastore is a RAID-5 capable of 300+ MB/sec, and the destination on the StarWind SAN side is also a RAID-5 capable of 200 MB/sec writes (I see that every day with our Windows-based backup software).

I installed StarWind and configured the standard iSCSI software initiator in ESXi. All went well on the configuration side. I deployed a few VMs to the iSCSI volume and immediately noticed a hard speed ceiling of 10-20 MB/sec (100-200 Mbps on the networking side).

This host also serves as our backup device on the Windows/SQL side, so I know the NICs and RAID volumes can easily handle full bandwidth from a 1 GbE port. In fact, the disk volumes are configured to handle full saturation of all four 1 GbE ports, up to 400 MB/sec across three separate RAID-5 sets of 15 spindles apiece (we do this nearly every night during our backup process).

I've read two or three other posts on this topic, and sadly the "solution" was that people got frustrated, rebuilt their environment from scratch, and the issue went away... Unfortunately I don't have the luxury of doing that in my case, as this is a production backup server.

So after spending a ton of time troubleshooting and reading articles on slow ESXi iSCSI performance, I submitted a support ticket with VMware (we're fully supported on their Production SLA). They said they couldn't help me because the StarWind iSCSI software is only listed up to ESXi 4.1 on the HCL.

So two questions before I spend any more time on this:

1) Does StarWind plan on getting at least the paid version of this product onto the HCL for ESXi 5.1? I can't purchase software for use with VMware without it being on the HCL.

2) Are there any other advanced troubleshooting guides for connecting VMware volumes to the StarWind SAN? I've seen the ESXi configuration guide with the MPIO settings, but that isn't really relevant when I can't saturate even a single 1 GbE NIC beyond 20%. There has to be a logical reason why it's limited to 20 MB/sec. The only other similarity with the other posts is that my RAID controllers are LSI, but so are probably 75% of the controllers on the market, since both Dell and HP use LSI-based controllers now.

Any help would be greatly appreciated.

Thanks,

- Jeff
jeffhoward001
Posts: 4
Joined: Fri Mar 29, 2013 12:21 am

Fri Mar 29, 2013 7:09 pm

Is it possible for someone to confirm whether 10-20 MB/sec writes are expected for your product when connecting via the ESXi software initiator? I did some additional testing this morning, and reads seem to be faster (50-60 MB/s per NIC) but still not really close to wire speed. At this point I can live with 50 MB/sec on reads, but 10 MB/sec writes are a deal-breaker.
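
(For what it's worth, the quick sequential-write sanity check I've seen suggested for the ESXi shell is a plain dd against the datastore; the datastore name below is just a placeholder and the result is only a rough figure.)

# write a 1 GB test file straight to the VMFS datastore
time dd if=/dev/zero of=/vmfs/volumes/<datastore>/ddtest bs=1048576 count=1024
# 1024 MB divided by the elapsed seconds gives MB/sec
rm /vmfs/volumes/<datastore>/ddtest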
jeffhoward001
Posts: 4
Joined: Fri Mar 29, 2013 12:21 am

Fri Mar 29, 2013 7:18 pm

One other observation from other posts on this topic: in most of these scenarios, the StarWind engineer seems really concerned about the underlying storage system (e.g. RAID card, type, block size, etc.). While that's good information to have, I'm curious why it's relevant when most people (including myself) provide evidence in the initial post that the underlying storage subsystem is working fine. Meaning, if I had a problem with block-level alignment between Windows and the stripe sizing on my RAID array, wouldn't that have already manifested itself as poor performance for all I/O at the Windows OS level on the StarWind server?

Possibly I'm over-simplifying the problem, but a brief explanation of how that all ties together would be nice as well. Again, I'm speaking from the perspective of having already tested the I/O subsystem extensively: we know our RAID volumes are capable of pushing 200 MB/sec writes per RAID set (and we have three independent RAID sets with their own controllers and spindles).

Thanks,

- Jeff
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Mon Apr 01, 2013 9:55 am

jeffhoward001 wrote: 1) Does StarWind plan on getting at least the paid version of this product onto the HCL for ESXi 5.1? I can't purchase software for use with VMware without it being on the HCL.
Yes, we are already working on the VMware certification. But to be honest, VMware support is pretty good and they will usually try to get a problem fixed whether or not everything is on the HCL; in my experience, "this product is not on the HCL so you will not get support" basically means "we don't know how to solve your problem". Well, just my opinion, forgive me if I'm wrong.
jeffhoward001 wrote: 2) Are there any other advanced troubleshooting guides for connecting VMware volumes to the StarWind SAN? I've seen the ESXi configuration guide with the MPIO settings, but that isn't really relevant when I can't saturate even a single 1 GbE NIC beyond 20%. There has to be a logical reason why it's limited to 20 MB/sec.
Before talking about "other advanced troubleshooting guides" I need to know what exactly you have already read. It would also be really helpful if you could make us more familiar with your setup:
1. Operating system on the servers participating in the iSCSI SAN and the client server OS
2. RAID array model, RAID level and stripe size used, caching mode of the array used to store the HA images, Windows volume allocation unit size
3. NIC models, driver versions (driver manufacturer/release date) and NIC advanced settings (Jumbo Frames, iSCSI offload etc.) (for the ESXi side, the commands just below will pull most of this)
4. Network scheme.
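On the ESXi host, something like this from the shell (ESXi 5.x esxcli syntax; vmnic0 is just an example NIC name) should collect the NIC, driver and MTU details:

# physical NICs with driver name, link speed and MTU
esxcli network nic list
# driver and firmware details for a single NIC (vmnic0 is a placeholder)
esxcli network nic get -n vmnic0
# vmkernel interfaces and their MTU (shows whether jumbo frames are enabled)
esxcli network ip interface list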
Also, there is a document for pre-production SAN benchmarking:
http://www.starwindsoftware.com/starwin ... ice-manual
And a list of advanced settings which should be implemented in order to gain higher performance in iSCSI environments:
http://www.starwindsoftware.com/forums/ ... t2293.html
http://www.starwindsoftware.com/forums/ ... t2296.html
jeffhoward001 wrote: Is it possible for someone to confirm whether 10-20 MB/sec writes are expected for your product when connecting via the ESXi software initiator?
It is only expected when there is a bottleneck or a misconfiguration somewhere. In a healthy environment our product shows really good results. The performance of our solution is limited by the hardware of the server it runs on, and most often by wire speed. As you may know, Intel and Microsoft achieved a 1,000,000 IOPS result using StarWind as the SAN solution (to learn more about it you can use this link: http://www.starwindsoftware.com/news/32). So all that is really needed (and this is not the easiest part) is a properly configured system.
jeffhoward001 wrote: In most of these scenarios, the StarWind engineer seems really concerned about the underlying storage system (e.g. RAID card, type, block size, etc.). While that's good information to have, I'm curious why it's relevant when most people (including myself) provide evidence in the initial post that the underlying storage subsystem is working fine. Meaning, if I had a problem with block-level alignment between Windows and the stripe sizing on my RAID array, wouldn't that have already manifested itself as poor performance for all I/O at the Windows OS level on the StarWind server?
As I mentioned previously, StarWind runs at whatever speed the underlying system allows, so in most cases our task is to find the bottleneck by double-checking every layer. Keep in mind that the I/O pattern ESXi pushes over iSCSI (many relatively small, unbuffered writes) is quite different from a large sequential backup stream, so an array that sustains 200 MB/sec for backups can still be held back by stripe size, alignment or caching settings.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
jeffhoward001
Posts: 4
Joined: Fri Mar 29, 2013 12:21 am

Wed Apr 10, 2013 11:50 pm

Sorry for going dark on this. I had a 10-hour slot the last week of March to work on this project, and have since had to move on to more pressing matters. I'll try to pull together the details you requested later this week and update the thread. Do you have any tentative date for when the VMware certification will be complete?

Our IT purchasing policy is very rigid in terms of "supported" products. We pay a large sum for the Production SLAs with VMware, so deploying on a platform they won't support basically voids a critical support path for us.
imrevo
Posts: 26
Joined: Tue Jan 12, 2010 9:20 am
Location: Germany

Thu Apr 11, 2013 8:46 am

Hi Jeff,
jeffhoward001 wrote: Is it possible for someone to confirm whether 10-20 MB/sec writes are expected for your product when connecting via the ESXi software initiator?
I'm getting wire speed when benchmarking from a 4 NIC ESX host to a 2x2 NIC StarWind HA Cluster: 4GBit. But it took me a while to get there.

What does your network setup look like? Do you have round robin enabled? How many paths are shown to each LUN / target in ESX? Have you enabled Jumbo frames?
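
In case it helps, these are the sorts of commands I use to check and set that from the ESXi shell (5.x syntax; the device ID, vSwitch and vmk names below are just placeholders for your own):

# show the path selection policy and the paths behind each device
esxcli storage nmp device list
esxcli storage core path list
# switch a device to Round Robin (naa.xxxx is a placeholder device ID)
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
# enable jumbo frames on the vSwitch and the iSCSI vmkernel port
esxcli network vswitch standard set --vswitch-name vSwitch1 --mtu 9000
esxcli network ip interface set --interface-name vmk1 --mtu 9000
# verify jumbo frames end to end (8972 byte payload, don't fragment)
vmkping -d -s 8972 <IP of the StarWind target>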

bye
Volker
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Mon Apr 15, 2013 8:56 am

Dear jeffhoward001, do you have any update for us please?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com