HA Sync Performance


ThTheurer
Posts: 7
Joined: Thu Jan 21, 2010 12:06 pm

Mon Aug 30, 2010 1:32 pm

Hi there,

I'm running StarWind 5.4.0 (build 1599) on 2 storage servers.
I've set up a dedicated 1 Gb connection between both servers for synchronizing the target. When I do a full sync, the line is only ~30% utilized (jumbo frames are enabled).
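
(A quick way to double-check that jumbo frames are actually in effect end to end is a don't-fragment ping with a payload just under the 9000-byte MTU; the address below is a placeholder:

    ping -f -l 8972 <sync-partner-ip>

If that fragments or fails, jumbo frames aren't working on the full path.)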

Any ideas on how to speed up the sync time?

best regards
Thomas
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Mon Aug 30, 2010 2:04 pm

Hello,
This situation can be caused by one of two things:
1. Low throughput between the nodes (even if the NICs are 1 GbE). Please check the network throughput using the ntttcp tool (see the example run below).
2. Low disk performance (the disks get overloaded when facing cross-reads). You can use IOMeter to test disk performance in an emulated environment.
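
For reference, a typical ntttcp run between the two sync NICs might look like the following; the exact switches vary between ntttcp versions, so treat these parameters as illustrative:

    on the receiver:  ntttcpr.exe -m 4,0,<receiver-sync-ip> -a 4 -t 60
    on the sender:    ntttcps.exe -m 4,0,<receiver-sync-ip> -a 4 -t 60

Compare the reported throughput against the ~120 MB/s that a 1 GbE link can deliver in practice.
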
Max Kolomyeytsev
StarWind Software
ThTheurer
Posts: 7
Joined: Thu Jan 21, 2010 12:06 pm

Mon Aug 30, 2010 2:46 pm

Thanks Max,

any other ideas?

ntttcp throughput: > 700 Mb/sec.
Disk I/O I don't know, but when I just copy a file to or from that server the line is ~99% utilized
(it's a fast RAID 10 array of 14 disks, and no other process is using the disks right now).

best regards
Thomas
ThTheurer
Posts: 7
Joined: Thu Jan 21, 2010 12:06 pm

Mon Aug 30, 2010 3:19 pm

It's me again,

some additional info: we checked this in another installation too, with the same effect. The sync runs at ~30% of the line capacity.

cheers
Thomas
Constantin (staff)

Mon Aug 30, 2010 3:25 pm

Is the 700 Mb/s the performance of the sync channel?
Have you seen our topic with advanced tweaks for achieving maximum performance here: http://www.starwindsoftware.com/forums/ ... -t792.html?
ThTheurer
Posts: 7
Joined: Thu Jan 21, 2010 12:06 pm

Mon Aug 30, 2010 4:01 pm

Hi Constantin,

I've seen it and applied it on both servers.

I have 4 separate connections for syncing and ran ntttcp on one of the (at the moment) unused links.

cheers
Thomas
Constantin (staff)

Mon Aug 30, 2010 5:05 pm

Are you using caching or anything else?
ChrisB
Posts: 4
Joined: Tue Aug 31, 2010 1:00 am

Tue Aug 31, 2010 1:19 am

This is also something we are experiencing. Almost exactly the same speed: ~30% of a 1Gb link (35-40 MB/s). This kills performance for writes to the SAN.

I'm also able to completely peg the line with iperf and an SMB file transfer. It's only an issue in an HA setup: in single-node configuration, or with one HA node down, writes are exactly as they should be.
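
For reference, the iperf test was just the stock client/server pair, along these lines (exact flags depend on the iperf version; the address is a placeholder):

    on node 2:  iperf -s
    on node 1:  iperf -c <node2-sync-ip> -t 30 -P 4

which pegs the link at wire speed (~940 Mbit/s of goodput on 1 GbE).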

I can't see the actual disks being an issue either, since with 4+ GB of write-back cache (in our case) any sync data that the disks can't handle will go into the controller's battery-backed write cache (BBWC) first, and if that's full, into the StarWind WB cache. However, what I noticed when testing is that no matter which caching mode is selected (WB, WT, or none), we still hit that bottleneck of 40 MB/s for writes and re-syncs.

We do have a ticket open about this, but haven't made any progress or identified the issue yet. I'm concerned that StarWind's HA might not meet our performance needs, which is unfortunate because it performs so well in non-HA mode. However, the reason we went with StarWind over Openfiler in the first place is that we get the benefit of real active-active nodes.

I've currently got our SANs out of production and would test anything to try and get the sync speed to where it should be. Unfortunately I'm just not sure where we are hitting a bottleneck.

Just for curiosity's and reference's sake, what kind of hardware are you running, ThTheurer? Ours is set up with 2 x HP DL320s boxes, 12 x 15k SAS drives each, utilizing two ports of an HP quad-port NIC for sync (not sure of the exact model). The OS is a fresh copy of Server 2008 R2.
ThTheurer
Posts: 7
Joined: Thu Jan 21, 2010 12:06 pm

Tue Aug 31, 2010 6:42 am

Hi Chris,

our servers (W2K8 R2) are equipped with a quad-core Xeon 3430 and 8 GB of memory, 2 onboard NICs, 2 Intel quad-port server adapters, a RAID 1 of two 1 TB disks for the C: drive, and a RAID 10 of fourteen 1 TB disks for the D: drive.

The sync is running on one NIC of one of the quad-port adapters (each target has, or will have, its own 1 Gb sync channel).
A colleague of mine has the same problem, but he doesn't use a quad-port adapter.

The Hyper-V nodes communicate over the other quad-port adapter, and the onboard NICs are used for the "public" network.

cheers
Thomas
Constantin (staff)

Tue Aug 31, 2010 10:52 am

We have faced situations where the reason for such low performance was the HP NICs and their drivers; please check that you are using the latest driver versions. Alternatively, we can do a screen-sharing session to test it.
ThTheurer
Posts: 7
Joined: Thu Jan 21, 2010 12:06 pm

Tue Aug 31, 2010 11:55 am

Hi Constantin,

I would be happy to show you the problem, just let me know how.

cheers
Thomas
ChrisB
Posts: 4
Joined: Tue Aug 31, 2010 1:00 am

Tue Aug 31, 2010 6:23 pm

In checking that I had the newest drivers for our NICs, I noticed that although they're branded HP, the actual chipset on the NIC is made by Intel (specifically, the Intel E1E family). I did confirm that we are running the latest drivers for that NIC, as I suspected.

Later this week I have some other NICs coming in that I plan on testing in the machines to see if the performance changes.
sls
Posts: 17
Joined: Fri Jun 18, 2010 6:16 pm

Wed Sep 01, 2010 3:33 am

We have the exact same problem in our environment. Our servers are HP DL180 G6 with quad-core, hyper-threading/Turbo processors. The OS is Windows 2008 R2. When we set up the first HA target with a 2 TB image file, the full sync took 4 days to complete.

We made the registry changes based on the performance-tweaking post, even though we don't think they're necessary on the Windows 2008 R2 platform, and they didn't make much difference (an example of the kind of change is shown below). The NIC won't go over 45% utilization no matter what we do. I've looked around all the other posts in the forum, and it seems like support always suspects the issue is either the driver or the NIC. We updated the driver to the latest from HP: no luck. Just in case there was something bad in the onboard HP NIC, we added a brand-new Intel quad-port NIC with the iSCSI offload option (Pro/1000 ET) in both nodes. It still doesn't break the 45% bottleneck. So we are sure the issue is not the onboard HP NIC, since it does the same thing with the add-in Intel NIC, and it's not the driver either, since we installed the latest drivers from the manufacturer.

To further prove it isn't a NIC problem, we stopped the StarWind service and copied the 2 TB image file that StarWind uses for HA between the nodes across a Windows SMB share. The copy took less than 1 hour to complete, and the NIC was 100% utilized during the transfer. If the bottleneck were the NIC driver, NIC chipset, or storage controller, you would see the same bottleneck no matter what application you used. In our case, StarWind is the only application that runs into the problem: something at the application level is preventing the data from fully utilizing the NIC bandwidth. That is what we can tell so far.
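
For reference, the tweaks in that post are TCP-tuning changes. As an illustration of the kind of setting involved (this is one of the commonly cited iSCSI tweaks, not necessarily the full list from that post, and the interface GUID is a placeholder), disabling delayed ACKs on the sync interface looks like this:

    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<interface-GUID>" /v TcpAckFrequency /t REG_DWORD /d 1

A reboot is needed for the change to take effect; as said, it made no measurable difference for us.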

We also noticed one thing that makes us believe the problem is the software itself. We ran Wireshark to capture packets while a full sync was running, and we see a lot of header-mismatch errors in the iSCSI data transmission. That is pretty much an indication of an error in the software itself, since StarWind is the only software on the system that generates iSCSI data.
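
For anyone who wants to check their own capture, we captured on the sync NIC and narrowed it down to the iSCSI traffic, roughly like this (this assumes the sync traffic runs over the standard iSCSI port 3260; display-filter field names also vary between Wireshark versions):

    capture filter:  tcp port 3260
    display filter:  iscsi

and then looked at Wireshark's Expert Info for the flagged header errors.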

Would everybody run Wireshark in your environments to see if you get the same result as we do? Just to make sure it isn't something odd in our setup.

We opened a support case a long time ago, but there's still no resolution so far.
The header errors in the iSCSI packets and the performance concern us a lot. A header error generally means the data was corrupted and needs to be retransmitted. We don't want to put this in production until these two issues have been resolved.
Constantin (staff)

Wed Sep 01, 2010 9:35 am

OK! I was afraid of that... So, let's try it the following way:
1. Swap the cables between the synchronization NICs.
2. Don't use a switch there, only a direct connection.
3. If broken headers continue to appear, try any other third-party NIC: Intel, Broadcom, Realtek, whatever.
4. If you still have broken headers, disable ToE, iSCSI offload, etc. (see the commands below).
5. If you still see problems, call HP or Dell; the problems are on their side ;)
We have faced this problem a few times, and we found that on some kinds of HP hardware and NIC drivers the headers really are often broken: StarWind is receiving the broken packets, not producing them :)
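
For step 4, the offloads can be disabled either per adapter in the NIC driver's Advanced properties, or globally; on Windows Server 2008 R2 the global switches look like this (run from an elevated prompt):

    netsh int tcp set global chimney=disabled
    netsh int tcp set global rss=disabled
    netsh int ip set global taskoffload=disabled

Re-test the sync after each change so you know which one, if any, makes the difference.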
matty-ct
Posts: 13
Joined: Tue Oct 06, 2009 6:09 pm

Thu Sep 02, 2010 5:37 pm

Interesting idea there. Use a crossover cable to eliminate the switch? However, HP and Dell make up the overwhelming majority of the world's servers, and claiming that HP, Dell, Broadcom, and Intel all have it wrong is hard to swallow. I can't imagine any enterprise IT department having much confidence in the suggestion to use Realtek adapters in lieu of server-class offerings from the big server vendors.

I don't claim to have the answer, but if I sold any of my enterprise clients an expensive iSCSI solution and then told them we needed to add non-Intel, non-Broadcom, non-Dell, or non-HP NICs to their servers, they'd seriously question my judgment and expertise. This answer gives me great pause. I find the StarWind solution an excellent one, but questions regarding HA performance are pertinent. Good luck, guys!