Low Sync Bandwidth

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

delocx
Posts: 4
Joined: Thu Jan 23, 2014 4:45 pm

Wed Jan 29, 2014 11:34 pm

We've noticed some peculiar behaviour when our HA images are doing a full sync.

First, our test setup is 1 RAID1 volume on each server. The servers are Windows 2012 R2 with 16 GB RAM. They have the following network adaptors:
Access Team - 4 x 1gbps NICs (4gbps) teamed using Windows (these are not used to connect or synchronize via Starwind)
iSCSI 1 - 1gbps NIC - HB
iSCSI 2 - 1gbps NIC - HB
iSCSI 3 - 1gbps NIC - HB
Sync 1 - 1gbps NIC - HB & Sync
Sync 2 - 1gbps NIC - HB & Sync
Sync 3 - 1gbps NIC - HB & Sync
The three sync NICs are directly connected to each other using crossover cables. The iSCSI NICs are connected to a switch that is spec'd to handle 16gbps. The other NIC Team is connected to a separate switch and network.

We're running the drives on an LSI MegaRAID 9280-24i4e with the most recent drivers and firmware. We have the most recent V6 install for Starwind and our NIC drivers are the most recent directly from the Intel website.

Now for the problem. When we test a single volume for full sync, we expect to see approximately 50% of each adapter to be in use (not sure where I remember reading that, but it was on the website somewhere). Our testing instead shows the NICs running at approximately 60 - 80 mbps. Even across 3 channels, we're looking at only about 200 mbps of sync, which is nowhere near the full capacity of those links (3gbps) or even expected performance at 50% (1.5gbps). This also means syncing a 500 GB volume is taking pretty much an entire day, which is ridiculous and unacceptable.
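
For a rough sense of scale, here is a minimal back-of-the-envelope sketch in Python. It assumes the figures above are megabits per second, decimal units (1 GB = 10^9 bytes), and no protocol or disk overhead; the function name sync_hours is made up for illustration.

```python
def sync_hours(volume_gb: float, link_mbps: float) -> float:
    """Idealized hours to copy volume_gb (decimal GB) at link_mbps (megabits/s)."""
    bits = volume_gb * 8e9              # 1 GB = 8e9 bits (assumption: decimal units)
    seconds = bits / (link_mbps * 1e6)  # assumption: no protocol or disk overhead
    return seconds / 3600


# Rates taken from the post: observed total, expected 50%, full 3 x 1gbps.
for label, mbps in [("observed (~200 mbps)", 200),
                    ("expected at 50% (1.5 gbps)", 1500),
                    ("full capacity (3 gbps)", 3000)]:
    print(f"{label}: {sync_hours(500, mbps):.1f} h")
```

Under those assumptions a 500 GB full sync takes about 5.6 hours at 200 mbps and about 45 minutes at 1.5 gbps. These are best-case lower bounds, so a sync that takes most of a day would correspond to an even lower effective rate.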

On closer inspection, we found that the NICs in the access team, which we thought were operating at 1gbps, were actually only running at 100mbps. Taken across 4 NICs, that gives 400 mbps. We then noticed that the total traffic across the sync NICs was roughly 200 mbps, or half the total capacity of that team. We guessed that the Starwind service was taking the team's bandwidth as its top speed and then setting its limits based on that. With this hypothesis in mind, we corrected the issue preventing those NICs from operating at full speed and tested again.

Still only 60 - 80 mbps per sync NIC.

Next we decided to test what the actual throughput on the HDDs and NICs was. Using IOMeter, the disk performance looked completely normal (I didn't record the numbers at the time, but they were not unusual). Next we tested the bandwidth across each NIC, including the teamed NICs. They all worked at their expected speeds.

We've also found that accessing the iSCSI volumes and doing performance testing on them seems to give results consistent with the speeds directly to the disks with some overhead, so the targets themselves seem fine. What is odd is that if we do write testing, the speed of the writes in IOMeter seems to have no correlation with the bandwidth use on the sync channels. I suppose that could just be an oddity with how/what IOMeter is doing and how Starwind figures out what to synchronize.

At this point we are out of ideas for what else could be causing the problem. When every other operation on these NICs and HDDs performs normally except for Starwind, we're stuck with the conclusion that the software is either not configured properly, or doesn't work properly.

Does anyone know what next steps we can take?
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Thu Jan 30, 2014 4:30 pm

Thank you for your interest in StarWind.
First things first: NIC teaming and iSCSI are compatible, but teaming badly hurts performance, so it is a much better idea to use MPIO round robin (RR) instead.

Quick questions:
* How exactly did you verify that the network delivers good results?
* Can you test the sync channel by putting a 100% write load on the HA device?
* How exactly have you configured the sync channel? Can you confirm that no NIC teaming is in use there?
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
delocx
Posts: 4
Joined: Thu Jan 23, 2014 4:45 pm

Fri Jan 31, 2014 4:14 pm

  • NIC Team is not, repeat not, configured on any NICs in use by Starwind. Only mentioned because there seemed to be some correlation between the speed we were seeing on the sync channel and the total speed on the team.
  • Network speed was verified using iPerf.
  • With a 100% write load on the drive using IOMeter, the sync channels top out between 80 and 120 mbps.
The sync channel is done across three NICs, each on an independent network with directly connected NICs. All targets are connected across 3 other NICs using MPIO RR.

We have been trying to get this to work in production for almost 12 months now and it simply refuses to work properly, whether it is poor speed on the targets or extremely slow syncing. We've already had to concede that full syncs are going to happen far more often than we would like (once a week at minimum), so all our targets are as small as we can make them. It seems like if you sneeze close enough to the servers, they decide to run a full sync, and they certainly don't survive any unexpected outage, which is exactly what we're trying to protect against.

We've also had to build independent, dedicated storage using JBOD servers for our more important workloads (Exchange, SQL, some important Hyper-V VHDs), because the reliability and performance with Starwind just isn't there. This more than anything shows our level of frustration; we find it safer to have a single point of failure with these systems than to try to host them on a Starwind HA drive.
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Jan 31, 2014 7:25 pm

12 Months? Wow! Sorry to hear that.
OK, so let's not put this off; let's schedule a remote session to get it solved. Could you please send an email with a reference to this forum thread to support@starwindsoftware.com? My colleague or I will then arrange a time for the remote session with you.

Thank you
delocx
Posts: 4
Joined: Thu Jan 23, 2014 4:45 pm

Fri Jan 31, 2014 9:39 pm

Hey Anatoly,

We're actually simultaneously in communication with a gentleman via email (Joseph I believe is his name). Figured it was worth pursuing all possible paths for a solution, including posting here! We'll get something arranged via him then.

Thanks.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sat Feb 01, 2014 8:01 pm

From what I know, Joe is out next week (planned vacation), so you'll probably proceed with Max. I'll supervise your case. Sorry once again for what happened; we'll do our best to sort the issues out ASAP. Thank you!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
