HA Sync Performance



heitor_augusto
Posts: 9
Joined: Mon Sep 13, 2010 2:20 pm

Tue Oct 26, 2010 4:40 pm

anton (staff) wrote:Grab the new Beta... We've probably doubled both reads and writes (and IOPS should go through the roof) in it. Checking some real-life 10 GbE configs is what we'd be very interested in :) Thanks!
How can I get the new beta? I signed up for the beta-testing program but have not received anything yet.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Oct 26, 2010 5:30 pm

You'll get an e-mail with a private download link.
heitor_augusto wrote:
anton (staff) wrote:Grab the new Beta... We've probably doubled both reads and writes (and IOPS should go through the roof) in it. Checking some real-life 10 GbE configs is what we'd be very interested in :) Thanks!
How can I get the new beta? I signed up for the beta-testing program but have not received anything yet.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

heitor_augusto
Posts: 9
Joined: Mon Sep 13, 2010 2:20 pm

Fri Oct 29, 2010 2:17 am

anton (staff) wrote:Grab the new Beta... We've probably doubled both reads and writes (and IOPS should go through the roof) in it. Checking some real-life 10 GbE configs is what we'd be very interested in :) Thanks!
I did tests with 5.5 build 20100831 and an HA device connected to VMware ESX 4.1, and the same problem persists. Write performance in the VM with a 1 Gb sync link is only about 50 Mbps, and bandwidth usage on this link doesn't exceed 45%.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Fri Oct 29, 2010 8:54 am

50 megabits or 50 megabytes per second?
heitor_augusto wrote:
anton (staff) wrote:Grab the new Beta... We've probably doubled both reads and writes (and IOPS should go through the roof) in it. Checking some real-life 10 GbE configs is what we'd be very interested in :) Thanks!
I did tests with 5.5 build 20100831 and an HA device connected to VMware ESX 4.1, and the same problem persists. Write performance in the VM with a 1 Gb sync link is only about 50 Mbps, and bandwidth usage on this link doesn't exceed 45%.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

heitor_augusto
Posts: 9
Joined: Mon Sep 13, 2010 2:20 pm

Fri Oct 29, 2010 5:36 pm

anton (staff) wrote:50 megabits or 50 megabytes per second?

Sorry, 50 MB/s.

The best result was precisely 47 MB/s and the worst 36 MB/s.

The following command was executed on the virtual machine:

# dd if=/dev/zero of=test bs=1M oflag=direct

Bonnie++ shows the same results, and in all tests the sync link utilization doesn't exceed 45%.

The policy used for iSCSI multipathing on ESX 4.1 is "Fixed".
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Oct 31, 2010 3:14 pm

So if I understand you correctly, your sync link cannot push more than 50 MB/sec with any tool you've used? What's the maximum possible throughput with NTttcp & iperf?
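
For example, a raw TCP check over the sync link could look something like this (assuming iperf is installed on both nodes; replace the placeholder with the partner node's sync IP):

Run on the partner node:
# iperf -s

Run on this node (30-second test, 4 parallel streams):
# iperf -c <partner-sync-ip> -t 30 -P 4

A clean 1 GbE cross-link should report somewhere around 930-940 Mbit/s in total.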

P.S. I see about 45% network utilization; it's a known issue, sync traffic is still "pulsating", and we're working on a pipelined burst model right now.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

heitor_augusto
Posts: 9
Joined: Mon Sep 13, 2010 2:20 pm

Tue Jan 25, 2011 3:57 pm

I did some more tests, and in all of them the performance is limited by the synchronization link.

With the "Fixed" multipathing policy I didn't get more than 50% usage on the sync link, which limits performance to about 47 MB/s.

I tried the round-robin multipathing policy, taking advantage of the two sync links installed between the storage nodes, and the same problem occurred.

To reproduce: connect an ESXi host to two arrays in HA with the Fixed multipathing policy, then check the I/O in a virtual machine and the usage of the synchronization network (1 Gbit).

The network is definitely not the problem. Tests using NTttcp showed throughput close to 1 Gbit.

The tests were performed using the new StarWind 5.5 with an HA LUN of 2 GB and the write-back cache set to 4 GB.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue Jan 25, 2011 4:00 pm

You're calculating it the wrong way. 50% usage of a full-duplex link is ~125 MB/sec; 100% full-duplex usage is ~250 MB/sec.
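
Rough arithmetic behind those figures (assuming a 1 Gbps NIC and ignoring protocol overhead):

1 Gbps in one direction        = 1000 Mbit/s / 8 ~ 125 MB/s
full duplex counted as Tx + Rx = 2 x 125 MB/s    ~ 250 MB/s
"50%" on such a counter        ~ 125 MB/s, i.e. one direction fully loaded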
heitor_augusto wrote:I did some more tests, and in all of them the performance is limited by the synchronization link.

With the "Fixed" multipathing policy I didn't get more than 50% usage on the sync link, which limits performance to about 47 MB/s.

I tried the round-robin multipathing policy, taking advantage of the two sync links installed between the storage nodes, and the same problem occurred.

To reproduce: connect an ESXi host to two arrays in HA with the Fixed multipathing policy, then check the I/O in a virtual machine and the usage of the synchronization network (1 Gbit).

The network is definitely not the problem. Tests using NTttcp showed throughput close to 1 Gbit.

The tests were performed using the new StarWind 5.5 with an HA LUN of 2 GB and the write-back cache set to 4 GB.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

heitor_augusto
Posts: 9
Joined: Mon Sep 13, 2010 2:20 pm

Tue Jan 25, 2011 6:02 pm

OK, let me simplify things:

* An ESXi 4.1 host connected to a non-HA LUN (size: 2 GB, write-back cache: 4 GB) gives me 117 MB/s in the VM.
* An ESXi 4.1 host connected to an HA LUN (size: 2 GB, write-back cache: 4 GB) gives me 47 MB/s in the VM. The graph for the synchronization network in the Windows 2008 Resource Monitor does not exceed 50%.
* The test with NTttcp shows 939.078 Mbit/s on the synchronization network between the storage nodes.
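
In other words (rough arithmetic, ignoring protocol overhead): 939.078 Mbit/s / 8 ~ 117 MB/s, so the non-HA LUN is essentially running at wire speed and only the HA case falls short of it.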
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Jan 26, 2011 11:51 am

1) Good. Wire speed. Congratulations to the StarWind team.

2) In HA all writes are executed TWICE. The first write is issued by the hypervisor to HA node 1, then HA node 1 submits the same write to HA node 2, and only after its completion does the whole storage cluster report "OK" back to the hypervisor. So in the same amount of time we need to move double the data.
Moving double the data in the same time is PHYSICALLY impossible (for a single non-overlapped operation, of course; see below). This means HA performance is 50% of non-HA performance. Add some time for sync, delays and re-buffering and we get 40-45% (which is exactly what you have).

The Windows monitor shows a virtual 2 Gbps of network capacity by adding the two full-duplex paths (1 Gbps for Tx and 1 Gbps for Rx), i.e. reads and writes (both directions) combined. Since with writes most of the data goes in only one direction, you cannot see more than 50% network utilization.

Both numbers you see are MAXIMUM ones, or close to them.
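
A timing sketch of that write path, with one outstanding write at a time and overhead ignored (numbers taken from this thread):

non-HA write: hypervisor -> node 1 -> ack                  (one transfer per write, ~117 MB/s observed)
HA write:     hypervisor -> node 1 -> node 2 -> ack -> ack (two back-to-back transfers per write)

Each HA write therefore takes roughly twice as long, giving a ceiling of about 117 / 2 ~ 58 MB/s; sync, delays and re-buffering then pull it down into the 36-47 MB/s range you measured.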

3) Good. You have a properly configured cross-link running at wire speed. Congratulations to you as an IT engineer.

A couple of remarks, however.

1) In real life an I/O operation is quite seldom executed alone. Most of the time a bunch of writes go out at the same time, so performance is going to be higher than 50% because the pipeline is kept loaded. So the question here is: what are you using to measure your write performance? If the tool issues non-overlapped I/O you'll get poor results. With something like ATTO Disk Benchmark or Intel IOMeter you'll be OK (the default pattern is 4 concurrent I/Os). As I've said, modern OSes don't use a write-wait-write-wait pattern; they do write-write-write-wait-wait-wait. So please give the referenced tools a try, or run some real scenario.
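
If ATTO or IOMeter aren't convenient inside the Linux guest, fio (if it's available there) can generate the same kind of overlapped load; this is just a sketch, with the queue depth picked to mimic the 4-concurrent-I/Os pattern mentioned above:

# fio --name=seqwrite --filename=test --rw=write --bs=1M --size=1G --direct=1 --ioengine=libaio --iodepth=4

Compared to the earlier single-outstanding-I/O dd run, the queued version should keep the sync link noticeably busier.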

2) In real life reads and writes are combined, so network usage is going to be higher for a mixed read/write scenario (say you have a file server or SQL server configured inside the VM). Please give a combined load a try as well.

3) You can dramatically boost even the single non-overlapped I/O mode by using doubled or tripled 1 GbE links between the HA nodes, or by installing a 10 GbE connection between them. You'll get something like 70-75% of the non-HA config for purely sequential writes in non-overlapped mode. Give it a try as well.
heitor_augusto wrote:OK, let me simplify things:

* An ESXi 4.1 host connected to a non-HA LUN (size: 2 GB, write-back cache: 4 GB) gives me 117 MB/s in the VM.
* An ESXi 4.1 host connected to an HA LUN (size: 2 GB, write-back cache: 4 GB) gives me 47 MB/s in the VM. The graph for the synchronization network in the Windows 2008 Resource Monitor does not exceed 50%.
* The test with NTttcp shows 939.078 Mbit/s on the synchronization network between the storage nodes.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

heitor_augusto
Posts: 9
Joined: Mon Sep 13, 2010 2:20 pm

Wed Jan 26, 2011 6:17 pm

anton (staff) wrote: 2) In HA all writes are executed TWICE. The first write is issued by the hypervisor to HA node 1, then HA node 1 submits the same write to HA node 2, and only after its completion does the whole storage cluster report "OK" back to the hypervisor. So in the same amount of time we need to move double the data.
Moving double the data in the same time is PHYSICALLY impossible (for a single non-overlapped operation, of course; see below). This means HA performance is 50% of non-HA performance. Add some time for sync, delays and re-buffering and we get 40-45% (which is exactly what you have).

The Windows monitor shows a virtual 2 Gbps of network capacity by adding the two full-duplex paths (1 Gbps for Tx and 1 Gbps for Rx), i.e. reads and writes (both directions) combined. Since with writes most of the data goes in only one direction, you cannot see more than 50% network utilization.

Both numbers you see are MAXIMUM ones, or close to them.
Thank you for the explanation, it's very clear now.
anton (staff) wrote: 1) In real life an I/O operation is quite seldom executed alone. Most of the time a bunch of writes go out at the same time, so performance is going to be higher than 50% because the pipeline is kept loaded. So the question here is: what are you using to measure your write performance? If the tool issues non-overlapped I/O you'll get poor results. With something like ATTO Disk Benchmark or Intel IOMeter you'll be OK (the default pattern is 4 concurrent I/Os). As I've said, modern OSes don't use a write-wait-write-wait pattern; they do write-write-write-wait-wait-wait. So please give the referenced tools a try, or run some real scenario.
Actually, the tests were done with sequential non-overlapped I/O. I will run tests with ATTO and IOMeter and post the results later.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed Jan 26, 2011 9:42 pm

Pretty much expected this... Please increase the bandwidth between the HA nodes by adding extra NICs. It's going to add performance and redundancy at the same time. Thanks!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Wed Jan 26, 2011 10:52 pm

anton (staff) wrote:Pretty much expected this... Please increase the bandwidth between the HA nodes by adding extra NICs. It's going to add performance and redundancy at the same time. Thanks!
I wish it were as easy as just adding NICs. With static link aggregation or LACP we have many sites with 4x1 Gb links bonded, and you still only get 1 Gb per connection. It may help you scale with multiple targets, but it isn't going to do much for a single target or for full sync speed.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jan 27, 2011 12:04 am

Have to agree... Performance scales, but not linearly with the number of NICs (unfortunately). So 10 GbE between the HA nodes should do the trick.
camealy wrote:
anton (staff) wrote:Pretty much expected this... Please increase the bandwidth between the HA nodes by adding extra NICs. It's going to add performance and redundancy at the same time. Thanks!
I wish it were as easy as just adding NICs. With static link aggregation or LACP we have many sites with 4x1 Gb links bonded, and you still only get 1 Gb per connection. It may help you scale with multiple targets, but it isn't going to do much for a single target or for full sync speed.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
