
Rebuild times in 3-node set-up

Posted: Thu Jul 31, 2014 4:54 pm
by robnicholson
We currently have 5 x Hyper-V hosts in our farm, so the idea of getting rid of some tin (the dedicated SAN server) and running StarWind on the Hyper-V hosts is attractive. There appear to be two options:

2-node HA with RAID-10 on the attached JBOD and a single 10GbE sync link between them
3-node HA with RAID-0 on the attached JBOD and a dual 10GbE sync link between them

The 3-node option is attractive in that you only need 3 x N disks whereas the 2-node option needs 4 x N disks. Considering that 4TB NL SAS is ~£450 in the UK, for a 64TB system that's £28.8k reduced to £21.6k. You also need an additional disk controller (£750), disk enclosure (£1k), dual-GbE NIC (£250) & StarWind license (???) so that saving isn't that big - maybe £4k.
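To make that comparison concrete, here is a quick sketch of the disk arithmetic above. The prices and capacities are the figures quoted in this post; the function and variable names are mine, purely for illustration:

```python
# Back-of-envelope drive cost comparison for the two options.
# Figures are those quoted in the post (GBP 450 per 4TB NL SAS
# drive, 64TB usable); names here are illustrative only.

USABLE_TB = 64   # target usable capacity
DRIVE_TB = 4     # 4TB NL SAS drives
PRICE_GBP = 450  # approximate price per drive

drives_per_copy = USABLE_TB // DRIVE_TB  # 16 drives hold one full copy

def drive_cost(copies_of_data: int) -> int:
    """Total drive spend for a setup storing `copies_of_data` full
    copies: 2-node RAID-10 keeps 4 (2 mirrors x 2 nodes), 3-node
    RAID-0 keeps 3 (1 per node)."""
    return copies_of_data * drives_per_copy * PRICE_GBP

print(drive_cost(4))  # 28800 -> GBP 28.8k for 2-node RAID-10
print(drive_cost(3))  # 21600 -> GBP 21.6k for 3-node RAID-0
```

The gap between the two options therefore scales linearly with the per-drive price, which matters given how quickly NL SAS prices were falling at the time.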

I'm concerned though about rebuild times in the 3-node RAID-0 set-up. In the 2-node RAID-10 set-up when a disk fails, the RAID controller only has to rebuild 1 x 4TB which won't take that long to re-mirror. During this time, both StarWind nodes are functional.

However, in the 3-node set-up, if a disk fails in the RAID-0 array, it's game over for that array, isn't it? When you replace the disk, the volume is unrecoverable - a whacking big 4TB hole missing in the middle. I assume you'd have to get StarWind to re-synchronise the entire 64TB - that's assuming the broken array can carry on at all. Wouldn't you have to remove the node from StarWind sync, remove the target from StarWind, re-create the RAID-0 volume, re-add it to StarWind and re-sync the whole lot? That sounds like a lot of trouble compared to the RAID controller hot-swapping in a new 4TB drive and getting fully back online in hours, not days.

If it works differently, please let me know, because for this reason 3-node is not attractive.

Re: Rebuild times in 3-node set-up

Posted: Thu Jul 31, 2014 4:58 pm
by robnicholson
In the 2-node RAID-10 set-up when a disk fails, the RAID controller only has to rebuild 1 x 4TB which won't take that long to re-mirror. During this time, both StarWind nodes are functional.
Plus in this scenario, the rebuild happens at hardware level on the disk controller. A rebuild in the 3-node system goes across the 10GbE sync network, which is way slower than a RAID re-mirror of one disk, especially if the entire volume has to re-sync via StarWind.

Re: Rebuild times in 3-node set-up

Posted: Thu Jul 31, 2014 5:11 pm
by robnicholson
Actually, WD4001FYYG NL 4TB SAS drives are now only £200 in the UK, so 4 x 16 x 4TB (for 2-node RAID-10) comes to £12.8k, compared to £9.6k for 3 x 16 x 4TB (3-node RAID-0). I don't think that price difference justifies the potential problems highlighted above, so is 3-node really worth considering?

Re: Rebuild times in 3-node set-up

Posted: Thu Jul 31, 2014 5:13 pm
by robnicholson
Plus I assume that as the number of synchronous nodes goes up, performance goes the other way, since writes have to mirror across the 10GbE to two partner nodes, not just one?

Re: Rebuild times in 3-node set-up

Posted: Sun Aug 03, 2014 7:54 pm
by anton (staff)
We typically see just the opposite picture: RAID rebuilds are slower than a straight copy from the partner node, which is why we recommend a 3-way replica with a RAID-0 disk setup. You just provision a new data store from scratch and get the data from the live ones.
robnicholson wrote:
In the 2-node RAID-10 set-up when a disk fails, the RAID controller only has to rebuild 1 x 4TB which won't take that long to re-mirror. During this time, both StarWind nodes are functional.
Plus in this scenario, the rebuild happens at hardware level on the disk controller. A rebuild in the 3-node system goes across the 10GbE sync network, which is way slower than a RAID re-mirror of one disk, especially if the entire volume has to re-sync via StarWind.

Re: Rebuild times in 3-node set-up

Posted: Sun Aug 03, 2014 7:57 pm
by anton (staff)
There are no problems with 3-way replication. A typical setup has pretty much the same level of resilience (assuming you still run VM-level backup) but much better local performance, with I/O bound to the local node (paired with LSFS to accelerate writes especially).
robnicholson wrote:Actually WD4001FYYG NL 4T SAS drives are now only £200 in the UK so 4 x 16 x 4TB (for 2-node RAID-10) is £12.8k compared to 3 x 16 x 4TB (for 3-node RAID-0) is £9.6k. I don't think that price difference justifies the potential problems highlighted above to consider 3-node?

Re: Rebuild times in 3-node set-up

Posted: Sun Aug 03, 2014 8:00 pm
by anton (staff)
It depends (c) ... Mostly on whether you run a Hyper-Converged or a separated Compute and Storage scenario. Writes will always suffer if you run a switch-based config, and are the same if you run an all-mesh interconnect (every partner pair gets its own physical network - quite common, BTW). Reads are bound to the local node with Hyper-Converged, but there's a small "boost" from the increased number of MPIO paths. If you run Compute and Storage separated, then reads are much faster (+50%), again assuming you don't run shared NICs for all traffic.
robnicholson wrote:Plus I assume that as the number of synchronous nodes goes up, performance goes the other way, since writes have to mirror across the 10GbE to two partner nodes, not just one?

Re: Rebuild times in 3-node set-up

Posted: Mon Aug 04, 2014 3:08 pm
by robnicholson
anton (staff) wrote:We typically see just the opposite picture: RAID rebuilds are slower than a straight copy from the partner node, which is why we recommend a 3-way replica with a RAID-0 disk setup. You just provision a new data store from scratch and get the data from the live ones.
Compared to RAID-10?

Consider a 32TB RAID-10 array built from 16 x 4TB NL SAS. Are you saying that StarWind can rebuild the entire 32TB faster across the sync LAN than the RAID controller can re-mirror just 4TB directly on the hardware?
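As a rough sanity check on that question, here's a quick estimate. The throughput figures are purely illustrative assumptions on my part (real rates depend on controller load, drive speed and how busy the sync link is), not measurements:

```python
# Illustrative rebuild-time comparison: hardware re-mirror of one
# drive vs a full volume resync over the sync network. Throughput
# numbers are assumed for the estimate, not measured.

TB = 10**12  # bytes (decimal terabyte)

def rebuild_hours(data_bytes: float, throughput_bps: float) -> float:
    """Hours to move data_bytes at a sustained throughput (bytes/s)."""
    return data_bytes / throughput_bps / 3600

# One 4TB drive re-mirrored by the RAID controller at ~150 MB/s:
raid_h = rebuild_hours(4 * TB, 150e6)    # roughly 7.4 hours
# Full 32TB resync over one 10GbE sync link at ~1.1 GB/s usable:
sync_h = rebuild_hours(32 * TB, 1.1e9)   # roughly 8.1 hours
print(round(raid_h, 1), round(sync_h, 1))
```

On these assumed numbers the two are in the same ballpark, and a dual 10GbE sync link would roughly halve the StarWind figure - but only a real test will settle it.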

We're just building a more serious lab environment using some spare Dell PowerEdge servers so we can do some real performance tests. That will allow me to check what happens on a RAID-0 array when you trash one drive.

Cheers, Rob.

Re: Rebuild times in 3-node set-up

Posted: Sat Aug 09, 2014 12:39 pm
by Anatoly (staff)
I'm not gonna interrupt your conversation about speed; I'll just say that if you're forced to do a RAID rebuild, the system will be slow, whereas if you're in a StarWind sync process you can switch traffic priority from sync to client requests, and the process stops being painfully slow.