Load balancing, sort of...

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Thu Jun 19, 2014 8:40 am

If you have a clustered SAN set-up like this:

Code:

SAN1: 500MB write-back cache -> 2TB volume
                  ^
                  |
                  V				  
     Fast NIC mirror/sync channel
                  ^
                  |
                  V				  
SAN2: 500MB write-back cache -> 2TB volume
You configure MPIO on the iSCSI initiator to connect to SAN1 and SAN2 using multi-path. MPIO defaults to round robin, so it writes one block to path #1 (SAN1), then one to path #2 (SAN2), back to path #1, and so on.

So, in effect, has one doubled the size of the write-back cache and thus improved performance? For example, write a 1GB file to the above, and it fills both caches rapidly and then comes back "okay". At its leisure, the two caches are flushed to disk and mirrored at the same time (i.e. the write-back from SAN1 to disk doesn't say "done" until both mirrors are updated).

So with mirroring, you've gained on one hand by having bigger caches, but you lose at the same time because the flush to disk now has to write to two disk systems, albeit in parallel.

Is my reasoning sound here?
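The assumption in the question can be sketched as a toy model (names and block sizes are illustrative only; as the replies below explain, the conclusion drawn from it doesn't hold):

```python
# Toy model of the proposed setup: MPIO round-robin alternates 1MB
# blocks between SAN1 and SAN2, so each node's write-back cache
# absorbs half of the incoming data.

from itertools import cycle

caches = {"SAN1": 0, "SAN2": 0}   # MB currently buffered per node
paths = cycle(caches)              # round-robin path selection

file_mb = 1000                     # the 1GB file from the example
for _ in range(file_mb):           # one 1MB block per iteration
    caches[next(paths)] += 1

print(caches)  # {'SAN1': 500, 'SAN2': 500}: both 500MB caches full,
               # suggesting (incorrectly, it turns out) a doubled cache
```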

Cheers, Rob.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Thu Jun 19, 2014 9:44 pm

Not going to work, for a very simple reason: life is not only writes but also reads. If the cache content differs between your SAN nodes, then a read going to the non-ACK-ed node would return old data. That's why the caches cannot simply be combined: you effectively get 1/2 or 1/3 (for a 2-way or 3-way replica accordingly) of the sum of all the cache sizes. Say you have a basic 3-node setup with a 3-way replica configured for LUs/CSVs (one shared LU/CSV per physical host). Each node has 1GB of L1 cache (zero L2 cache for simplicity). The total cache size is 1+1+1 = 3GB, but because of replication you still have only 1GB usable. Of course, because of the replication there are now 3 paths to it, so access is faster than it would be with a single node. Think of every new replica as adding one virtual "port". In our example we have 1GB of three-ported cache memory.
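The stale-read problem and the usable-capacity rule can be sketched in a few lines of Python (a toy model with made-up node names, not StarWind code):

```python
# Two mirrored nodes, each with its own write-back cache. If a write is
# ACK-ed after updating only one node's cache, a read served by the
# other node returns stale data.

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # block -> data (write-back cache)
        self.disk = {}    # block -> data (backing store)

    def read(self, block):
        # Cache hit first, then fall back to disk.
        return self.cache.get(block, self.disk.get(block))

node1, node2 = Node("SAN1"), Node("SAN2")
node1.disk[0] = node2.disk[0] = b"old"

# Naive "doubled cache": the write lands only in SAN1's cache.
node1.cache[0] = b"new"

print(node1.read(0))  # b'new'
print(node2.read(0))  # b'old'  <- stale read: caches must stay in sync

# Hence usable cache with N-way replication is sum / N:
caches_gb = [1, 1, 1]                       # 1GB of L1 per node
usable = sum(caches_gb) // len(caches_gb)
print(usable)                               # 1GB usable, three "ports"
```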
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

robnicholson
Posts: 359
Joined: Thu Apr 14, 2011 3:12 pm

Fri Jun 20, 2014 8:54 am

anton (staff) wrote:
If the cache content differs between your SAN nodes, then a read going to the non-ACK-ed node would return old data.
Oh, this is important - are you saying that caches are useless with mirrored systems? Or do you keep the caches in sync across the mirror?

So if a block is written to SAN #1's write-back RAM cache, does it also have to be written synchronously to SAN #2's write-back RAM cache? Same down to the next tier of SSD cache, and then the same again down to the physical disks?

Cheers, Rob.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Fri Jun 20, 2014 8:03 pm

Caches are not useless, quite the opposite. What I'm saying is that you cannot keep the caches out-of-sync, and so you cannot just add up the usable capacity from many nodes (which is what you did in your example).

Yes, that's the way a "write-back" cache works in a mirrored setup.
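That synchronous mirrored write-back behaviour can be sketched as follows (an illustrative Python model, not StarWind's implementation):

```python
# Sketch of a synchronous mirrored write-back cache: a write is ACK-ed
# only once EVERY replica's cache holds the new data, and the later
# lazy flush must destage to every mirror's disk.

class MirroredWriteBack:
    def __init__(self, nodes):
        # nodes: list of dicts, each with separate 'cache'/'disk' stores
        self.nodes = nodes

    def write(self, block, data):
        # Synchronous step: update ALL caches before acknowledging.
        for n in self.nodes:
            n["cache"][block] = data
        return "ACK"   # safe: any node can now serve the read

    def read(self, block, node_idx):
        n = self.nodes[node_idx]
        return n["cache"].get(block, n["disk"].get(block))

    def flush(self):
        # Lazy step: destage dirty blocks to every mirror's disk.
        for n in self.nodes:
            n["disk"].update(n["cache"])
            n["cache"].clear()

san = MirroredWriteBack([{"cache": {}, "disk": {}} for _ in range(2)])
print(san.write(7, b"data"))             # ACK
print(san.read(7, 0) == san.read(7, 1))  # True: no stale reads
san.flush()
print(all(n["disk"][7] == b"data" for n in san.nodes))  # True
```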
robnicholson wrote:
If the cache content differs between your SAN nodes, then a read going to the non-ACK-ed node would return old data.
Oh, this is important - are you saying that caches are useless with mirrored systems? Or do you keep the caches in sync across the mirror?

So if a block is written to SAN #1's write-back RAM cache, does it also have to be written synchronously to SAN #2's write-back RAM cache? Same down to the next tier of SSD cache, and then the same again down to the physical disks?

Cheers, Rob.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
