HA Full Sync Requirement if both nodes are powered off

peter.auchincloss
Posts: 16
Joined: Thu Jun 17, 2010 3:10 pm

Thu Jun 17, 2010 6:36 pm

If you are doing a controlled shutdown of HA resources, is there any process that permits both nodes to be powered down without requiring a full sync after power is restored?
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jun 17, 2010 7:23 pm

Absolutely not! The whole idea of having HA is powering down nodes one by one. What you really need is:

1) Install software updates on node A
2) Power down node A and bring it back up
3) Run fast sync between node A (software updated) and node B (software NOT updated)
4) Install software updates on node B
5) Power down node B and bring it back up
6) Run fast sync between node A (software updated) and node B (software updated)

That's all...
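Why rolling maintenance avoids a full sync can be shown with a toy model. It assumes that fast sync works by having the surviving node remember which blocks were written while its partner was offline, so only those blocks need to be copied back. The `Node` class and all function names are hypothetical illustrations, not StarWind's implementation or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one HA node (not a StarWind API)."""
    name: str
    online: bool = True
    # Blocks written while the partner was offline (assumed fast-sync journal).
    changed_blocks: set[int] = field(default_factory=set)


def power_down(node: Node) -> None:
    node.online = False


def client_write(blocks: set[int], serving: Node, partner: Node) -> None:
    """Apply a client write; journal it only if the partner cannot mirror it."""
    if not serving.online:
        raise RuntimeError(f"{serving.name} is offline and cannot serve writes")
    if not partner.online:
        serving.changed_blocks |= blocks


def fast_sync(returning: Node, survivor: Node) -> set[int]:
    """Bring a node back; only the blocks the survivor journaled are copied."""
    if not survivor.online:
        raise RuntimeError("no partner stayed online to track changes: full sync needed")
    to_copy = set(survivor.changed_blocks)
    survivor.changed_blocks.clear()
    returning.online = True
    return to_copy


# Rolling maintenance: only one node is ever down at a time.
a, b = Node("A"), Node("B")
power_down(a)                                # update and restart node A
client_write({1, 2}, serving=b, partner=a)
print(sorted(fast_sync(a, b)))               # [1, 2]
power_down(b)                                # update and restart node B
client_write({7}, serving=a, partner=b)
print(sorted(fast_sync(b, a)))               # [7]
```

If both nodes are powered down at once, `fast_sync` has no online survivor with a journal and raises, which mirrors why the thread keeps arriving at a full sync.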
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

peter.auchincloss
Posts: 16
Joined: Thu Jun 17, 2010 3:10 pm

Thu Jun 17, 2010 7:46 pm

An extended power outage required both nodes to be shut down. Not common, but it does occur.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jun 17, 2010 8:50 pm

You're breaking the whole idea of having HA with such a configuration...

If both nodes go down, the only way to sync is to pick the node you consider the most recent one (that's your job) and run a full sync from it.
peter.auchincloss wrote: An extended power outage required both nodes to be shut down. Not common, but it does occur.
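Anton's rule can be written as a small decision function: if the two nodes' outage intervals never overlapped, one node stayed online and a fast sync is enough; if they overlapped, nothing tracked the changes, so a full sync from the node the administrator picks is the only option. This is a hypothetical sketch; `plan_resync` and `outages_overlap` are illustrative names, not part of any StarWind tool.

```python
from datetime import datetime
from typing import Tuple

# (went_down, came_back_up) for one node; hypothetical representation.
Outage = Tuple[datetime, datetime]


def outages_overlap(a: Outage, b: Outage) -> bool:
    """True if the two intervals intersect, i.e. both nodes were off at the same moment."""
    return a[0] < b[1] and b[0] < a[1]


def plan_resync(outage_a: Outage, outage_b: Outage, most_recent: str) -> str:
    """Pick the resync type for a two-node HA pair.

    most_recent is the node the administrator judges to hold the latest data;
    the software does not decide this for you.
    """
    if outages_overlap(outage_a, outage_b):
        return f"full sync from node {most_recent}"
    return "fast sync"


rolling = plan_resync((datetime(2010, 6, 17, 20, 0), datetime(2010, 6, 17, 20, 30)),
                      (datetime(2010, 6, 17, 21, 0), datetime(2010, 6, 17, 21, 30)), "A")
print(rolling)  # fast sync
```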
Regards,
Anton Kolomyeytsev
peter.auchincloss
Posts: 16
Joined: Thu Jun 17, 2010 3:10 pm

Fri Jun 18, 2010 3:06 pm

The objective for HA to be always ON is noted. The point was that if you do need to do a "controlled" shutdown, there should be some method that won't require a full sync on restart:

1) Disconnect all connections from HA (shut down all resources with iSCSI connections)
2) Both nodes flagged as in sync (no changes pending)
3) Shut down the secondary
4) Shut down the primary

As a user without power backup beyond a UPS, I would get real use out of this.

Peter
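Peter's proposal amounts to a precondition check followed by an ordered shutdown. The sketch below only encodes those proposed steps; whether StarWind 5.x would then restart without a full sync is exactly what this thread is debating. `HaNodeState`, its fields, and the node names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HaNodeState:
    """Hypothetical snapshot of one HA node, as an admin might read it off the console."""
    name: str
    role: str              # "primary" or "secondary"
    iscsi_sessions: int    # client connections still logged in
    synchronized: bool     # node flagged as in sync, no changes pending


def controlled_shutdown_order(nodes: list[HaNodeState]) -> list[str]:
    """Return the proposed shutdown order, or raise if a precondition fails."""
    for n in nodes:
        if n.iscsi_sessions:
            raise RuntimeError(f"{n.name}: disconnect {n.iscsi_sessions} iSCSI session(s) first")
        if not n.synchronized:
            raise RuntimeError(f"{n.name}: not flagged as synchronized")
    # Secondary first, primary last, as in the proposed steps (False sorts before True).
    return [n.name for n in sorted(nodes, key=lambda n: n.role != "secondary")]


order = controlled_shutdown_order([
    HaNodeState("node-a", "primary", iscsi_sessions=0, synchronized=True),
    HaNodeState("node-b", "secondary", iscsi_sessions=0, synchronized=True),
])
print(order)  # ['node-b', 'node-a']
```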
Andrew (staff)
Staff
Posts: 5
Joined: Tue Apr 13, 2010 8:32 am

Tue Jun 22, 2010 7:52 am

When both nodes have been shut down, you can recreate the HA device using the existing image files. On the "Initialization method" wizard page, select "Do not synchronize virtual disks". Both nodes of the HA device will then be created in a synchronized state without any synchronization being run.

P.S.
Possibly in the near future we will add some advanced options for HA devices, including an option for manually setting the synchronization status of each node.
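Selecting "Do not synchronize virtual disks" tells the software to trust that both image files are already identical, so it is only safe if they really are. One way to check, sketched here as an assumption rather than a StarWind feature, is to hash each node's image file locally and compare the two digests; reading a large image still takes time, but only the short digest has to be compared across nodes.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so huge images never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run `sha256_of()` on each node against that node's own copy of the image (the path is whatever your setup uses) and compare the two hex strings. Any difference means the images diverged, and a full sync from the node you trust is the safe choice.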
codex
Posts: 7
Joined: Sun Jun 06, 2010 9:02 pm

Tue Jun 22, 2010 11:34 am

Hi,

I've just been reading this thread and was wondering what the actual process to restore the sync is if, say, both nodes go down due to a power outage.

Can you not just issue a sync when both nodes are back up?
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Tue Jun 22, 2010 12:11 pm

codex wrote: Hi,

I've just been reading this thread and was wondering what the actual process to restore the sync is if, say, both nodes go down due to a power outage.

Can you not just issue a sync when both nodes are back up?
HA storage that goes down due to a power outage loses its whole point. And if you have 10 TB of storage, it will take a while to resynchronize. So if you are sure that your device is in sync, it is much faster to recreate it than to do a full sync.
Max Kolomyeytsev
StarWind Software
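To put "a while" into numbers: a full sync has to copy every byte of the device across the sync link, so the best case is size divided by link throughput. The sketch assumes decimal terabytes and that the link, not the disks, is the bottleneck; the `efficiency` factor is an illustrative knob, not a measured value.

```python
def full_sync_hours(size_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Best-case hours to copy size_tb decimal terabytes over a link_gbit_s sync link.

    efficiency < 1.0 models protocol overhead and slower disks (an assumption; measure yours).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbit_s * 1e9 * efficiency)
    return seconds / 3600


print(round(full_sync_hours(10, 1), 1))    # 22.2
print(round(full_sync_hours(10, 10), 1))   # 2.2
```

At 1 Gbit/s, 10 TB needs over 22 hours before any overhead; a 10 Gbit/s sync link brings that down to a little over 2 hours.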
Thona
Posts: 28
Joined: Fri Feb 08, 2008 9:58 am

Thu Jul 01, 2010 6:42 pm

Except you may sometimes want to take things down. HA does not necessarily mean you cannot take the whole thing down.

Want an example? Here it comes: my own system. It runs from Sunday evening to Saturday evening, and I need as much availability as I can get. So far I run without StarWind HA, because I have all relevant systems duplicated anyway; every item is redundant. Any failure would be a potential disaster, though I do not have redundant locations so far. We are talking about a system doing fully automated financial trading.

BUT: Saturday evening to Sunday evening (my time) is the maintenance window. Period. I can easily do system maintenance then and shut everything down. In fact, I often do: upgrades, patching, etc.

So, as you can see, the requirement for HA does not rule out that a "controlled restart" still makes sense.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Thu Jul 01, 2010 6:53 pm

That's no problem. Put them down. 48 hours should be enough to sync the arrays. And YES, we'll look into "fast sync with both nodes down" in the upcoming release.
Regards,
Anton Kolomyeytsev
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Mar 06, 2011 12:36 pm

And again... we have a misunderstanding here. When the cluster is up, synchronization takes its time IN THE BACKGROUND. The primary node serves requests while the other nodes are coming back from sleep. When the data is synchronized, we update the cluster status to healthy (full HA) and turn Write-Back Cache ON again.

It's like starting your car engine: it runs at higher revs with lower power response for a while before it warms up to normal working temperature.

Assumptions like the ones this guy makes here:

http://ccolonbackslash.com/2011/03/06/v ... -problems/

are *INCORRECT*. I wanted you to know this.
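The recovery sequence described here (the primary keeps serving I/O, the partner catches up in the background, and the status flips to healthy with write-back cache back on only once sync completes) is essentially a small state machine. The toy model assumes write-back cache stays off until the sync finishes, as the post implies; the class, state names, and block counts are hypothetical, not StarWind code.

```python
from enum import Enum


class ClusterState(Enum):
    DEGRADED = "degraded: primary only, write-back cache off"
    SYNCHRONIZING = "synchronizing in the background"
    HEALTHY = "healthy: full HA, write-back cache on"


class HaClusterModel:
    """Toy model of the recovery sequence described above; not StarWind code."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        self.synced_blocks = 0
        self.state = ClusterState.DEGRADED
        self.write_back_cache = False

    def serves_requests(self) -> bool:
        # The primary answers client I/O in every state, including during sync.
        return True

    def partner_returns(self) -> None:
        self.state = ClusterState.SYNCHRONIZING

    def sync_step(self, blocks: int) -> None:
        """Copy another batch of blocks in the background."""
        if self.state is not ClusterState.SYNCHRONIZING:
            return
        self.synced_blocks = min(self.total_blocks, self.synced_blocks + blocks)
        if self.synced_blocks == self.total_blocks:
            # Only now is the pair marked healthy and write-back cache re-enabled.
            self.state = ClusterState.HEALTHY
            self.write_back_cache = True


cluster = HaClusterModel(total_blocks=100)
cluster.partner_returns()
cluster.sync_step(60)
print(cluster.state.name, cluster.write_back_cache)   # SYNCHRONIZING False
cluster.sync_step(60)
print(cluster.state.name, cluster.write_back_cache)   # HEALTHY True
```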
Regards,
Anton Kolomyeytsev
jimbul
Posts: 23
Joined: Tue Mar 01, 2011 10:35 am

Sun Mar 06, 2011 1:14 pm

I am running 5.5. If I have it configured incorrectly and it does gracefully recover from a full shutdown, I hold my hands up. However, I have tried this repeatedly, and short of a full sync or recreating the nodes I could not see a way to do it. I realize 5.6 is available, but I could not see that facility there when I looked.

In short, without recreating the HA targets after a graceful power-off, I couldn't see a way to get the storage online again, not even in a hobbled non-HA mode while it synced or verified the data, assuming you were happy that one or both sides were valid. That is the functionality I need for occasional planned power outages, ideally managed by StarWind rather than by my assumptions. Also, in the event of a power blackout, it would be good for the product to know which side was last written to, but I think I saw this was already being taken care of.

Thanks,

Jim.
camealy
Posts: 77
Joined: Fri Sep 10, 2010 5:54 am

Sun Mar 06, 2011 11:49 pm

I do understand the poster's original frustration. It has nothing at all (for me at least) to do with expecting you all to be perfect. I have stuck with it through thick and thin: through the past couple of years of BSODs, upgrade failures, lost data, persistent reservations that didn't work, etc. I sold it to 3 of my customers, one of whom even left us because of the failures. I have tested and tested, purchased tons of hardware, and tested more. I would even like to be a beta tester.

What really matters (being a VAR, I know) is service. And back when Bob was here in the States, he tried really hard. Not only do I not expect you to be perfect, I don't expect your support staff to know everything either. However, knowledge of both the subsystems (RAID, switching, TCP stack, iSCSI, etc.) and the big reasons your software is gaining traction (Hyper-V, VMware, Citrix, etc.) should be a necessity. I can't believe that for an IT infrastructure product that could theoretically take a datacenter down, there is no 24/7 support: no phone number you can call to sit on the phone with someone well versed in your product and these technologies. In fact, a week can go by, and I get someone who sends me emails in the middle of the night with lists of things to try and get done during the day, only to get more emails the next night, arguing with me. Oh, and by the way, our systems are down the whole time.

I am going to continue to stick with StarWind, but for your product to really be amazing, you must remember that customer service is KING.
jimbul
Posts: 23
Joined: Tue Mar 01, 2011 10:35 am

Mon Mar 07, 2011 6:31 am

I wrote the post referenced by Anton earlier in this forum. This is my first production install of StarWind. My frustration stemmed from my having stupidly already purchased two enterprise HA licenses, only to discover that the both-nodes-down issue stops me from using it the way I wanted to, and following this I look very silly (fully my fault, of course; I won't make the mistake of skipping a power-off test when I evaluate a product like this again). When 5.7 addresses these issues I'll be delighted, as I've had to put our Hyper-V cluster on hold until the product is able to deal with a power-down. I am also very happy to beta test.

I would also like to second camealy's call for more comprehensive support. For this to be the foundation of any company's high-availability virtual environment, it has to be supported 24x7, even if it is only email support for the time being. I have enough trouble with my Checkpoint direct enterprise support sending me massive lists of disruptive tests to do during the night when I have issues. I have only dealt with StarWind support once, so I cannot comment on its effectiveness, but I would certainly want to be able to rely on someone responding to issues regardless of the time zone and day of the week I may be having a problem. The forum seems super-effective, but you need more assurance than that when your entire infrastructure may sit on a given product.
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Mon Mar 07, 2011 11:09 am

Gentlemen, I think we're getting off topic here.
As for the full sync requirement: turning everything off correctly costs you about 3 extra minutes.
Just disconnect the client servers, remove the HA devices, and you're ready. I know it should work like this: I turn everything off, and when I turn everything back on, StarWind shows that it's OK and knows that the nodes went down simultaneously. Wait a little and you'll see it in 5.7.
As for the support complaints, I will inform our management and do my best to make our customer service better.
BTW, I'm very surprised that you haven't got a phone number to call. I will ask your solution engineer to provide you with his number.
Max Kolomyeytsev
StarWind Software