um, Max? broke us and left us hanging

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

Tom
Posts: 2
Joined: Tue Mar 19, 2013 7:54 pm

Tue Mar 19, 2013 8:11 pm

Hi there.
We had one node of our HA pair crash last night. 3 of our 8 datastores had successfully synced upon restart, and I had resynced one manually.
The 4 remaining datastores still needed to sync when Max started working with us.
After discussing the options, Max suggested we take down High Availability on the remaining datastores and remove the stores, then rebuild them as basics and connect them back to vSphere, saving us hours of resync time so that performance would not be an issue and we could bring all our servers up within the hour. We figured this was fantastic: we would have our machines up, customers angry but manageable, and during maintenance windows through the week we could return to HA on the ones no longer there.
Max did the work. He brought down the datastores, recreated them pointing them properly to the files and then he reconnected them to one host.
I came back from getting coffee and Max had ended the phone conversation with our other engineer.
At this point, we started attempting to connect to the "new" datastores.
We have 4 hosts and 30+ virtual machines running.
Two of the remaining hosts were able to connect to two of the rebuilt datastores but are unable to connect to the other two; the only option offered is to "format". The third host can only reconnect to one of the rebuilt datastores.
It has been hours (we got off the phone with him around noon CST). We have spoken with Jack a couple of times. I hate harassing the guy, but we have support that should be available during business hours and have set up to pay for extended support.
Right now, we are in a much worse position than we were hours ago when we got off the phone with Max. Prior to reaching Max, we were 10-13 hours from being back online.
With Max's "fix" we are now DOWN with no available estimate for our clients, and looking at a need to restore entire systems from backup, as well as rebuild our SAN environment. This becomes DAYS.

Honestly, at this point it becomes a WTF situation. Someone please call us back.

Tom Provost Systems Engineer, Allison Royce & Associates, Inc 210 564 7000
Tom
Posts: 2
Joined: Tue Mar 19, 2013 7:54 pm

Wed Mar 20, 2013 12:42 am

Joe, thank you for all the assistance this afternoon.
After getting off the phone with you, we tore apart the vSphere setup and removed from each host the datastores that were not loaded on all of them.
We then added all of the hosts back to vSphere, then added the datastores to one host.
After this, they still could not be added to the other hosts.
At this point, after rebuilding vSphere, we were able to force mount the datastores on the other hosts through the command line.
All servers but one Exchange server are up and running now.

Enjoy the snow.
Max (staff)
Staff
Posts: 533
Joined: Tue Apr 20, 2010 9:03 am

Wed Mar 20, 2013 10:52 pm

Hello Tom,
Thanks for the update!
Yesterday I left you and Matt only after you had confirmed that the datastores were reconnected to the ESX host.
At that point I asked if there was anything else I could do, and either you or Matt (I was on speaker, so I can't tell exactly) said "yes".
Usually I take this as confirmation that my job is finished.

I'm sorry that I did not address a confirmed vSphere bug with VMFS5 force mount.
http://kb.vmware.com/selfservice/micros ... Id=1011387
I swear that if I had seen this bug during the remote session, I would definitely have recognized it and assisted you with a workaround,
even though it is something that a qualified VMware support engineer should handle.
But unfortunately the issue didn't appear during our remote session, since it ended after the datastores were reconnected to only one ESX host.
Max Kolomyeytsev
StarWind Software