Did the recommended shutdown procedure and still lost everything

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

tomb
Posts: 2
Joined: Sat Mar 11, 2017 5:39 pm

Sat Mar 11, 2017 6:51 pm

We had to shut our 2 ESX servers down last night to move them. We got the recommended Starwind shutdown procedure from tech support and performed it as given. An hour later we powered the servers on according to the recommended procedure, which is to bring the Priority: First server online first, but only 2 of our 3 VMware datastores came back online. We waited for everything to sync successfully in case that would fix the issue, but the 3rd one still didn't come back online even though it said synchronized about an hour later. Multi-pathing shows both paths in ESX. I can see the Starwind device that has the data for this 3rd target under devices in storage adapters, but it doesn't show a VMFS partition. It's like it's a brand new drive to ESX.

Looking more into the Starwind logs and interface I notice something very disturbing. On Starwind server #1 in Statistics it says User Data Size 780.05GB and on Starwind server #2 it says User Data Size 444.83GB. It also says Synchronization Status as Synchronized. How can they be synchronized if almost half of the data is missing? The other 2 have identical values.
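
For reference, a quick sketch of the arithmetic behind "almost half", using only the two User Data Size figures quoted above. The function name is made up for illustration, and the figures are taken at face value in GB as reported.

```python
def missing_fraction(larger_gb: float, smaller_gb: float) -> float:
    """Fraction of the larger figure that is absent from the smaller one."""
    return (larger_gb - smaller_gb) / larger_gb


node1_gb = 780.05  # User Data Size reported on Starwind server #1
node2_gb = 444.83  # User Data Size reported on Starwind server #2

gap_gb = node1_gb - node2_gb
share = missing_fraction(node1_gb, node2_gb)
print(f"gap: {gap_gb:.2f} GB, {share:.1%} of the server #1 figure")
```

That works out to a gap of roughly 335 GB, or about 43% of the larger figure. It says nothing about why the two numbers differ, only how far apart they are.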

How can such a controlled and guided reboot process go so horribly wrong? I have over 1TB free on both servers, so I didn't run out of space. The directories with the LSFS files are also different sizes.

I was up until 2:30AM on Friday trying to fix this. It's Saturday and I opened a Starwind case and nobody is around. I'm now trying to fix this on my own and I fear there is nothing I can do to get my data back.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sat Mar 11, 2017 10:59 pm

Sorry to hear about your issues! A few statements:

1) Logs are for us and not for customers. So don't try to analyze them or take any actions based on guesses drawn from them.

2) Backups are a must. We don't allow any system w/out backups in production, and while we can't control and enforce that, the 3-2-1 backup rule is still there.

https://knowledgebase.starwindsoftware. ... ckup-rule/

3) If you have priority support, what's the number you've called? If not, or if you're a free version user, please PM me your details and I'll manually escalate the ticket to the emergency level of a paying customer.

4) Don't try to fix anything yourself! You might overwrite the data with the wrong copy. Don't do that, as the process is non-recoverable unless you run snapshots.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

tomb
Posts: 2
Joined: Sat Mar 11, 2017 5:39 pm

Sun Mar 12, 2017 12:57 am

I have a paid version and paid support. I don't know what level as it doesn't say on my invoice. I don't remember being given the option of choosing a service level.

Yes, we have backups. We'd never do a physical move without backups immediately prior to the move but our good sense doesn't excuse losing everything on the datastore. Unfortunately, it takes 24 hours to restore what was on this LUN. That's why we contacted tech support before even touching anything to make sure we knew exactly how to do the shutdown and startup.

When I submitted my ticket 12 hours ago the helpdesk system responded with, "If you require an immediate support, we recommend upgrading your support plan to premium support package. To upgrade, please contact sales at sales@starwindsoftware.com" and since I didn't get a response I assume I'm not a premium support customer. E-mailing sales so they can respond to me in a few days doesn't help me right now.

The 2012 servers that Starwind was running on were triggered by Windows to shut down about 3 weeks ago, and that corrupted several of our VMs. This happened in the middle of the day and cost us 24 hours of outage. After the days of outages we've experienced recently and the lack of options for support escalation from Starwind today, we decided to just cut our losses and eliminate Starwind from our environment. I would have happily whipped out the corporate credit card to get help today, but I didn't see that as an option in the e-mail response.

The product does work nicely when it's running, but it's Russian roulette any time you have to reboot one of the nodes. On the last support call I upgraded to the latest release, per the support person, to help with the hour-long synchronizations that happened even if we did a clean shutdown. Synchronization still takes almost an hour per 1TB LSFS target, even on a hardwired 10Gb network with PCIe NVMe storage on an HP ML350 Gen9 server.
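
To put "almost an hour per 1TB" next to the 10Gb link, here is a rough back-of-the-envelope sketch. It assumes a decimal terabyte (10^12 bytes), a full 3600 seconds, and ignores protocol overhead, so the real numbers will differ somewhat; the function name is made up for illustration.

```python
def effective_gbit_per_s(bytes_moved: float, seconds: float) -> float:
    """Average throughput in gigabits per second for a transfer."""
    return bytes_moved * 8 / seconds / 1e9


ONE_TB_BYTES = 1e12     # assumption: decimal terabyte
ONE_HOUR_SECONDS = 3600  # assumption: "almost an hour" rounded up to an hour
LINK_GBIT_PER_S = 10     # nominal speed of the 10Gb network

rate = effective_gbit_per_s(ONE_TB_BYTES, ONE_HOUR_SECONDS)
print(f"{rate:.2f} Gbit/s, {rate / LINK_GBIT_PER_S:.0%} of the nominal link speed")
```

Under those assumptions the sync averages a little over 2 Gbit/s, i.e. well under a quarter of what the link could nominally carry.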

I did run the log utility and take some screen shots before I started going through extreme measures to try to save my data and get my business running before work starts Monday morning. If you want them so you can analyze them to try to keep this from happening to someone else just have whoever gets my case in the next few days send me ftp info.

Also, I suggest adding a paid support option in the e-mail whereby you can click through to a web form and fill out your credit card info for 24/7 support. You can determine the price that works for SW, but it should at least be an option. Based on the e-mail response, the earliest I would be contacted was Monday afternoon and the latest would be late Tuesday. It isn't possible for me to leave our company down for that long if I have the option to work hard all weekend to get the systems back online.
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Mar 12, 2017 5:58 am

You should have a direct phone number for your support contact (we had to remove it from the public section of the site because people abused it, calling for free version support, asking sales-related questions, etc.). Business hours for standard support don't cover weekends, obviously - that's what ProSupport or Premium support does. I'll double check everything and also make sure there's a way to escalate to emergency level for somebody who has Standard support w/out being charged immediately, of course. Thank you for the suggestion!

TL;DR: Well, that's weird, to say the least. Let me bring in engineers as soon as I jump off the plane. Sorry once again for your broken maintenance!
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Oles (staff)
Staff
Posts: 91
Joined: Fri Mar 20, 2015 10:58 am

Mon Mar 20, 2017 10:04 am

Tom,

Please accept my sincere apologies for the maintenance issue that you have faced with our software.
Right now we are reworking our Support plans and will make sure to provide our customers with an option of immediate support.
Also, we did provide you with the FTP credentials, so when you have a minute, please do not hesitate to upload the troubleshooting information there.

Just in case you decide to use our software again, please ping me and I will make sure to provide you with the best support contract free of charge.

Thank you.