Over the past weekend we set up a pair of Windows Server 2012 R2 servers in a failover cluster to provide file sharing, using StarWind v8 for the shared storage. We're having a couple of issues, and I'm wondering a) if anyone can provide any insight and b) if the two may be related to each other?
Issue #1 is that users are sporadically getting "The file or directory is corrupted and unreadable" messages when they try to save files. We have maybe 30 users on at any given time and only had about 5 reports of this happening yesterday so it's not a widespread occurrence. In each case, if the user waits a few minutes and tries to save the file again, they can do so successfully. We have not been able to tie the issue to any particular users, workstations, shares, or files... it seems to pop up at random.
Issue #2 is StarWind-related errors in the Windows logs. In the System log on both servers, every hour to every few hours we get Service Control Manager event 7031 with the message "The VssHWProviderStarWind service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service."
At the exact same time, there will be an event 1000 "application error" in the application log with a lengthy message that includes "Faulting module path: C:\VssHWProviderStarWind\hardwareprovider.dll".
There does not *seem* to be a time correlation between the two issues. For example, we had a "file or directory is corrupted" report from a user yesterday at around 8:30 AM, and in the server log the two events I mention above were recorded at 8:56 AM. However, I experienced the "corrupted" problem myself at about 1:30 PM, and there were no log events within +/- a half-hour of that time: there was one at 12:56 PM and another at 2:49 PM.
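A rough way to check the timing is to compute, for each "corrupted" report, the gap to the nearest service-crash event. Below is a minimal Python sketch using only the times quoted above. The date is a placeholder, the function names are made up for illustration, and the 12:56 event is assumed to be the early-afternoon one, since it is described as falling just before the 1:30 PM report.

```python
from datetime import datetime

DAY = "2000-01-01"  # placeholder date; only the time of day matters here


def at(hhmm):
    """Build a datetime on the placeholder day from a 24-hour 'HH:MM' string."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


# Times taken from the post: user-reported save failures...
reports = [at("08:30"), at("13:30")]
# ...and event 7031 / event 1000 entries on the server (12:56 assumed to be PM).
events = [at("08:56"), at("12:56"), at("14:49")]


def nearest_gap_minutes(report, events):
    """Minutes between one report and the closest logged event, before or after."""
    gap = min(abs(report - e) for e in events)
    return int(gap.total_seconds() // 60)


def gaps_in_minutes(reports, events):
    """Nearest-event gap, in minutes, for every report."""
    return [nearest_gap_minutes(r, events) for r in reports]


if __name__ == "__main__":
    for r, g in zip(reports, gaps_in_minutes(reports, events)):
        print(f"report at {r:%H:%M}: nearest event {g} min away")
```

With only two reports this proves nothing either way, but the same comparison could be rerun as more user reports and log entries accumulate.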
Does anyone have any thoughts on these issues?
Edit: some additional info I should have included... the Event Viewer events are happening on both servers, but they do not occur at the same time. Interestingly, on our 2nd server (the currently inactive one in the cluster), no events have happened since about 1:55 yesterday afternoon. That is not the case on the active server: its last event occurred at 10:56 this morning.
And... in Failover Cluster Manager, nothing appears under "cluster events". The StarWind server log on both machines contains nothing that looks wrong (just a couple of "Accepted control connection from..." entries), and in the StarWind UI under "events" nothing appears on either server.