How does StarWind maintain data integrity?

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

chell
Posts: 48
Joined: Mon Dec 11, 2017 1:19 am

Wed Feb 14, 2018 10:27 pm

Can you please clarify how StarWind maintains data integrity? I have seen a number of posts mentioning StarWind's data integrity checks, but it's not clear how it works.
My concern is with bit rot and the fact that storage arrays are now so large that a non-recoverable read error is likely to be present.
From what I have read, one of the reasons ReFS was invented was to mitigate this threat. Does StarWind maintain data integrity on its own, or should I combine it with another tool like ReFS to mitigate the risk? If so, should ReFS be on the underlying storage, in the StarWind device, or both?
Petas3
Posts: 8
Joined: Thu Feb 01, 2018 10:18 am

Fri Feb 16, 2018 8:20 am

Hi,
I had a very similar question: https://forums.starwindsoftware.com/vie ... f=5&t=4938
It would seem StarWind does not handle this at all, as it is mostly a non-existent issue (the 10^-14 unrecoverable-read-error rate is too conservative an estimate, as numerous sources say). Prevention of this issue should be done by either the OS or the RAID controller ("scrubbing").
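To put that 10^-14 figure in perspective, here is a rough back-of-envelope sketch in Python (my own illustration, not a StarWind number): it estimates the probability of hitting at least one unrecoverable read error while reading an entire array at the commonly quoted consumer-drive spec rate.

```python
import math

# Commonly quoted spec value for consumer drives: one unrecoverable
# error per 10^14 bits read. Vendors arguably quote this conservatively.
URE_RATE = 1e-14  # probability of an unrecoverable error per bit read

def p_at_least_one_ure(terabytes_read: float, ure_rate: float = URE_RATE) -> float:
    """Probability of >= 1 URE when reading the given amount of data."""
    bits = terabytes_read * 1e12 * 8          # decimal TB -> bits
    return 1.0 - math.exp(-bits * ure_rate)   # Poisson approximation of 1-(1-p)^bits

for tb in (1, 12, 50):
    print(f"{tb:>3} TB read -> P(URE) ~ {p_at_least_one_ure(tb):.1%}")
# at the spec rate, a full read of a 12 TB array already has a ~62% chance
# of at least one URE; real-world rates are usually much better than spec
```

This is exactly why the question matters less in practice than the spec sheet suggests: observed error rates tend to be far below 10^-14, and scrubbing catches most of what does occur.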

Basically I have considered 2 scenarios:

Scenario 1 - Multi-node with on-node redundancy RAID (SW/HW)
In this case the RAID should perform regular scrubbing to eliminate errors, and the array should last basically forever (there is a very small chance of RAID corruption, and you can resync from the other node if that happens) unless drives die. Note that RAID 5 is not recommended for 2 TB+ drives, and RAID 6 is considered the minimum for parity storage; personally I would use RAID 10 with cheap 7200 rpm drives. You can also use ReFS with Storage Spaces, which will automatically correct errors it encounters. The integrity streams feature is recommended, but not strictly required.
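Conceptually, the scrubbing a mirror does in Scenario 1 boils down to the following hypothetical sketch (my simplification, not RAID controller or StarWind code): walk both copies, verify each block against its stored checksum, and rewrite a rotten block from the healthy copy.

```python
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def scrub_mirror(copy_a: list, copy_b: list, expected: list) -> int:
    """Verify every block of both mirror copies against its stored
    checksum and repair a bad copy from the good one.
    Returns the number of blocks repaired."""
    repaired = 0
    for i, want in enumerate(expected):
        a_ok = checksum(copy_a[i]) == want
        b_ok = checksum(copy_b[i]) == want
        if a_ok and not b_ok:
            copy_b[i] = copy_a[i]   # rewrite the rotten block
            repaired += 1
        elif b_ok and not a_ok:
            copy_a[i] = copy_b[i]
            repaired += 1
        elif not a_ok and not b_ok:
            raise IOError(f"block {i}: both copies bad - unrecoverable")
    return repaired

# simulate one bit-rotted block on the second copy
data = [b"block-0", b"block-1", b"block-2"]
sums = [checksum(b) for b in data]
mirror = [b"block-0", b"XXXXXXX", b"block-2"]
print(scrub_mirror(list(data), mirror, sums))  # -> 1 block repaired
print(mirror[1] == b"block-1")                 # -> True
```

The key point: with on-node redundancy there is always a second copy to repair from, so a scrub pass both detects and corrects.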

Scenario 2 - Multi-node without on-node redundancy
In this case using ReFS is best, as far as my research goes. You need to use Server 2016 ReFS, as the previous version does not support the features StarWind requires. You should configure ReFS to use integrity streams (this must be done manually; it has some impact on performance and does regular scrubbing, which can detect errors but not correct all of them, since a local ReFS volume has no data redundancy of its own - it cannot see the StarWind mirror, only local checksums). Also, since ReFS is a log-structured file system, you need to maintain some free space (5-10%) on the volume at all times. If ReFS detects an unrecoverable error, you have an early warning and should resync the node to eliminate the error.
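The "detect but cannot correct" behaviour of checksummed reads without local redundancy can be sketched like this (a hypothetical Python illustration of the idea, not how ReFS is implemented): a mismatch fails the read loudly instead of returning silently rotten data, and repair has to come from outside, e.g. by resyncing the node.

```python
import hashlib

def read_with_integrity(block: bytes, stored_checksum: str) -> bytes:
    """Mimics a checksummed read on a volume with no local redundancy:
    corruption can be detected, but there is no second copy to repair
    from, so the read fails instead of returning bad data."""
    if hashlib.sha256(block).hexdigest() != stored_checksum:
        # early warning: the caller should resync this node from its HA partner
        raise IOError("checksum mismatch - unrecoverable locally, resync the node")
    return block

good = b"vm-disk-block"
crc = hashlib.sha256(good).hexdigest()
print(read_with_integrity(good, crc) == good)   # -> True
try:
    read_with_integrity(b"vm-disk-blocc", crc)  # simulated bit rot
except IOError as e:
    print("detected:", e)
```

That early, loud failure is what makes the combination with a multi-node setup useful: the local checksum tells you *when* to resync, and the other node provides the good copy.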

Please inform me of your findings as well :)
Petas3
Posts: 8
Joined: Thu Feb 01, 2018 10:18 am

Fri Feb 16, 2018 8:39 am

If you have only one node then it's more difficult. Please share your configuration.

About the nested FS: I think it only matters on the StarWind node itself; the file system inside the StarWind device does not matter, and in fact nested ReFS might be bad for performance (this might not be true - it depends on the StarWind implementation; I've read somewhere that it is basically independent of the host FS and might be using direct block access or something).

It would be really nice to know for sure (confirmed by a StarWind expert) whether a StarWind storage device can be checked by the underlying ReFS on Server 2016.
Oleg(staff)
Staff
Posts: 568
Joined: Fri Nov 24, 2017 7:52 am

Fri Feb 16, 2018 1:54 pm

Thank you, Petas3, for your comment :)
Regarding data integrity, there was a very similar question; here is the answer:
"No, there is no scrubbing in StarWind; this is resolved at the RAID level.
In case of a disk error on read, the other node will serve the request to the HA device. The HA device on the node with the error will be turned off and a full synchronization will start on it; it stays offline until the full sync finishes.
But this situation is possible only with RAID 0 or disks without RAID. With other RAID types, StarWind will most likely not even see the error - it will be caught by the RAID itself."
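The failover described in that answer could be sketched roughly as follows (a hypothetical toy model, not actual StarWind code): on a local read error the partner node serves the request, the local replica is marked out of sync, and it only rejoins after a full synchronization.

```python
class HADevice:
    """Toy model of the described failover: if the local replica fails
    a read, the request is served by the HA partner and the local
    replica stays offline until a full synchronization completes."""

    def __init__(self, local: dict, partner: dict):
        self.local, self.partner = local, partner
        self.needs_full_sync = False

    def read(self, block_id: int) -> bytes:
        if not self.needs_full_sync and block_id in self.local:
            return self.local[block_id]
        # local read error (or device already offline): serve from the partner
        self.needs_full_sync = True
        return self.partner[block_id]

    def full_sync(self) -> None:
        self.local = dict(self.partner)  # copy everything back from the partner
        self.needs_full_sync = False

dev = HADevice(local={0: b"a"}, partner={0: b"a", 1: b"b"})
print(dev.read(1))           # missing locally -> served by the partner: b'b'
print(dev.needs_full_sync)   # -> True, device is offline pending full sync
dev.full_sync()
print(dev.read(1))           # -> b'b', now from the repaired local copy
```

Note how this matches the quoted answer: StarWind never repairs an individual block; it fails over the read and repairs by full resynchronization of the device.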

You can find more information here.