Slow recovery of big LSFS storage after downtime
Posted: Wed Sep 24, 2014 5:01 pm
Following on from all the tests about automatic resync after node downtime, I stopped and restarted the StarWind service on one node. Automatic resync did kick in, but it took a lot longer than the last time I tried this. However, this time my LSFS storage is much bigger (275GB).
In Resource Monitor, StarWind appears to be reading every single SPSPX file, one by one.
I assume it's carrying out some kind of integrity check? In my lab this takes a reasonable amount of time, but in production with large LSFS disks (say 20TB) I'd expect this process to take a very long time, certainly hours. During that time HA won't be working, and I assume read performance on that node is also impacted for any other storage that isn't in this device status.
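As a rough back-of-envelope check on the "hours" estimate, here is a minimal sketch assuming the resync has to read the whole LSFS disk once at a steady sequential rate. The read rates are hypothetical round numbers, not measured StarWind figures, and the helper `scan_hours` is made up for illustration:

```python
def scan_hours(disk_tb: float, read_mb_per_s: float) -> float:
    """Hours to read disk_tb terabytes once at read_mb_per_s (decimal units).

    Assumption: the check reads all data exactly once at a constant rate.
    """
    total_mb = disk_tb * 1_000_000  # 1 TB = 1,000,000 MB in decimal units
    seconds = total_mb / read_mb_per_s
    return seconds / 3600


# Compare the 275GB lab disk with a hypothetical 20TB production disk
# at two assumed (not measured) read rates.
for size_tb in (0.275, 20):
    for rate in (200, 500):
        print(f"{size_tb} TB at {rate} MB/s: {scan_hours(size_tb, rate):.1f} h")
```

Even at an optimistic 500 MB/s, a single full pass over 20TB works out to roughly 11 hours, versus minutes for the lab disk.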
I think a) this needs documenting so people don't panic, and b) the status message "Device status: Creating" could be phrased better, e.g. "Checking integrity".
Cheers, Rob.