RAID 5 degraded at every reboot

Software-based VM-centric and flash-friendly VM storage + free version


aptfinf
Posts: 5
Joined: Wed Aug 07, 2024 12:26 pm

Thu Aug 08, 2024 3:13 pm

Hello, we have deployed StarWind CVM in VMware and we are creating a RAID 5 storage pool with 3x 3.49TB NVMe SSDs connected via PCI passthrough.
Creation and RAID sync have both completed successfully.
We didn't proceed with volume and LUN creation because we are experiencing a strange problem: at every reboot the RAID 5 pool is degraded, with the 3rd NVMe device removed (it is always the same device that gets removed at boot).
If we manually re-add it with mdadm, the RAID becomes online and clean, but it is degraded again after the next reboot.
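For reference, the manual re-add looks roughly like this (the array and device names are examples, yours may differ):

Code: Select all

# see which member dropped out of the array
cat /proc/mdstat
mdadm --detail /dev/md0

# re-add the missing NVMe device
mdadm --manage /dev/md0 --re-add /dev/nvme2n1
# if --re-add is refused, a plain --add works but triggers a full resync
mdadm --manage /dev/md0 --add /dev/nvme2n1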
We checked the SSDs with smartctl but didn't see any problems.
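The SMART check is essentially this, repeated for each drive (device name is an example):

Code: Select all

smartctl -a /dev/nvme2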
How can we fix this issue?

Thanks in advance
yaroslav (staff)
Staff
Posts: 3599
Joined: Mon Nov 18, 2019 11:11 am

Thu Aug 08, 2024 3:34 pm

Greetings,

Can you please share the logs with us? https://www.starwindsoftware.com/support-form Use 1196439 as a reference.
aptfinf
Posts: 5
Joined: Wed Aug 07, 2024 12:26 pm

Thu Aug 08, 2024 5:10 pm

yaroslav (staff) wrote:
Thu Aug 08, 2024 3:34 pm
Greetings,

Can you please share the logs with us? https://www.starwindsoftware.com/support-form Use 1196439 as a reference.
Logs just sent via the support form.
Thank you
aptfinf
Posts: 5
Joined: Wed Aug 07, 2024 12:26 pm

Fri Aug 09, 2024 3:29 pm

yaroslav (staff) wrote:
Thu Aug 08, 2024 3:34 pm
Greetings,

Can you please share the logs with us? https://www.starwindsoftware.com/support-form Use 1196439 as a reference.
Today we tried replacing the SSD with another one (same model), but the issue persists.
The issue does not occur if we use RDM devices instead of PCI passthrough.
Since the problematic SSD is always the same one, could it be that the SAMSUNG MZQL23T8HCLS-00A07 has some problem with VMware PCI passthrough?
The other two SSDs are Intel, and they work perfectly.
yaroslav (staff)
Staff
Posts: 3599
Joined: Mon Nov 18, 2019 11:11 am

Fri Aug 09, 2024 3:36 pm

RDM generally gives slightly lower performance than pass-through.
The SSDs you listed are consumer-grade ones; in my experience, they are not well suited for virtualization workloads. Make sure discard is off: https://current.workingdirectory.net/po ... d-discard/
How to turn off discard for each node (or make sure it is off): https://forum.openmediavault.org/index. ... -lvm-ext4/
aptfinf
Posts: 5
Joined: Wed Aug 07, 2024 12:26 pm

Fri Aug 09, 2024 6:17 pm

yaroslav (staff) wrote:
Fri Aug 09, 2024 3:36 pm
RDM generally gives slightly lower performance than pass-through.
The SSDs you listed are consumer-grade ones; in my experience, they are not well suited for virtualization workloads. Make sure discard is off: https://current.workingdirectory.net/po ... d-discard/
How to turn off discard for each node (or make sure it is off): https://forum.openmediavault.org/index. ... -lvm-ext4/
These are the SSDs our cloud provider sets up, so we are stuck with this model.
I checked the post you linked, but I can't really understand what I should do:
Do I have to enable this

Code: Select all

/etc/lvm/lvm.conf
issue_discards = 1
or this

Code: Select all

/etc/modprobe.d/raid456.conf
options raid456 devices_handle_discard_safely=Y 
Or both?
yaroslav (staff)
Staff
Posts: 3599
Joined: Mon Nov 18, 2019 11:11 am

Fri Aug 09, 2024 7:31 pm

Hi,

Both. They should be set to 0 and N, respectively.
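As a minimal sketch, assuming the file paths quoted above match your CVM, the two settings would end up as:

Code: Select all

# /etc/lvm/lvm.conf (inside the devices { } section)
# 0 = LVM does not send discards to the underlying PVs when LV space is freed
issue_discards = 0

# /etc/modprobe.d/raid456.conf
# N = the md RAID 4/5/6 driver does not pass discard requests down to the member devices
options raid456 devices_handle_discard_safely=N

The raid456 option is applied when the module loads, so if the array is assembled from the initramfs you may also need to regenerate it (for example, update-initramfs -u on Debian-based systems or dracut -f on RHEL-based ones) before rebooting.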
yaroslav (staff)
Staff
Posts: 3599
Joined: Mon Nov 18, 2019 11:11 am

Mon Aug 12, 2024 10:40 am

Hi,

Did the tweak help?
aptfinf
Posts: 5
Joined: Wed Aug 07, 2024 12:26 pm

Mon Aug 12, 2024 1:39 pm

yaroslav (staff) wrote:
Mon Aug 12, 2024 10:40 am
Hi,

Did the tweak help?
Hi,
I didn't have the chance to test this tweak, since we managed to replace the SAMSUNG MZQL23T8HCLS-00A07 with an Intel SSDPF2KX038T1, and this has resolved the issue.
Anyway, I will save this tweak for future reference, you never know.

Thank you for your assistance!
yaroslav (staff)
Staff
Posts: 3599
Joined: Mon Nov 18, 2019 11:11 am

Mon Aug 12, 2024 3:27 pm

Hi,

Great news.
You are always welcome :)