Performance issues on 1 node (Hyper-V 2-Node HCI using CVM)

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

andrew.kns
Posts: 8
Joined: Wed Sep 11, 2024 8:04 pm

Mon Jan 13, 2025 5:02 pm

I am experiencing an issue where disk latency is > 10,000 ms for any guest VM hosted on node 1. Guest VMs hosted on node 2 average < 500 ms, and I don't know why node 1 performs so much worse than node 2. Migrating a VM from node 1 to node 2 immediately improves its storage performance (and usability as a side effect). As a result, almost all VMs are currently hosted on node 2 rather than in a 50/50 split.

Hosts 1 and 2 are configured identically and are directly connected to each other (no switches) over 25G fiber SFPs. The storage is passed directly through to the CVM. Moving a VM's VHD from one CSV to another makes no difference. I have re-validated all configuration steps and cannot find any difference between the two host nodes, so I am beginning to think it may be an issue with how Windows is accessing the iSCSI resources offered by the CVMs.

What are some troubleshooting steps I could look at next to investigate the performance issue? I will also note that the Windows event log shows high I/O latency events for the CSVs.
yaroslav (staff)
Staff
Posts: 3598
Joined: Mon Nov 18, 2019 11:11 am

Mon Jan 13, 2025 5:50 pm

What is the storage configuration?
Look at the underlying storage processes and the mdadm configuration.
See if there are any resync or rebuild processes in /proc/mdstat. Also, make sure the disks are healthy. Please also consider rebooting the CVMs one by one if nothing is running on them.
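For reference, checking those points from inside each CVM might look like the commands below (assuming a Linux-based CVM with mdadm and smartmontools available; /dev/md0 and /dev/sda are placeholders for your actual array and member disks):

```shell
cat /proc/mdstat          # any resync/rebuild in progress shows up here
mdadm --detail /dev/md0   # array state and member disks (md0 is a placeholder)
smartctl -H /dev/sda      # overall SMART health verdict (repeat per disk)
```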
andrew.kns
Posts: 8
Joined: Wed Sep 11, 2024 8:04 pm

Mon Jan 13, 2025 6:48 pm

yaroslav (staff) wrote:
Mon Jan 13, 2025 5:50 pm
What is the storage configuration?
Look at the underlying storage processes and the mdadm configuration.
See if there are any resync or rebuild processes in /proc/mdstat. Also, make sure the disks are healthy. Please also consider rebooting the CVMs one by one if nothing is running on them.
The storage configuration is as follows:
- PERC H755 controller passed to the CVM via PCIe passthrough
- 6x 1TB SSD (hardware RAID 10)
- 6x 10TB HDD (hardware RAID 10)

The storage processes inside the CVMs appear to be OK. The disks are healthy and running the latest firmware pushed by Dell. The CVMs have been updated to the latest version and restarted within the last week. This issue has been lingering for quite some time now (more than 3 months). I replaced the NICs in both hosts, upgrading from 10G to 25G, to see if that would improve things; unfortunately, it did not resolve the latency problems.
yaroslav (staff)
Staff
Posts: 3598
Joined: Mon Nov 18, 2019 11:11 am

Tue Jan 14, 2025 12:03 am

Could you please run the tests for the underlying storage with FIO (4k and 64k sequential and random patterns)?
Also, could you please share the network diagram? Please include the network aggregations if there are any.
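A sketch of how the requested FIO runs could be generated is below (the target path, size, iodepth, and runtime are placeholder values to adjust; the script only prints the eight commands so they can be reviewed before running them against the storage under test):

```shell
#!/bin/sh
# Sketch: build the fio commands for 4k/64k, sequential and random,
# read and write patterns. TARGET is a placeholder -- point it at a
# test file on the array (or CSV) being measured.
TARGET=/mnt/test/fio.dat

CMDS=""
for bs in 4k 64k; do
  for rw in read write randread randwrite; do
    CMD="fio --name=${rw}-${bs} --filename=${TARGET} --rw=${rw} --bs=${bs} --ioengine=libaio --direct=1 --iodepth=32 --size=4G --runtime=60 --time_based --group_reporting"
    CMDS="${CMDS}${CMD}
"
    echo "$CMD"
  done
done
```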
alicebelinda
Posts: 1
Joined: Fri Apr 04, 2025 11:04 am

Fri Apr 04, 2025 11:08 am

It's smart that you've already ruled out some of the usual suspects like VM placement and CSV location. Identical hardware and direct 25G fiber? That should be a recipe for snappy performance across the board.