Hello, I am quite new to iSCSI in general; I have been working with the StarWind iSCSI software for a few weeks now. I have some basic questions whose answers were not immediately discoverable. This box will primarily be a Hyper-V VHD host and archival data storage.
Windows 2008R2
Dual Opteron 285 (4 Cores)
16GB DDR ECC Memory
LSI 9265, 14-Drive RAID 6 (7200 RPM)
4-Port Intel 1GbE, Jumbo Frames Enabled, MPIO enabled on the other end
Stripe / Partition / LUN Partition / VHD (IO Amplification)
I have searched for best practices, and most of the answers are "do your own testing", which I do support as a way to expose potential anomalies. I keep thinking there should be some general guidelines/best practices, though. The only concrete advice I found is to avoid straddling boundaries: make sure each layer's allocation size evenly divides the next (a sketch of what I mean follows).
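Here is a minimal Python sketch of how I understand the straddling rule; the straddles() helper and the example offsets are purely illustrative, not anything StarWind-specific:

```python
KB = 1024

# An IO straddles a lower-layer unit (stripe, cluster, ...) if it crosses
# a unit boundary, turning one logical IO into two at that layer.
def straddles(offset: int, size: int, unit: int) -> bool:
    return offset // unit != (offset + size - 1) // unit

# A 64K write aligned to a 64K stripe unit stays inside one unit.
print(straddles(offset=128 * KB, size=64 * KB, unit=64 * KB))  # False
# The same write shifted by 4K now touches two stripe units (two IOs).
print(straddles(offset=132 * KB, size=64 * KB, unit=64 * KB))  # True
```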
The grey area is the file containers. If the RAID stripe size is set to 64K and the parent NTFS volume uses 64K clusters, does the NTFS volume inside the LUN need to be set to 64K clusters as well? And should the VHD then be 64K too?
I see mention of tuning a Snapshot/CDP file to support various access sizes, but how does this relate to the underlying NTFS structure? For example: stripe 64K, parent 64K clusters, LUN tuned for 4K and formatted as 4K NTFS, with VHDs formatted at 4K (IO amplification). Will this cause 16 IOs against a single parent NTFS cluster, or will StarWind read/write-combine and do everything from memory? How does random access work with small clusters inside of large clusters (if the access falls within the same large cluster, is it another IO, or is it served from memory)?
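To make the amplification concern concrete, here is the arithmetic behind the "16 IOs" figure. This is just my reasoning about the layer sizes, not a claim about how StarWind's cache actually behaves:

```python
KB = 1024

stripe = 64 * KB          # RAID 6 stripe unit
parent_cluster = 64 * KB  # parent NTFS cluster holding the LUN file
lun_cluster = 4 * KB      # NTFS cluster size inside the LUN
vhd_block = 4 * KB        # VHDs formatted at 4K as well

# Worst case: rewriting one parent cluster entirely via 4K guest IOs.
print(parent_cluster // lun_cluster)  # 16 -> the 16 IOs mentioned above

# Each random 4K guest write touches only 1/16th of a 64K parent cluster;
# the open question is whether those get coalesced in cache into one
# backend IO, or hit the array as up to 16 separate IOs per 64K region.
```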
Memory / CPU Usage
I suspect the Opterons might not be up to the task. I started my testing with a dedup disk (1TB, 2GB cache) and did a simple Windows Explorer copy. CPU utilization was 35-40% for one LUN; I have not tested multiple dedup LUNs yet, but this seemed high for a single LUN.
I ran an IOMeter test to try to measure the maximum throughput of the MPIO Ethernet links, using a 2GB RAM disk. I had to use all workers (8 threads), each set to 1MB blocks, in order to saturate the Ethernet bus; in that case IOMeter reported 250 IOPS at 420 MB/s, and the iSCSI target was at 85-90% CPU utilization. If I chose smaller sizes or fewer threads, I could not saturate the Ethernet bus but would get 6-7K IOPS. Is this normal, or should it be higher for a RAM disk? Is it normal to have to use large blocks to push the Ethernet, and should my utilization be that high for a RAM-disk test?
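For reference, here is the back-of-the-envelope math I used to call the links saturated. The ~118 MB/s usable per 1GbE link and the 4K size for the small-block runs are assumptions on my part:

```python
links = 4
usable_mb_per_link = 118         # ~1 Gbit/s minus Ethernet/TCP overhead (rough)
ceiling_mb = links * usable_mb_per_link
print(ceiling_mb)                # ~472 MB/s usable for 4x 1GbE with MPIO

measured_mb = 420                # what IOMeter reported at 1MB blocks
print(measured_mb / ceiling_mb)  # ~0.89 -> effectively saturated

# The small-block runs (assuming, say, 4K blocks) at 6-7K IOPS move far less:
iops = 7000
block_kb = 4
print(iops * block_kb / 1024)    # ~27 MB/s -- nowhere near the bus limit
```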
While I don't expect to drive the system at these benchmark levels in daily use, I would rather realize now, instead of later, that the hardware is not strong enough for future usage.
Thank you for your time