Cluster size / CPU Utilization

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

Ironwolf
Posts: 59
Joined: Fri Jul 13, 2012 4:20 pm

Mon Jul 23, 2012 6:05 pm

Hello, I am quite new to iSCSI in general; I have been working with the StarWind iSCSI software for a few weeks now. I have some basic questions that were not immediately discoverable. This will primarily be a Hyper-V VHD host, and archival data storage.

Windows 2008R2
Dual Opteron 285 (4 Cores)
16GB DDR ECC Memory
LSI 9265, 14-Drive RAID 6 (7200 RPM)
4-port Intel 1Gb Ethernet, jumbo frames enabled, MPIO enabled on the other end

Stripe: Partition: LUN Partition: VHD: (IO Amplification)

I have searched for best practices, and most of the answers amount to "do your own testing," which I do support, since it helps expose potential anomalies. I keep thinking there should be some general guidelines, though. The only one I have found is to avoid straddling boundaries: make sure each layer's block size divides evenly into the next.

The grey area is the file containers. If the RAID stripe size is set to 64K and the parent NTFS volume uses 64K clusters, does the NTFS volume on the LUN need 64K clusters as well? And should the VHD then be formatted with 64K clusters too?

I see mention of tuning a Snapshot/CDP file to support various access sizes, but how does this relate to the underlying NTFS structure? For example: 64K stripe, 64K parent clusters, a LUN tuned for 4K and formatted with 4K NTFS clusters, and VHDs formatted at 4K. Will this cause 16 IOs against a single parent NTFS cluster, or will StarWind combine reads and writes and serve everything from memory? How does random access work with small clusters inside large clusters: if the access falls within the same large cluster, is it another IO, or is it handled in memory?
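The arithmetic behind the 16-to-1 figure can be sketched roughly as below. This is only a geometry check under stated assumptions: the function names are hypothetical, all sizes are in bytes, and it only counts how many larger units a request overlaps. It says nothing about whether StarWind or NTFS caches or combines those requests.

```python
KIB = 1024

def units_touched(offset, length, unit_size):
    """Count the unit_size blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // unit_size
    last = (offset + length - 1) // unit_size
    return last - first + 1

def layers_nest(sizes):
    """True if, sorted by size, each block size divides evenly into the next larger one."""
    ordered = sorted(sizes)
    return all(big % small == 0 for small, big in zip(ordered, ordered[1:]))

# Example from the post: 64K stripe, 64K parent clusters, 4K LUN clusters, 4K VHD clusters.
example_layers = [64 * KIB, 64 * KIB, 4 * KIB, 4 * KIB]
small_per_large = (64 * KIB) // (4 * KIB)   # 4K clusters that fit in one 64K cluster

# An aligned 4K request stays inside one 64K unit; one starting at 62K straddles two.
aligned = units_touched(8 * KIB, 4 * KIB, 64 * KIB)
straddling = units_touched(62 * KIB, 4 * KIB, 64 * KIB)

print(layers_nest(example_layers), small_per_large, aligned, straddling)
```

If the sizes nest and the partitions start on a boundary, a small request never crosses a larger unit, which is the "avoid straddling boundaries" rule expressed as a check.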

Memory: CPU Usage:

I suspect the Opterons might not be up to the task. I started my testing with a dedup disk (1TB, 2GB cache) and did a simple Windows Explorer copy. CPU utilization was 35-40% for one LUN. I have not tested multiple dedup LUNs yet, but this seemed high for a single LUN.

I ran an IOMeter test to find the maximum throughput of the MPIO Ethernet links, using a 2GB memory disk. I had to use all workers (8 threads), each set to 1MB blocks, to saturate the Ethernet links. In that case IOMeter reported 250 IOPS at 420MB/s, and the iSCSI target was at 85-90% CPU utilization. With smaller block sizes or fewer threads I could not saturate the links but would get 6-7K IOPS. Is this normal, or should it be higher for a memory disk? Is it normal to need large blocks to saturate Ethernet, and should utilization be that high for a memory disk test?
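As a rough sanity check on figures like these, throughput is simply IOPS multiplied by block size. The sketch below assumes 1MB means 1 MiB, that MB/s is decimal megabytes per second, and that the small-block runs used 4K blocks (the post does not state the size). The helper names are hypothetical, and the link ceiling ignores iSCSI, TCP, and Ethernet overhead.

```python
KIB = 1024
MIB = 1024 * 1024

def throughput_mb_s(iops, block_bytes):
    """Throughput in decimal MB/s for a given IOPS rate and block size in bytes."""
    return iops * block_bytes / 1_000_000

def iops_needed(target_mb_s, block_bytes):
    """IOPS required to reach target_mb_s at a given block size in bytes."""
    return target_mb_s * 1_000_000 / block_bytes

# Raw line rate of one 1 Gb/s port in MB/s, before any protocol overhead.
LINK_MB_S = 1_000_000_000 / 8 / 1_000_000
raw_ceiling = 4 * LINK_MB_S                       # four ports under MPIO

large_block = throughput_mb_s(250, MIB)           # 250 IOPS at 1 MiB
small_block = throughput_mb_s(7000, 4 * KIB)      # 7K IOPS at an assumed 4K
for_420 = iops_needed(420, MIB)                   # IOPS implied by 420 MB/s at 1 MiB

print(raw_ceiling, large_block, small_block, for_420)
```

Under these assumptions 250 IOPS at 1 MiB works out to roughly 262 MB/s, and 420 MB/s would correspond to about 400 IOPS, so the two reported numbers may have been read at different moments. It also shows why small blocks cannot fill the links at 6-7K IOPS: at an assumed 4K, that is under 30 MB/s.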

While I don't expect to use the system at these benchmarking levels even daily, I would rather find out now than later that the hardware is not strong enough for future usage.

Thank you for your time
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am
Contact:

Thu Jul 26, 2012 4:04 pm

Hi! Welcome aboard! 8)
Actually, we don't have a Best Practices guide, because our product is designed to perform well as-is, without any additional tuning.
So all we have for now are configuration recommendations for cases where something is not working as well as expected. Here are links to some of them:
http://www.starwindsoftware.com/forums/ ... tml#p14539
http://www.starwindsoftware.com/forums/ ... tml#p12194
http://www.starwindsoftware.com/forums/ ... tml#p12187

We strongly recommend using an identical block/stripe size at every layer and making sure every block is aligned. This way you ensure good performance and a great deduplication ratio. And of course, deduplication requires a good CPU and 3.5MB of RAM per 1GB of data stored on the device.
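To put the RAM figure in context, here is a minimal sketch of the estimate for the 1TB dedup disk from the original post. It assumes 1TB means 1024GB, that the device is completely full (worst case), and that the 2GB cache is counted on top of the deduplication memory. The function name is hypothetical, and the 3.5MB-per-GB ratio is the only number taken from the recommendation above.

```python
def dedup_ram_mb(stored_gb, mb_per_gb=3.5):
    """Estimated RAM in MB for deduplication metadata, at mb_per_gb per GB stored."""
    return stored_gb * mb_per_gb

device_gb = 1024            # 1TB dedup disk, assumed completely full
cache_mb = 2 * 1024         # 2GB cache configured on the device
host_ram_mb = 16 * 1024     # 16GB in the host

dedup_mb = dedup_ram_mb(device_gb)
total_mb = dedup_mb + cache_mb
remaining_mb = host_ram_mb - total_mb

print(dedup_mb, total_mb, remaining_mb)
```

With these assumptions a full 1TB dedup device needs about 3.5GB of RAM for deduplication, or about 5.5GB including the cache, so each additional dedup LUN of that size should be budgeted the same way against the 16GB in the host.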


I hope this was helpful; if not, feel free to post more.
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com