I'm trying to max out performance in a simple setup, but I seem to be hitting some sort of bottleneck and I can't find it.
I have a simple setup consisting of two servers: a VMware ESXi 6.5 host and a Starwind server. They are connected back-to-back by a single 10 GbE link. Specs are listed further down in this post.
The iSCSI store is placed on a RAID5 array (64KB stripe size) of 8 x Intel DC S3500 SSDs. Both the hardware RAID controller cache and the Starwind cache are disabled; enabling the Starwind cache actually worsens performance for some reason.
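For anyone checking the array geometry: assuming the 64KB figure is the per-drive strip size (which, as I understand it, is what the LSI controller calls "Strip Size") rather than the whole stripe, one RAID5 stripe across 8 drives holds 7 x 64KB = 448KB of data. The Python sketch below only does that arithmetic and counts how many strips an I/O of a given size and offset touches. It ignores parity rotation, and the function names are hypothetical helpers, not part of any tool.

```python
# Sketch: RAID5 stripe geometry for an 8-drive array.
# Assumption: "64KB stripe size" means the per-drive strip size.
# Parity rotation is ignored; this only counts data strips.

KIB = 1024


def full_stripe_bytes(drives: int, strip_bytes: int) -> int:
    """Data bytes in one full RAID5 stripe (one drive's worth holds parity)."""
    if drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drives - 1) * strip_bytes


def strips_spanned(offset: int, length: int, strip_bytes: int) -> int:
    """Number of consecutive data strips touched by an I/O at `offset` of `length` bytes."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // strip_bytes
    last = (offset + length - 1) // strip_bytes
    return last - first + 1


if __name__ == "__main__":
    strip = 64 * KIB
    print("full stripe:", full_stripe_bytes(8, strip) // KIB, "KiB")
    for size_kib in (4, 64, 128, 448, 1024):
        n = strips_spanned(0, size_kib * KIB, strip)
        print(f"{size_kib:>5} KiB aligned I/O spans {n} strip(s)")
```

So an aligned 64KB read is served by a single strip on a single SSD, while only transfers of 448KB or more cover a full stripe.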
Jumbo frames are enabled.
When I run ATTO from within a VMware guest, it is able to saturate the 10 GbE iSCSI link when writing (VMware to Starwind). Reads (Starwind to VMware), however, are slower and do not saturate the link.
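For reference, "saturating" a 10 GbE link with iSCSI means somewhat less than 1250 MB/s of payload, because every frame carries Ethernet, IP and TCP overhead. Here is a rough Python sketch of the best case. It assumes IPv4 and TCP without options and standard Ethernet framing, and it ignores iSCSI PDU headers, so real-world numbers will come out slightly lower. The function names are hypothetical.

```python
# Sketch: approximate best-case TCP payload rate on a 10 GbE link.
# Assumptions: IPv4 + TCP without options (40 bytes of headers per packet) and
# 38 bytes of Ethernet overhead per frame on the wire
# (preamble/SFD 8, header 14, FCS 4, inter-frame gap 12).
# iSCSI PDU headers and TCP options (e.g. timestamps) are ignored.

LINK_BITS_PER_S = 10_000_000_000
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # bytes per frame outside the MTU
IP_TCP_HEADERS = 20 + 20             # bytes per frame inside the MTU


def max_payload_mb_s(mtu: int, link_bps: int = LINK_BITS_PER_S) -> float:
    """Best-case payload throughput in MB/s (10**6 bytes) for a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_WIRE_OVERHEAD
    return link_bps / 8 * payload / on_wire / 1e6


def link_utilization(measured_mb_s: float, mtu: int) -> float:
    """Measured throughput as a fraction of the best-case payload rate."""
    return measured_mb_s / max_payload_mb_s(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ~{max_payload_mb_s(mtu):.0f} MB/s")
```

Under those assumptions this works out to roughly 1,239 MB/s with 9000-byte frames versus roughly 1,187 MB/s with 1500-byte frames. When comparing against benchmark results, it's worth checking whether the tool reports MB or MiB.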
Only one VMware guest is located on the RAID array to ensure that no outside load is placed on the array during testing.
I believe the bottleneck is on the Starwind server, since I was able to dramatically increase both write and read performance by changing the NIC's performance profile on the Starwind server from "Standard Server" to "Low Latency".
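One possible reading of the Low Latency result: a benchmark such as ATTO keeps a fixed number of I/Os outstanding (its queue depth setting). By Little's law, throughput = queue depth x I/O size / average per-I/O latency. If the NIC profile shortens each round trip, throughput rises even though the link itself was not full. Here is a small Python sketch of that relationship. The queue depth, I/O size and latencies are hypothetical example values, not measurements from this setup.

```python
# Sketch: Little's law for a closed-loop storage benchmark.
#   throughput = outstanding I/Os * I/O size / average latency
# Queue depth, I/O size and latencies below are hypothetical examples.


def throughput_mb_s(queue_depth: int, io_bytes: int, latency_s: float) -> float:
    """Steady-state throughput in MB/s (10**6 bytes) with a fixed queue depth."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return queue_depth * io_bytes / latency_s / 1e6


def max_latency_for(target_mb_s: float, queue_depth: int, io_bytes: int) -> float:
    """Largest average per-I/O latency (seconds) that still reaches target_mb_s."""
    return queue_depth * io_bytes / (target_mb_s * 1e6)


if __name__ == "__main__":
    qd, size = 4, 64 * 1024  # hypothetical: 4 outstanding 64 KiB I/Os
    for lat_us in (500, 250, 200):
        mb_s = throughput_mb_s(qd, size, lat_us / 1e6)
        print(f"{lat_us} us per I/O -> {mb_s:.0f} MB/s")
    us = max_latency_for(1239, qd, size) * 1e6
    print(f"need <= {us:.0f} us per I/O to reach ~1239 MB/s")
```

For example, with 4 outstanding 64 KiB I/Os the average latency would have to stay at or below roughly 212 microseconds to reach about 1239 MB/s. This illustrates why reads at small queue depths are so sensitive to per-I/O latency.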
Setup for the Starwind server is as follows:
Software:
OS: Server 2016
SW: StarWind Virtual SAN v8.0.0 (Build 10833, [SwSAN], Win64) (free edition)
Hardware:
1 x Intel Xeon X5670
24GB RAM
Intel X520 10GbE NIC (1 link dedicated for Starwind)
LSI 9265-8i RAID controller
VMware ESXi:
The hardware specs of the VMware ESXi server are identical to those of the Starwind server.
[Image: ATTO benchmark run directly on the RAID array on the Starwind server]
[Image: ATTO benchmark from within the VMware guest, Low Latency NIC profile, 9K frames (the best performance I'm able to achieve)]
[Image: ATTO benchmark from within the VMware guest, Standard Server NIC profile, 9K frames]
Note that both read and write are significantly lower here.

I'm not complaining about performance; I would just like to be able to saturate the 10 GbE link in both directions, as I think that should be possible.

Are there any tweaks within Starwind that can increase performance in an all-flash setup? Or any other ideas on where I should look to get the last bit of juice out of this setup?