So clearly I am misunderstanding or have misconfigured something, so maybe someone can help me get this right in my head.
I created a Windows VM and ran CrystalMark in it and got about 120MB/s. When I checked esxtop on the ESXi 6 server, the four 1Gb NICs never used more than 1Gb combined. I'm not sure what changed (a reboot, I think), but now when I run CrystalMark I get about 450MB/s read/write speeds, and I can see the four 1Gb NICs in esxtop spike up to about 800Mb/s each.
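In case it matters, I've been watching this in esxtop over SSH; as far as I understand the interactive views, these are the keys I'm using:

    esxtop    # then press:
              #   n - network view (per-vmnic MbTX/s and MbRX/s)
              #   d - disk adapter view (per-vmhba throughput)
              #   u - disk device view (per-LUN throughput and latency)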
So I *think* MPIO is working with iSCSI and Round Robin is taking effect on my two test datastores?
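I believe the way to confirm this on the ESXi host is something like the following (the naa. ID is just a placeholder for my StarWind LUNs), which is what I've been going by:

    esxcli storage nmp device list
    # each StarWind LUN should show "Path Selection Policy: VMW_PSP_RR" if RR is active
    esxcli storage nmp path list -d naa.xxxxxxxxxxxxxxxx
    # lists every path to that device and whether it is active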
So then I tried a further test: copying a 3GB zip file between different folders in the Windows VM. Again, it never went above 120MB/s. Surely if CrystalMark is showing 450MB/s read/write speeds, then copying a single large 3GB file should show similar speeds? I'm just confused as to why I seem to be hitting the speed of a single 1Gb NIC.
My two LSFS devices have 4GB cache (RAM) and I have created a 20GB flash cache on an NVMe SSD drive.
The other odd thing I find is that no matter what I am doing in my test VMs (benchmarking, copying large files, etc.), if I monitor disk performance in Task Manager on the StarWind Virtual SAN server, it never seems to reflect the speeds of what's going on in the VM. I.e. if I run CrystalMark and it shows a result of 450MB/s, Task Manager never reflects this.
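To cross-check Task Manager I've also tried sampling the raw disk counters on the StarWind box (I gather the disk graphs in Task Manager are hidden by default on Server unless you enable them with diskperf -y, so the perfmon counters seemed a safer check). Something like:

    Get-Counter -Counter '\PhysicalDisk(*)\Disk Read Bytes/sec','\PhysicalDisk(*)\Disk Write Bytes/sec' -SampleInterval 1 -MaxSamples 10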
So correct me if I am wrong here, but if I have four 1Gb NICs in my ESXi server AND another four 1Gb NICs in my StarWind Virtual SAN server, then surely my VM(s) should perform (roughly) at the speed of the SSDs (about 400MB/s)? If I test the storage locally on the SAN server, ALL disks get 400-500MB/s. It's only the iSCSI performance that is inconsistent and not working as expected.

For what it's worth, here is the setup: I've installed the Windows Server MPIO feature, enabled MPIO for iSCSI devices, and configured Round Robin on both datastores. I'm using network port bonding for iSCSI with a single subnet for iSCSI storage, and all iSCSI networking is in its own VLAN on the switch. StarWind Virtual SAN is installed on Windows Server 2016 (patched as of today). I've also changed the Round Robin IOPS value from 1000 to 1 for each datastore to improve performance.
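For reference, I believe these are the relevant esxcli commands for the IOPS=1 change and for checking the iSCSI port binding, which is roughly what I went by (the naa. ID and vmhba number are placeholders):

    # show the current Round Robin settings for a device
    esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxxxxxxxxxxxxxx
    # switch paths after every 1 I/O instead of the default 1000
    esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=1
    # list the vmkernel ports bound to the software iSCSI adapter
    esxcli iscsi networkportal list --adapter=vmhba33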
Despite all this, my iSCSI performance still seems sluggish and inconsistent, and I just can't pinpoint the issue. I'm using Samsung SM863 enterprise SSDs, all drives have the latest firmware, and each server has a quad-port HP NC364T network card.
I also find that VAAI doesn't seem to be working correctly: when I clone or Storage vMotion a VM between two datastores on the same SAN server, the traffic goes over the iSCSI NICs instead of staying local on the SAN. All datastores report that hardware acceleration is supported.
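In case I'm misreading the hardware acceleration status, this is roughly how I've been checking the VAAI primitives per device (again, the naa. ID is a placeholder):

    esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx
    # reports the ATS, Clone, Zero and Delete status for the device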
So am I doing something wrong here? Is this a problem with my configuration (or my knowledge!), a problem with Windows Server 2016 RTM, or something else? This is my first real SAN/VMware setup with shared storage, so I would really appreciate some help. I've read so much online, plus various whitepapers, but I guess I have missed something!
Thanks for reading and for any help!

I'm happy to provide any further info if it'll help (screenshots, config, etc.).