Testing storage box with Virtual SAN, performance issues

Software-based VM-centric and flash-friendly VM storage + free version


mephisto
Posts: 6
Joined: Tue Jan 09, 2018 11:44 am

Tue Jan 09, 2018 12:00 pm

Hi guys,

I've decided to give Virtual SAN a try now. Last time I looked into it, I didn't have the time to play with it and check whether it was good enough for production use in my case.

I have a small VMware environment with a few storage boxes and local storage, and I'm hoping to replace the OS on these storage boxes (Dell R720XDs) with Windows Server 2016 and Virtual SAN.

I'm testing a RAMdrive at the moment, because when I first tested a datastore on an LSFS LUN the performance was not what I was expecting on a 10 GbE network, so I was hoping to pick your brains here and see what I can do to improve things. I didn't want to involve pre-sales at this stage because I don't want the pressure of salespeople calling me; I'm a geek, after all, and prefer to deal by email/message as it is more efficient :)

Anyway, I've done the following so far:
  • R720XD with 384GB RAM, Intel 3520 PCI-E SSD 2TB for L2 cache, 12x Seagate 6TB 7.2K enterprise-class drives in RAID-10, Windows Server 2016, Virtual SAN 8
  • 10 GbE network with jumbo frames enabled (MTU sanity-check sketch after this list)
  • LSFS datastore and RAMDISK datastore tested
  • ESXi 6 build 6921384, using software iSCSI
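
By the way, to rule out an MTU mismatch I sanity-check jumbo frames end-to-end from the Windows box with a small Python wrapper around ping. This is just my own sketch; the target IP is a placeholder for the iSCSI portal of the other node, and 8972 is the 9000-byte MTU minus 28 bytes of IP/ICMP headers:

Code:
# End-to-end jumbo frame check, run from the Windows storage node.
import subprocess

TARGET = "192.168.10.20"   # placeholder: iSCSI portal IP of the peer
PAYLOAD = 9000 - 28        # largest ICMP payload that fits a 9000 MTU

result = subprocess.run(
    ["ping", "-f", "-l", str(PAYLOAD), "-n", "4", TARGET],
    capture_output=True, text=True,
)
print(result.stdout)
if "needs to be fragmented" in result.stdout:
    print("Jumbo frames are NOT passing end-to-end; check switch/NIC MTU.")
elif result.returncode == 0:
    print("9000-byte frames pass end-to-end.")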
I noticed a couple of things when testing the HDDs with L1 cache. I moved a whole VM, about 150GB, onto an LSFS datastore, expecting it to be flushed to disk as soon as possible, but it actually sat in RAM for a long time. Eventually I could see lots of write bursts, which I think was the data being flushed to disk. Is this by design? I was expecting data to be flushed to disk as soon as possible while still being kept in L1 or L2. What is the disk flush schedule? How does it work exactly?

How does LSFS affect sequential reads/writes? I'm not getting what I was expecting here: my PCI-E SSD can do 2500 MB/s read / 2000 MB/s write, and the HDD array can do 1500 MB/s read / 1200 MB/s write, yet on a RAM disk I can't get more than 400 MB/s reads and around 700 MB/s writes. I was expecting to fully saturate a 10 GbE connection in both directions.
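
For context, this is the ceiling I'm comparing against (the ~5% protocol overhead for Ethernet/IP/TCP/iSCSI headers at a 9000 MTU is just my assumption):

Code:
# Back-of-the-envelope ceiling for one 10 GbE link.
line_rate_MBps = 10e9 / 8 / 1e6            # 1250 MB/s raw
overhead = 0.05                            # assumed protocol overhead
usable = line_rate_MBps * (1 - overhead)   # ~1188 MB/s
print(f"~{usable:.0f} MB/s practical ceiling per direction")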

When I create a datastore it uses VMFS5, but I can't select a block size other than 1MB, which I believe limits VMDKs to 2TB. Is there a workaround for this with iSCSI?
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands
Contact:

Tue Jan 09, 2018 4:40 pm

LSFS should be used with parity or double-parity RAID and spinning disks. For RAID10 with NVMe as an L2 cache, please stick with flat image files.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

oscaru
Posts: 21
Joined: Wed Mar 08, 2017 1:27 am

Wed Jan 10, 2018 1:40 am

Hi Anton,

Actually, I read in your KB that RAID 10 was recommended for HDDs:

https://knowledgebase.starwindsoftware. ... ssd-disks/
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Wed Jan 10, 2018 10:29 am

oscaru,

RAID10 is correct for HDDs.
mephisto
Posts: 6
Joined: Tue Jan 09, 2018 11:44 am

Thu Jan 11, 2018 6:55 pm

I've done some more testing: flat disk, LSFS, and RAMDISK.

I'm not finding the performance anything like what it should be in theory.

I can't figure out why, on a 10 GbE network, you can only use about 50% of the bandwidth with a RAMDISK. What could I be doing wrong here? What result should I expect?

If you get a fast box with plenty of RAM and 10 GbE NICs, should I expect to saturate the network?
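
To separate a network problem from a Virtual SAN problem, I knocked together a raw TCP throughput test in Python (the port and transfer sizes are arbitrary). If this can't get close to the ~1200 MB/s ceiling between the two boxes, the bottleneck is the network, not the storage stack. Run it with "server" on one node and "client <server-ip>" on the other:

Code:
# Raw TCP throughput test: python tput.py server | python tput.py client <ip>
import socket, sys, time

PORT = 5001
CHUNK = 1 << 20          # 1 MiB per send
TOTAL = 8 << 30          # move 8 GiB in total

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            received = 0
            start = time.perf_counter()
            while True:
                data = conn.recv(CHUNK)
                if not data:
                    break
                received += len(data)
            elapsed = time.perf_counter() - start
            print(f"{received / elapsed / 1e6:.0f} MB/s received")

def client(host):
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, PORT)) as sock:
        while sent < TOTAL:
            sock.sendall(payload)
            sent += CHUNK
    elapsed = time.perf_counter() - start
    print(f"{sent / elapsed / 1e6:.0f} MB/s sent")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])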
mephisto
Posts: 6
Joined: Tue Jan 09, 2018 11:44 am

Mon Jan 15, 2018 11:14 am

With LSFS, using a 30TB disk in RAID-10 with 384GB RAM and a 2TB NVMe SSD as L2, I vMotioned one VM of 180GB. It was written completely to RAM, and a while after the move finished, Virtual SAN started writing to disk at a very slow rate: bursts of 200-300 MB/s, then a stop, then another burst of around 300 MB/s, and so on. Why is it not writing faster than that? Only this VM is running on the storage node, and it is no more than 5% busy.
mephisto
Posts: 6
Joined: Tue Jan 09, 2018 11:44 am

Mon Jan 15, 2018 11:15 am

I'm also trying it with flat disks; there's not much difference, if any.

What do you guys advise?
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Jan 15, 2018 5:32 pm

Without any additional information, this figure looks very much like the cache lazy writer flushing the cache to the underlying storage. The limit you see may be caused by the performance of your underlying storage or by the activity of your VM. The latter matters because whenever the cache lazy writer detects that data in the cache is no longer being addressed, it starts flushing that data to disk.
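
To illustrate the idea, here is a rough conceptual model of how a write-back cache with a lazy writer behaves. This is purely illustrative, not the actual VSAN code, and the 5-second idle threshold is an arbitrary stand-in:

Code:
# Conceptual model of a write-back cache with a lazy writer.
import time

class WriteBackCache:
    IDLE_BEFORE_FLUSH = 5.0    # stand-in value: flush blocks idle for 5 s

    def __init__(self, backing_store):
        self.backing = backing_store
        self.dirty = {}        # block -> (data, last_access_time)

    def write(self, block, data):
        # Writes are acknowledged as soon as they land in RAM (L1).
        self.dirty[block] = (data, time.monotonic())

    def lazy_writer_pass(self):
        # Flush only blocks that have not been touched recently, so hot
        # data stays in cache while cold data drains to disk in bursts.
        now = time.monotonic()
        for block, (data, last) in list(self.dirty.items()):
            if now - last >= self.IDLE_BEFORE_FLUSH:
                self.backing.write(block, data)   # slow path: the HDD array
                del self.dirty[block]

This is also why you see bursts: nothing is flushed while the VM keeps touching its blocks, and the drain starts once the cached data goes idle.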
mephisto
Posts: 6
Joined: Tue Jan 09, 2018 11:44 am

Tue Jan 16, 2018 1:10 pm

I unmounted the datastore and left it for about 4-5 hours, and data was still being read from the disks on the storage node every few seconds. It was reading the thin disks (files), about 10 of them at a time, at around 500 MB/s, with no writes.

I deleted the disk from Virtual SAN and the reads stopped.

I ran some IOMeter tests as instructed in the documentation and got the results attached in the image here. The storage is more than capable of doing 1000+ MB/s in sequential mode.
Attachments
iometer node 1 worker.JPG (IOMeter results, single worker)
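
In case anyone wants to reproduce a rough version of this without IOMeter, this is the quick sequential-write sketch I cross-checked with. The path is a placeholder for a file on the RAID-10 volume, and it goes through buffered I/O, so treat the number as approximate:

Code:
# Quick sequential-write cross-check against the IOMeter numbers.
import os, time

PATH = r"D:\bench.tmp"   # placeholder: a file on the RAID-10 volume
CHUNK = 1 << 20          # 1 MiB writes
TOTAL = 4 << 30          # write 4 GiB in total

buf = os.urandom(CHUNK)  # incompressible data
start = time.perf_counter()
with open(PATH, "wb") as f:
    for _ in range(TOTAL // CHUNK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())  # wait until the data has actually hit the array
elapsed = time.perf_counter() - start
print(f"{TOTAL / elapsed / 1e6:.0f} MB/s sequential write")
os.remove(PATH)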
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Thu Jan 18, 2018 1:35 pm

mephisto,

Since you are using the trial version of StarWind VSAN, we have been trying to reach you offline to troubleshoot and resolve your issue. Unfortunately, you have not responded to our communication yet.
mephisto
Posts: 6
Joined: Tue Jan 09, 2018 11:44 am

Mon Jan 29, 2018 4:49 pm

Hi guys,

I've been travelling for the last 5 weeks, so I was checking things here in my spare time in the evenings :)

Anyway, I'll touch base with your support team. I've done a massive amount of testing over the last few days, so we can see if performance can be improved.

Thanks!
Boris (staff)
Staff
Posts: 805
Joined: Fri Jul 28, 2017 8:18 am

Mon Jan 29, 2018 5:11 pm

Perfect. We will get in touch with you then.