Dedupe question

Software-based VM-centric and flash-friendly VM storage + free version


xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Mon Aug 08, 2016 12:47 pm

Hi All

I have a question relating to dedupe and using Starwind as an iSCSI target.

The setup:
A 40GB iSCSI target on my Windows Server 2012 R2 server (using Starwind) with 3 test ESXi 6 hosts connected. When I create the disk I select LSFS and dedupe.

So then I create a test VM and it takes about 19GB of space. On the Windows/Starwind server it shows that the iSCSI target disk is using about 19GB of space. This makes sense to me.

Then I clone the above VM to the same iSCSI target disk, so now I have two identical VMs. The Starwind console shows a dedupe ratio of over 2 and the disk space used on the Windows/Starwind server is still about 19GB. This also makes sense to me.

But on the VMware side of things, when I click the 40GB datastore it shows that the datastore is almost full (less than 2GB of free space). I was just wondering if there's a better way to monitor free disk space from the vCenter side of things when dedupe is used on the disks created in Starwind?

In theory my above example iSCSI disk should have about 20GB of free space but vCenter thinks my datastore is (almost) full.

Am I missing something here? What is the best way to monitor/manage free disk space from vCenter when dedupe is involved? Or is this not possible, and do I always have to check on the Starwind server?

Thank you!
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Fri Aug 12, 2016 11:26 am

Hello xy67,
Deduplication happens below the file system level. That is why vSphere does not know about the deduplication process and "thinks" that it has 40 GB. As a result, the space savings appear on the StarWind server, where you can create other devices or use the free space according to your needs.

Please keep in mind that an LSFS device might use more space on the underlying storage than its initial size, so I would recommend keeping some spare space on the StarWind server.

See the LSFS technical description article for details:
https://knowledgebase.starwindsoftware. ... scription/
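
If it helps, a rough way to compare the two views is to check the datastore with PowerCLI and then check how much the device files actually occupy on the StarWind server. This is only a sketch: the vCenter name, datastore name and folder path below are examples, so adjust them to your environment.

```powershell
# vSphere view: capacity and free space as ESXi/vCenter reports it
# (requires the VMware PowerCLI module)
Connect-VIServer -Server vcenter.lab.local           # example vCenter name
Get-Datastore -Name "StarWind-LSFS-DS" |             # example datastore name
    Select-Object Name, CapacityGB, FreeSpaceGB

# StarWind server view: space the device files really consume on the underlying disk
# (run locally on the Windows/StarWind server; the folder path is an example)
$bytes = (Get-ChildItem "D:\StarWind\LSFS-Device1" -Recurse |
          Measure-Object -Property Length -Sum).Sum
"{0:N1} GB actually used on the underlying disk" -f ($bytes / 1GB)
```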
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Sun Sep 11, 2016 11:50 am

Thanks for the reply Michael.

So if I have a single 512GB SSD drive that I want to use only for iSCSI storage with Starwind, LSFS AND dedupe, how large should I make the device when setting it up in Starwind?

Thank you!
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Thu Sep 15, 2016 5:27 pm

Hello xy67,
According to the LSFS technical description, the device size should be about 170-200 GB.
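
As a rough back-of-envelope sketch of where that range comes from (the ~2.5-3x on-disk overhead factor below is my reading of the article's guidance, not an exact formula):

```powershell
# Back-of-envelope LSFS sizing; the ~2.5-3x overhead factor is an assumption
$ssdGB = 512
[math]::Round($ssdGB / 3)      # ~171 GB if you allow for 3x overhead
[math]::Round($ssdGB / 2.5)    # ~205 GB if you allow for 2.5x overhead
```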
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Mon Sep 19, 2016 6:21 pm

Michael (staff) wrote:Hello xy67,
According to LSFS technical description, the device size should be about 170-200 GB
Thanks Michael.

Does Starwind Virtual SAN support Windows Server 2016 yet?
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Mon Sep 19, 2016 11:15 pm

Hello xy67,

It was previously tested on Windows Server 2016 TP5 without any issues. However, it has obviously not been tested on the released version yet :)
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Sun Sep 25, 2016 11:57 am

Michael (staff) wrote:Hello xy67,

It was previously tested on Windows Server 2016 TP5 without any issues. However, it has obviously not been tested on the released version yet :)
Thanks!

Some more questions:

1) Is 8.0.9781.0 the latest version of StarWind Virtual SAN?

2) Getting back to my question re iSCSI, dedupe and LSFS: I have a 480GB SSD, which gives 447GB of usable space once formatted. Does that mean I should create an LSFS device with a size of 150GB? I know dedupe will help me maximise my disk space, but it just seems like I am losing so much disk space with LSFS (150GB vs 450GB)?

3) Does Virtual SAN support Round Robin with vSphere 6 when using MPIO (multipathing) with iSCSI?
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Tue Sep 27, 2016 4:48 pm

Hello xy67,

1) The current StarWind Virtual SAN build is 8.0.9996.0;
2) Yes, according to the LSFS technical description article, it is recommended to keep free space on the storage underlying LSFS. With flat .IMG devices, you can use the full 450 GB without any issues.
3) Yes, Virtual SAN supports Round Robin with vSphere 6 when using MPIO (multipathing) with iSCSI; see the sketch below for one way to set it per LUN.
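
For example, setting Round Robin (and the path-change frequency) on the StarWind LUNs from PowerCLI looks roughly like this. Treat it as a sketch: the vCenter and host names are examples and the "STARWIND" vendor filter is an assumption, so verify it against your LUNs first.

```powershell
# Sketch: set Round Robin and switch paths after every command on StarWind LUNs
# (requires the VMware PowerCLI module)
Connect-VIServer -Server vcenter.lab.local            # example vCenter name
Get-VMHost -Name "esxi01.lab.local" |                 # example host name
    Get-ScsiLun -LunType disk |
    Where-Object { $_.Vendor -match "STARWIND" } |    # vendor string assumed
    Set-ScsiLun -MultipathPolicy RoundRobin -CommandsToSwitchPath 1
```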
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Sun Oct 30, 2016 9:55 am

I have another question: when creating an LSFS device it gives me the option to "use flash cache". What does this do, and should I enable it for an LSFS device that will be hosting VMs in VMware? I do have an NVMe SSD, so should I use this for flash cache? How big should it be if the flash cache option is used?
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Sun Oct 30, 2016 3:17 pm

Now that I am starting to use Virtual SAN in my lab I wanted to start using dedupe and LSFS but I seem to have confused myself :shock:

I have a single SSD, 450GB formatted. When I create a new device on my target and select LSFS with dedupe, what size do I enter here? If I enter 150GB (as an LSFS device can grow up to 200% of the disk space), then when I present this LUN to ESXi and create a datastore it is only 150GB. So how do I tell ESXi that the LUN is actually 450GB (or bigger) so I can overprovision it? I'm sure I did this in a test setup but can't remember what I did. When creating the datastore in ESXi it doesn't let me specify a size bigger than 150GB, as it says the disk/LUN isn't big enough.

What am I doing wrong here?? Or do I create an LSFS device that's (say) 1TB on my 450GB SSD and then only create a datastore that is smaller in size? Which way round do I do this: create the larger size in Starwind or in ESXi?

Another issue I have is that my Starwind Virtual SAN server has 4 x 1Gb NICs and my ESXi server also has 4 x 1Gb NICs, and it's all connected to a Cisco SG300-28 switch (all NICs are connected at 1Gb). I have set up iSCSI port binding in ESXi and set Round Robin on the two datastores I have created. On the SAN side I ensured that the MPIO role was installed AND I ran the MPIO tool, ticked "Add support for iSCSI devices" and rebooted the server. In vCenter it shows that I/O is active for all paths to each device.

Despite all this, I can only achieve about 80MB/s when doing a vMotion from datastore1 to datastore2 (each datastore is a 480GB SSD). When I watch the vMotion happen with esxtop I can see traffic go across all 4 of the vmnics, BUT it only adds up to (in total) 1Gb of bandwidth. Is there something further I need to do to get more than 1Gb of speed? I would have thought that iSCSI and MPIO with 4 x 1Gb NICs along with two SSD drives would have resulted in at least 300MB/s? On the SAN server I have run various benchmarking tools and they all give at least 450-500MB/s. I can provide more info if I have left anything out.

All the network port bindings in vCenter for iSCSI are showing as compliant (and active). I currently have 16 paths for each datastore. I also tried setting the Round Robin IOPS setting to 1 (default was 1000) but this didn't help. Can MPIO give speeds higher than 1Gb even if you have 4 NICs?
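
For reference, the Windows-side MPIO steps I did correspond roughly to this PowerShell (I actually used the GUI, so this is just a sketch of the equivalent commands as I understand them):

```powershell
# Sketch of the MPIO enablement I performed on the StarWind server via the GUI
Install-WindowsFeature -Name Multipath-IO     # install the MPIO feature
Enable-MSDSMAutomaticClaim -BusType iSCSI     # "Add support for iSCSI devices"
Restart-Computer                              # reboot for the claim to take effect
```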

What am I missing here?

Thanks!
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Mon Oct 31, 2016 11:20 pm

So clearly I am misunderstanding something or have misconfigured something so maybe someone can help me get this right in my head :D

I created a Windows VM and ran CrystalMark in it and got about 120MB/s. When I checked esxtop on the ESXi 6 server, the four 1Gb NICs were never using more than 1Gb combined. Not sure what happened (a reboot, I think) but now when I run CrystalMark I get about 450MB/s read/write speeds and I can see the four 1Gb NICs in esxtop spike up to about 800Mb/s each.

So I *think* MPIO is working with iSCSI and Round Robin on my two test datastores?

So then I tried a further test, which was to copy a 3GB zip file between different folders in the Windows VM. Again, it never went above 120MB/s. Surely if CrystalMark is showing 450MB/s read/write speeds, then when I copy a single large 3GB file I should see similar speeds? I'm just confused why I seem to be hitting the speed of a single 1Gb NIC?

My two LSFS devices have 4GB cache (RAM) and I have created a 20GB flash cache on an NVMe SSD drive.

The other odd thing I find is that no matter what I am doing in my test VMs (benchmarking, copying large files, etc.), if I monitor disk performance in Task Manager on the Starwind Virtual SAN server it never seems to reflect the speeds of what's going on in the VM, i.e. if I run CrystalMark and it shows a result of 450MB/s, Task Manager never reflects this?

So correct me if I am wrong here, but if I have four 1Gb NICs in my ESXi server AND another four 1Gb NICs in my Starwind Virtual SAN server, then surely my VM(s) should perform (roughly) at the speed of the SSD (about 400MB/s)? If I test the storage locally on the SAN server then ALL disks get 400-500MB/s. It's only the iSCSI performance which is not consistent or working as expected.

I've installed the Windows Server MPIO role, enabled MPIO for iSCSI devices in the MPIO tool and configured Round Robin on both datastores. I'm using network port binding for iSCSI and a single subnet for iSCSI storage. All iSCSI networking is in its own VLAN on the switch. I have Starwind Virtual SAN installed on Windows Server 2016 (patched as of today). I've also changed the Round Robin IOPS value from 1000 to 1 for each datastore to improve performance.

Despite all this my iSCSI performance still seems sluggish/random and not performing as I'd like, but I just can't pinpoint the issue. I'm using Samsung SM863 enterprise SSD drives, all drives have the latest firmware, and there are quad-port HP NC364T network cards in each server.

I also find that VAAI doesn't seem to be working correctly, as when I clone or Storage vMotion a VM between two datastores on the same server the traffic goes over the iSCSI NICs instead of staying local on the SAN. All datastores report that hardware acceleration is supported.
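
In case it's relevant, this is the kind of PowerCLI check I was planning to use to confirm the VAAI primitives are enabled on the host (the host name is just an example):

```powershell
# Check the VAAI-related advanced settings on one host; a value of 1 means enabled
# (requires the VMware PowerCLI module and an existing vCenter/host connection)
$esx = Get-VMHost -Name "esxi01.lab.local"    # example host name
"DataMover.HardwareAcceleratedMove",
"DataMover.HardwareAcceleratedInit",
"VMFS3.HardwareAcceleratedLocking" | ForEach-Object {
    Get-AdvancedSetting -Entity $esx -Name $_ | Select-Object Name, Value
}
```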

So am I doing something wrong here? Is this a problem with my configuration (or my knowledge!), a problem with Windows Server 2016 RTM, or something else? This is my first real SAN/VMware setup with shared storage, so I would really appreciate some help. I've read so much online and various whitepapers but I guess I have missed something!

Thanks for reading and for any help! :D I'm happy to provide any further info if it'll help (screenshots, config, etc.)
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Wed Nov 02, 2016 4:32 pm

Can anyone offer any guidance or assistance please?
Michael (staff)
Staff
Posts: 319
Joined: Thu Jul 21, 2016 10:16 am

Wed Nov 02, 2016 5:15 pm

Hello xy67

The flash cache has been designed for read-intensive environments (where reads are more than 90% of the workload). It is recommended to set its size to about 10% of the StarWind device size. For example, for a 1TB device it should be 100 GB. So, it is up to your production requirements.

There are no issues with deduplication here. Since you have created a 150 GB LSFS device, ESXi will see 150 GB. At the same time, because of deduplication, it may take up less than 150 GB on the underlying storage. Also, you can create a 450 GB StarWind device and ESXi will see 450 GB as well. But, in any case, you should leave free space for storing LSFS snapshots.

As for the network configuration, could you please clarify which links you are using for vMotion and Management? When you copy something to a VM or move it, the traffic uses the Management or vMotion link, so that could be the bottleneck. I would recommend setting up dedicated links for iSCSI traffic only and discovering them on the ESXi host. With 3 x 1 Gbps links, that should give about 3 Gbps with MPIO.

Meanwhile, have you tried testing the StarWind device performance on the StarWind server itself? Just discover it via 127.0.0.1, connect the target in the Microsoft iSCSI Initiator and test the disk. Please compare the results with those obtained with the DISKSPD utility, because CrystalMark sometimes shows incorrect results: https://gallery.technet.microsoft.com/D ... e-6cd2f223
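
For example, a DISKSPD run along these lines gives a mixed read/write result you can compare between the VM, the ESXi datastore and the loopback-connected disk. The drive letter and file path are placeholders, so adjust the parameters to your workload:

```powershell
# Example DISKSPD run: 60 s, 64K blocks, 4 threads, 8 outstanding I/Os per thread,
# 70% read / 30% write, random access, caching disabled, 10 GB test file
# (X:\ is a placeholder for the disk under test)
.\diskspd.exe -b64K -d60 -t4 -o8 -r -w30 -Sh -L -c10G X:\diskspd-test.dat
```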
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Wed Nov 02, 2016 8:31 pm

Hi Michael, really appreciate the reply!

I was about to post a long reply, but I *think* I may have found the problem. It's still early days, but here's the scoop:

I connected the four 1Gb NICs in my ESXi host DIRECTLY into the four 1Gb NICs on the SAN and here are the results:

ESXi server connected to Cisco SG300-28 switch:

[screenshot: benchmark results through the switch]

Running cables directly between the ESXi server and SAN:

[screenshot: benchmark results with direct cabling]

General file copies seem to be much quicker too (200MB/s or more).

Has anyone else had issues with iSCSI traffic when using the Cisco SG300-28 switch? I'm not using jumbo frames and the switch is running the latest firmware (May 2016). I'm just confused that the performance is so bad when using this switch for the storage traffic!

Is there perhaps some setting I have missed when configuring the switch that could affect iSCSI traffic?
xy67
Posts: 27
Joined: Tue Jan 05, 2016 9:21 pm

Thu Nov 03, 2016 4:07 pm

Hi Michael,
The flash cache has been designed for read-intensive environments (where reads are more than 90% of the workload). It is recommended to set its size to about 10% of the StarWind device size. For example, for a 1TB device it should be 100 GB. So, it is up to your production requirements.

There are no issues with deduplication here. Since you have created a 150 GB LSFS device, ESXi will see 150 GB. At the same time, because of deduplication, it may take up less than 150 GB on the underlying storage. Also, you can create a 450 GB StarWind device and ESXi will see 450 GB as well. But, in any case, you should leave free space for storing LSFS snapshots.
Can I just clarify the LSFS sizes, as I am a bit confused about this. If my formatted SSD is 450GB and I create a 150GB LSFS device, then ESXi thinks I have a 150GB datastore. But if I create 3 x 50GB VMs on that datastore, won't it be full, since ESXi doesn't know dedupe is happening on the SAN?

Also, do you need to have the MPIO role installed in Windows on the server you install Virtual SAN on?

Is there a reason why, when I create an LSFS device on SSD/flash storage, it sometimes shows as HDD storage and sometimes as flash storage?