DeDupe Volume Settings

Software-based VM-centric and flash-friendly VM storage + free version

Moderators: anton (staff), art (staff), Max (staff), Anatoly (staff)

NetWise
Posts: 7
Joined: Sat Feb 18, 2012 6:00 am

Sat Jul 07, 2012 4:08 am

I was wondering if there is a recommendation for DeDupe-based volumes? I ran into an issue while testing: on a 408GB disk, I created a 350GB StarWind DeDupe volume. I was hoping that it would use only up to 350GB and no more, so I'd have ~60GB free just in case. I see now that there are .SPBITMAP, .SPDATA, and .SPMETADATA files, as well as a .DDDB, and I ended up with a drive-full issue in Windows, which took my target offline. Test environment, so no big deal, but I want to get a handle on what I might have done wrong or assumed incorrectly before I retry.

I see from other posts and from the performance whitepaper that 4K DeDupe works better than 64K DeDupe. This of course makes perfect sense, as it is far more likely to find identical 4K blocks than to hope to find blocks 16x the size without a single byte of difference. It also makes sense that the block size should be the same throughout, to avoid a situation similar to MBR misalignment. So I would want 4KB blocks on the Windows 2008 R2 StarWind host, 1MB-block VMFS 5 datastores on my ESXi 5 boxes, and 4KB blocks inside my VMs, correct?
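The 4K-vs-64K intuition can be sketched with a toy experiment (synthetic data, nothing to do with StarWind's actual engine): two images cloned from one template, each with a single byte changed, chunked and hashed at both block sizes.

```python
import hashlib
import random

def unique_chunks(data: bytes, block: int) -> int:
    """Count distinct fixed-size blocks - what a dedupe store must keep."""
    return len({hashlib.sha256(data[i:i + block]).digest()
                for i in range(0, len(data), block)})

random.seed(0)
template = bytes(random.randrange(256) for _ in range(64 * 1024))  # 64KB "image"

# Two clones of the template, each with one byte flipped.
vm1 = bytearray(template); vm1[100] ^= 1
vm2 = bytearray(template); vm2[40_000] ^= 1
data = bytes(vm1) + bytes(vm2)

print(unique_chunks(data, 4 * 1024))   # 18 of 32 4K blocks are unique
print(unique_chunks(data, 64 * 1024))  # 2 of 2 64K blocks are unique: no savings
```

At 4K, only the two modified blocks break sharing; at 64K, a single changed byte makes each whole image unique.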

My biggest concern is ensuring that the Starwind host volume doesn't fill to capacity and take the volume(s) offline.

Also, the performance white paper indicates there is a way to measure the DeDupe size and ratio. Is this done by simply looking at the size of one of the 4 files above, or is there a tool within the StarWind console I haven't found that tells me this? It looks like if you delete blocks (i.e., a VM) it does not reclaim the space inside the VMFS LUN (or inside StarWind), which is typical of thin-provisioned disks, and I understand that. But this would skew DeDupe calculations if one is simply looking at the high-water mark of the files on the StarWind host volume.

Just looking to make sure I'm doing this right, before I jump both feet into it.

Thanks!
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Mon Jul 09, 2012 12:48 pm

I was wondering if there is a recommendation for DeDupe-based volumes? I ran into an issue while testing: on a 408GB disk, I created a 350GB StarWind DeDupe volume. I was hoping that it would use only up to 350GB and no more, so I'd have ~60GB free just in case. I see now that there are .SPBITMAP, .SPDATA, and .SPMETADATA files, as well as a .DDDB, and I ended up with a drive-full issue in Windows, which took my target offline. Test environment, so no big deal, but I want to get a handle on what I might have done wrong or assumed incorrectly before I retry.

I see from other posts and from the performance whitepaper that 4K DeDupe works better than 64K DeDupe. This of course makes perfect sense, as it is far more likely to find identical 4K blocks than to hope to find blocks 16x the size without a single byte of difference. It also makes sense that the block size should be the same throughout, to avoid a situation similar to MBR misalignment. So I would want 4KB blocks on the Windows 2008 R2 StarWind host, 1MB-block VMFS 5 datastores on my ESXi 5 boxes, and 4KB blocks inside my VMs, correct?

My biggest concern is ensuring that the Starwind host volume doesn't fill to capacity and take the volume(s) offline.
Well, we are currently working on that - a feature that will let you monitor the free space left on the HDD where the DD files are stored should be available in the next release.
Also, the performance white paper indicates there is a way to measure the DeDupe size and ratio. Is this done by simply looking at the size of one of the 4 files above, or is there a tool within the StarWind console I haven't found that tells me this? It looks like if you delete blocks (i.e., a VM) it does not reclaim the space inside the VMFS LUN (or inside StarWind), which is typical of thin-provisioned disks, and I understand that. But this would skew DeDupe calculations if one is simply looking at the high-water mark of the files on the StarWind host volume.
Yes, you are correct - we don't have block re-use or deletion yet, but we will in the next major release.

In conclusion - thank you for your cooperation, and stay tuned 8)
Best regards,
Anatoly Vilchinsky
Global Engineering and Support Manager
www.starwind.com
av@starwind.com
NetWise
Posts: 7
Joined: Sat Feb 18, 2012 6:00 am

Tue Jul 10, 2012 12:28 am

So it looks like I would want to set up a Scheduled Task or something with a free-space trigger to start sending e-mails, or potentially run a PowerShell script to shut down the VMs and the LUN/target. For the moment, I can live with that. What would be nice is the "stunning" of VMs that some arrays do if they hit the wall. Just in case, though, I've copied 15GB of files into a "FILLER" folder, so if I have an issue, I can always purge that folder and get some emergency space back. If the volume fills, does it corrupt anything, or does it gracefully stop somehow?
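As a stopgap until a built-in monitor exists, the free-space check could be a small script wired into a Scheduled Task; here is a minimal sketch (the threshold and path are made-up placeholders, and the alert/shutdown action is left to your e-mail or PowerShell tooling):

```python
import shutil

THRESHOLD_GB = 20  # hypothetical safety margin - tune to taste

def free_gb(path: str) -> float:
    """Free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 2**30

def needs_action(path: str) -> bool:
    """True when the volume holding the DD files is below the margin."""
    return free_gb(path) < THRESHOLD_GB

# Run periodically from a Scheduled Task, e.g.:
#   if needs_action(r"D:\StarWind"):  # placeholder path
#       ...send mail / trigger the PowerShell shutdown script...
```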

I was able to copy over 12 of my lab VMs, which - granted - are all based on the same template and fairly similar other than the Roles and Features they are running: 225.84GB of VMs (thin provisioned), 73.0GB on disk on the StarWind host.

I really wish I hadn't gone 4x120GB SSD in RAID5, and just gotten a single 512GB SSD. Then I could use Veeam to replicate it to something slow and cheap, just in case the SSD ever died.

So far, the performance and DeDupe ratio have been good. I tried to set up another DeDupe target on the SAS disks in the same server (this is an 8x2.5" PE2950), and with 8GB of RAM it won't do it. I recall a comment elsewhere in the forums about 3.5GB of RAM required for the DeDupe; add in 1024-2048MB for write-back caching, and you can only have one DeDupe volume in 8GB. Off to eBay I go for some additional 2GB sticks :)

I'm still not sure I'm seeing > 1GbE bandwidth, even with multiple NICs on both the target and the ESXi host side. More settings to tweak - I'm sure I've missed something. I suspect Jumbo Frames will help; I'm just not there yet.

I'll keep you posted :)
Anatoly (staff)
Staff
Posts: 1675
Joined: Tue Mar 01, 2011 8:28 am

Fri Jul 13, 2012 8:52 am

So it looks like I would want to set up a Scheduled Task or something with a free-space trigger to start sending e-mails, or potentially run a PowerShell script to shut down the VMs and the LUN/target. For the moment, I can live with that. What would be nice is the "stunning" of VMs that some arrays do if they hit the wall. Just in case, though, I've copied 15GB of files into a "FILLER" folder, so if I have an issue, I can always purge that folder and get some emergency space back. If the volume fills, does it corrupt anything, or does it gracefully stop somehow?
The service won't stop, so I'm afraid the last data written to the DD device on the overfilled HDD/volume will be corrupted, so for now it's better to monitor the free space on the hard drive.
I was able to copy over 12 of my lab VMs, which - granted - are all based on the same template and fairly similar other than the Roles and Features they are running: 225.84GB of VMs (thin provisioned), 73.0GB on disk on the StarWind host.
So the DD ratio is about 3. Not bad, but it could be better, I think. What stripe size, NTFS block size, and DD block size were used, please? And all the VMs were stored on a single DD device, correct?
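For reference, the ratio quoted here is just the logical data written divided by the physical space consumed:

```python
logical_gb = 225.84   # thin-provisioned VM data copied to the LUN
physical_gb = 73.0    # space actually consumed on the StarWind host
ratio = logical_gb / physical_gb
print(f"DD ratio: {ratio:.2f}")  # DD ratio: 3.09
```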
So far, the performance and DeDupe ratio have been good. I tried to set up another DeDupe target on the SAS disks in the same server (this is an 8x2.5" PE2950), and with 8GB of RAM it won't do it. I recall a comment elsewhere in the forums about 3.5GB of RAM required for the DeDupe; add in 1024-2048MB for write-back caching, and you can only have one DeDupe volume in 8GB. Off to eBay I go for some additional 2GB sticks
Yes, absolutely: 3.5 MB of RAM per 1 GB of DD storage with a 4K DD block size. This value should be decreased to 2 MB in the next version.
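A quick back-of-the-envelope sizing using that per-GB figure (the helper and the cache number are illustrative, not an official StarWind formula):

```python
RAM_MB_PER_GB = 3.5  # quoted figure for a 4K DD block size

def dedupe_ram_mb(volume_gb: int, cache_mb: int = 1024) -> float:
    """Rough RAM budget: per-GB dedupe metadata plus write-back cache."""
    return volume_gb * RAM_MB_PER_GB + cache_mb

print(dedupe_ram_mb(350))   # 2249.0 MB for the 350GB test volume above
print(dedupe_ram_mb(1024))  # 4608.0 MB, i.e. ~3.5GB of metadata per TB + cache
```

This matches the earlier forum figure: roughly 3.5GB of RAM per terabyte of deduplicated storage, before caching.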
I'm still not sure I'm seeing > 1GbE bandwidth even if I have multiple NIC's on both the target and the ESXi host side. More settings to tweak, I'm sure I've missed something. I suspect Jumbo Frames will help, I'm just not there yet.
You should also take into account the CPU and RAM utilization - these are major factors too.
I'll keep you posted
Great! Thanks very much!