Evaluate Deduplication - Free Version


Asanam
Posts: 9
Joined: Wed May 25, 2011 1:48 am

Wed May 25, 2011 1:58 am

We are testing the effectiveness of deduplication in the following manner.

Having set up a target image on Windows 2003, we have initiated the connection to this disk from a Windows 7 client.

The disk size is 5GB. A folder with 'x' files is copied 'y' times; the copies are then put into a folder that is itself copied 'z' times. In all, the disk properties show 2.54 GB for 2208 files.

When we try the same with a non-deduplicated disk, the sizes are the same.

So, how does one evaluate the effectiveness of deduplication?

Regards,

Asanam
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed May 25, 2011 8:47 am

Deduplication saves space on your HOST, not on the file system layered above it (how do you think it should work in general?). So for you it is still the full sum of the sizes you copy, while on the HOST less space is used and allocated to keep your real data.

How large is the .data or .spdata file StarWind maintains for your deduplication volume? The deduplication ratio should be calculated as (size of your files) / (size of .data or .spdata). In V5.7 we'll report both deduplication and compression ratios in the GUI.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
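
The ratio described above can be worked out by hand until the GUI reports it. What follows is only a minimal sketch under stated assumptions: `tree_size` and `dedup_ratio` are hypothetical helper names, the mapped volume is assumed to be readable as an ordinary directory tree on the client, and the size of the .spdata file is assumed to be read separately on the host (for example with `os.path.getsize`).

```python
import os


def tree_size(root):
    """Sum the logical sizes, in bytes, of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def dedup_ratio(client_bytes, host_bytes):
    """Client-visible bytes divided by the size of the host-side container.

    A value above 1 means the host stores less than the client sees;
    a value below 1 means the container is larger than the data in it.
    """
    if host_bytes <= 0:
        raise ValueError("host container size must be positive")
    return client_bytes / host_bytes
```

The two sizes would normally be collected on different machines (client and host), so the function takes plain byte counts rather than paths.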

Asanam
Posts: 9
Joined: Wed May 25, 2011 1:48 am

Wed May 25, 2011 3:43 pm

Thanks for the prompt response. I look forward to 5.7, but in the meantime I've done another test, this time with NTbackup files.

File1 is a backup of a folder, File2 is another backup of the same content as File1 with the addition of an outlook.pst. These are both then copied onto the drive.

This is what I see:
                   Client bytes      Host spdata       Ratio (client / host)
File1              10,718,438,400    15,971,287,040    0.671
File2              12,032,265,216
File1 and 2        22,750,703,616    33,864,122,368    0.672
File1 and 2 x 2    45,501,407,232    66,962,128,896    0.680

Still looking for that elusive revelation of data deduplication. :roll:

Regards

Asanam
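
As a cross-check of the figures in this post, the arithmetic can be redone in a few lines. This is just a sketch: the byte counts are the ones quoted in the table, while `summarize` is a hypothetical helper that returns the client/host ratio and the number of bytes by which the host file exceeds the client data.

```python
# (label, client bytes, host .spdata bytes) as quoted in the post
measurements = [
    ("File1", 10_718_438_400, 15_971_287_040),
    ("File1 and 2", 22_750_703_616, 33_864_122_368),
    ("File1 and 2 x 2", 45_501_407_232, 66_962_128_896),
]


def summarize(rows):
    """Return (label, client/host ratio, host bytes beyond client bytes)."""
    summary = []
    for label, client, host in rows:
        ratio = round(client / host, 3)
        excess = host - client
        summary.append((label, ratio, excess))
    return summary


for label, ratio, excess in summarize(measurements):
    print(f"{label:<18} ratio={ratio:.3f} host excess={excess:,} bytes")
```

Every ratio comes out below 1, meaning the host file is larger than the data written from the client, which is the opposite of what deduplication should produce.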
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed May 25, 2011 3:58 pm

StarWind cannot create deduplicated content larger than the non-deduplicated source. Can you provide a screenshot of the directory listing of your mapped target and of the host holding the spdata file?
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Asanam
Posts: 9
Joined: Wed May 25, 2011 1:48 am

Wed May 25, 2011 4:24 pm

I certainly can supply these:

Attached:

1. Directory listing of mapped target and
2. Host holding the spdata

Regards

Asanam
Attachments
Directory listing of mapped target: Q_Dir.JPG (104.37 KiB)
Host holding the spdata: iSCSi.JPG (82.57 KiB)
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed May 25, 2011 4:40 pm

Did you re-create the target for your tests? For now all deletes are DISABLED, so the deduplication "knowledge base" is never emptied. We'll change this (provide an option) before release, but for now you need a freshly created volume (never written to) to verify what is happening with your data. FYI.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

Asanam
Posts: 9
Joined: Wed May 25, 2011 1:48 am

Wed May 25, 2011 5:06 pm

This is a fresh target on another disk altogether. It's an external USB drive, wiped, reinitialized etc., with a new target altogether. Oh, and it's on another server as well, SBS2003R2. Everything seems smooth and fine but the numbers certainly do not stack up.

Can you recommend a way of evaluating the dedup feature which would help in planning the sizing?

Regards

Asanam
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Wed May 25, 2011 5:10 pm

I don't understand how you managed to get such a result. Can you delete the whole volume set, re-create the target and provide details after each step? Something like:

1) Step one. Zero-sized target. Size of .spdata is ...

2) Step two. First directory is copied. Size of .spdata is ...

3) Step three. Second directory is copied. Size of .spdata is ...

We're adding detailed stats now, but it would not change anything for you: the GUI would only show what you now work out by hand.
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
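
The step-by-step procedure requested above can be kept as a small log. The sketch below is an assumption about how one might record it, not a StarWind tool: `Step` and `deltas` are hypothetical names, and the sizes are entered by hand after each step.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    client_bytes: int   # total bytes on the mapped volume after this step
    spdata_bytes: int   # size of the host-side .spdata file after this step


def deltas(steps):
    """For each step after the first, report how much the client data and
    the .spdata file grew, plus host growth per client byte added."""
    rows = []
    for prev, cur in zip(steps, steps[1:]):
        added = cur.client_bytes - prev.client_bytes
        grown = cur.spdata_bytes - prev.spdata_bytes
        per_byte = grown / added if added > 0 else None
        rows.append((cur.description, added, grown, per_byte))
    return rows
```

If deduplication is working, copying a directory that is already on the volume should give a host growth per client byte well below 1; a value near or above 1 means the copy was stored again in full.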

ypae
Posts: 1
Joined: Sat Apr 21, 2012 11:31 pm

Sat Apr 21, 2012 11:39 pm

anton (staff) wrote: Did you re-create the target for your tests? For now all deletes are DISABLED, so the deduplication "knowledge base" is never emptied. We'll change this (provide an option) before release, but for now you need a freshly created volume (never written to) to verify what is happening with your data. FYI.
Hello,

I have 2 questions:

1. If I have a bunch of .VHDs with many duplicate files inside on StarWind iSCSI, would "Block Level" deduplication still work effectively?

2. When will the "empty orphaned deduplication knowledge base" option be available?

Thanks,

Young-
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Sun Apr 22, 2012 12:05 am

1) Yes, of course. That was the plan.
2) V5.9 (you may apply for beta right now)
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software

dataanywhere
Posts: 1
Joined: Tue May 08, 2012 9:31 pm

Tue May 08, 2012 9:51 pm

Did anyone figure this issue out?

We're seeing similar results while evaluating using the free version.

A folder copied raw is just over 14GB in size when viewed from the client. After making a second copy of that folder, the .spdata file on the host is double the size.

The configuration is "new", as we just set it all up for the first time. The host we're using for testing is a Windows 7 Pro virtual machine within VMware Workstation 8. The disk is a thin-provisioned 2TB vmdk. Within the VM host, we quick-formatted the drive as NTFS, then set up the deduplication virtual disk device and selected 2 TB as its size. See the attached screenshot for the complete deduplication settings.

The client that connects to the iSCSI target is a Windows SBS 2008 server. The iSCSI connection worked flawlessly; we formatted the disk (NTFS quick format), then copied the c:\windows folder into a subdirectory on that disk. That resulted in a .spdata file of just over 15GB. Copying that same folder onto the same disk again (thus creating a "Copy of Windows" folder) results in the .spdata file growing to double the size.

We're really interested in this product.

Thanks for any input,
Geoff
Attachments
Screenshot showing deduplicated device config: screenshot.jpg (149.39 KiB)
anton (staff)
Site Admin
Posts: 4021
Joined: Fri Jun 18, 2004 12:03 am
Location: British Virgin Islands

Tue May 08, 2012 10:40 pm

I have an impression it's because of the 64KB deduplication block size. Can you create a smaller storage and use a 4KB or 8KB block size? Also, support people will take a closer look tomorrow...
Regards,
Anton Kolomyeytsev

Chief Technology Officer & Chief Architect, StarWind Software
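
One plausible reason a large block size could hurt here is alignment: with fixed-size blocks, a copy only deduplicates if it lands at the same position relative to the block boundaries as the original. The toy model below illustrates that general effect only. It is not StarWind's implementation; the SHA-256 hashing, the zero-filled "disk" and all function names are assumptions made for the illustration.

```python
import hashlib

KIB = 1024


def make_payload(size):
    """Deterministic, non-repeating test data built from SHA-256 digests."""
    chunks = (hashlib.sha256(i.to_bytes(4, "big")).digest()
              for i in range(size // 32))
    return b"".join(chunks)


def build_image(payload, copy_offset, total_size):
    """Zero-filled 'disk' with the payload at offset 0 and a copy at copy_offset."""
    image = bytearray(total_size)
    image[0:len(payload)] = payload
    image[copy_offset:copy_offset + len(payload)] = payload
    return bytes(image)


def stored_bytes(image, block_size):
    """Bytes a fixed-block deduplicator would keep: one block per distinct hash."""
    seen = {hashlib.sha256(image[off:off + block_size]).digest()
            for off in range(0, len(image), block_size)}
    return len(seen) * block_size


payload = make_payload(256 * KIB)
# Copy placed on a 4 KiB boundary that is not a 64 KiB boundary.
misaligned = build_image(payload, 260 * KIB, 576 * KIB)
# Copy placed on a 64 KiB boundary.
aligned = build_image(payload, 320 * KIB, 576 * KIB)

for name, image in (("misaligned", misaligned), ("aligned", aligned)):
    for block in (64 * KIB, 4 * KIB):
        kept = stored_bytes(image, block) // KIB
        print(f"{name:<11} block={block // KIB:>2} KiB -> {kept} KiB stored")
```

In this toy layout the misaligned copy saves nothing at a 64 KiB block size, while the same image is reduced to roughly a single copy at 4 KiB. A file system with 4 KiB clusters is free to place a copy at any 4 KiB boundary, which is why either a smaller deduplication block or a matching cluster size might help.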

Bohdan (staff)
Staff
Posts: 435
Joined: Wed May 23, 2007 12:58 pm

Thu May 10, 2012 7:43 am

What is the cluster size (allocation unit size) of the NTFS "E" volume where the deduplicated virtual disk files are stored?
For a dd block size of 64K it should also be 64K.
caustix
Posts: 2
Joined: Mon Jun 04, 2012 8:34 pm

Mon Jun 04, 2012 9:19 pm

I am experiencing this issue too, with the SPdata file being way too big, even on the initial copy of files to it (nothing is being moved around). Could it be because of a cluster size mismatch between VM guest / iSCSI FS / spdata?

iSCSI native disk partition formatted (where spdata resides):
Bytes Per Cluster : 65536

VMware Guest:
Bytes Per Cluster : 4096

I am using version 5.8.2013.
Host DD disk created with a 64K block size.
Server 2008 R2 on both the guest VM and the StarWind server.

I have noticed that the SPdata file is always larger than the size of the guest disk, almost double in size since it was started. At first I thought this was just something weird with Server 2008 R2 but I cannot figure it out.

Guest usage: 3321GB
SPdata size: 4700GB
The SPdata test disk is formatted to 20TB, but I fear the host iSCSI SPdata file will grow way too large if this persists.

There is no Pagefile on that guest's drive. I don't know why it is growing so big.
caustix
Posts: 2
Joined: Mon Jun 04, 2012 8:34 pm

Mon Jun 04, 2012 11:05 pm

I think matching cluster sizes solved the issue.
I created another DD disk (64K block size; made a large VMware 5 datastore spanning a few 2TB VMDK files).
I formatted that guest partition using 64K and I am seeing some good results so far.
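
For anyone repeating this check, the cluster sizes can be compared against the deduplication block size programmatically. The sketch below only parses text containing a "Bytes Per Cluster" line in the format posted earlier in this thread (which looks like `fsutil fsinfo ntfsinfo` output, though that is an assumption); `bytes_per_cluster` and `cluster_matches` are hypothetical helper names.

```python
import re


def bytes_per_cluster(report):
    """Extract the cluster size from text with a 'Bytes Per Cluster : N' line."""
    match = re.search(r"Bytes Per Cluster\s*:\s*([\d,]+)", report)
    if match is None:
        raise ValueError("no 'Bytes Per Cluster' line found")
    return int(match.group(1).replace(",", ""))


def cluster_matches(report, dedup_block_bytes):
    """True when the volume's cluster size equals the dedup block size."""
    return bytes_per_cluster(report) == dedup_block_bytes


host_report = "Bytes Per Cluster : 65536"
guest_report = "Bytes Per Cluster : 4096"
print(cluster_matches(host_report, 64 * 1024))   # host volume vs 64K block
print(cluster_matches(guest_report, 64 * 1024))  # guest volume vs 64K block
```

With the values reported above, the host volume matches the 64K block size and the original guest volume does not, which is consistent with the mismatch being the cause.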