BackupAssist Single Instance Store confusion

We had an interesting support call the other day that I thought might be useful to share.

The customer who called was using BackupAssist on multiple sites, with rsync jobs backing up a number of Windows machines to a central one running CWRsync. Everything was working well, but there was some confusion as to how much space each job was taking up on the rsync server.

The jobs were all configured in the same way using the ‘basic’ schedule, storing backups in daily folders named Monday, Tuesday, Wednesday and so on. On the rsync server each remote server stored backups in its own dedicated folder, with the daily subfolders within it. What the customer wanted to know was exactly how much space they should expect each individual job to be using.

Normally this would be a simple one to answer: you’d just navigate to the server folder in Windows Explorer, right click and choose Properties to see how much data the folder contained. There is scope for confusion in this scenario, however, as BackupAssist uses Single Instance Store for these jobs. This means most of the files in the backup folders are not actual copies of files but pointers to a ‘single instance’ stored in a hidden folder.
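To make the ‘pointer’ idea concrete, here is a minimal Python sketch. It assumes the pointers are NTFS hard links (as one of the commenters below points out) and that the script runs on a file system that supports them. The file names and the `link_demo` function are made up for illustration and have nothing to do with how BackupAssist names things internally.

```python
import os
import tempfile


def link_demo(size=1024):
    """Create one file plus a hard link to it and report what the OS sees."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical names standing in for the same file in two daily folders.
        monday = os.path.join(root, "Monday.dat")
        tuesday = os.path.join(root, "Tuesday.dat")

        with open(monday, "wb") as f:
            f.write(b"x" * size)

        # A hard link is a second directory entry for the same data on disk.
        os.link(monday, tuesday)

        a = os.stat(monday)
        b = os.stat(tuesday)
        return {
            # Adding up per-file sizes counts the data twice.
            "apparent_total": a.st_size + b.st_size,
            # Both names refer to one underlying file.
            "same_file": os.path.samefile(monday, tuesday),
            # Number of names that point at this file.
            "link_count": a.st_nlink,
        }


if __name__ == "__main__":
    print(link_demo())
```

Both names report the full file size, which is why adding up folder contents overstates the space actually used.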

For example, in our test we simulated a 310MB backup job running for five days. Only 50 bytes of data changed each day, but each of the backup folders appeared in Windows to be around 310MB. Five folders add up to a total disk storage of roughly 1.5GB, or so Windows thinks. In reality, the disk space used on the hard drive is only around 311MB. Most of the files are identical from one day to the next, and so are not stored on the disk more than once.

Unfortunately it can be difficult to prove this, as Windows always reports the space the files represent. Even the ‘size on disk’ value counts the files more than once and reports the total.

On a fresh hard drive with no other data you can see the true disk usage by viewing the drive properties, but of course in many cases the backup sits in amongst other data and its usage is impossible to distinguish.
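For the curious, the difference between the two figures can be approximated with a short script. This is only a sketch under a few assumptions: that the store uses hard links, that Python's `os.lstat` returns a usable file ID (`st_ino`) and volume ID (`st_dev`) on the platform (recent Python 3 releases do this on NTFS), and that logical file sizes are good enough. It does not count disk clusters, so it is not the tool asked for below. The function name `folder_usage` is our own.

```python
import os


def folder_usage(root):
    """Return (apparent_bytes, unique_bytes) for every file under root.

    apparent_bytes adds up every directory entry, the way Explorer does.
    unique_bytes counts each underlying file once, however many hard
    links point at it from inside the tree.
    """
    seen = set()
    apparent = 0
    unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            # Volume ID plus file ID identifies the data behind a name.
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique
```

Pointing it at one destination folder on the rsync server, for example `folder_usage(r"D:\rsync\site1")` (a made-up path), gives both totals for that job. Files that are also linked from outside the folder are still counted once here.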

So then – a bit of a challenge for you!

Do you know of a Windows software tool that can show the actual number of disk clusters a folder uses on a disk, rather than just counting up the usage for all the files pointed to within the folder? If you do, please let me know in the comments below, as it’d be great to be able to quickly and easily highlight the savings that are made by Single Instance Store.

6 thoughts on “BackupAssist Single Instance Store confusion”

  1. Pingback: Some BackupAssist tips from Zen Software | BackupAssist: The Windows Backup Software

  2. BackupAssist’s single instance store uses NTFS hard links. The Sysinternals command line utility “du” (which is similar to the Unix utility of the same name) can correctly show disk usage, taking account of hard links, if you use the “-u” option. You can get it from http://technet.microsoft.com/en-us/sysinternals/bb896651.

    BackupAssist’s media usage report section also correctly shows disk usage. We are looking into the possibility of splitting its functionality out into a stand-alone du-like command line program which may be included with a future release of BackupAssist.

    • Thanks David,
      Yes, I should have thought of that; I use the ‘du’ command on my Linux NAS box to get the report. If you can separate out this function it would make a very useful report. The main issue we see is when multiple BackupAssist rsync jobs from multiple sites share a single destination, and collating all the media usage reports becomes a bit messy. Ideally you just want a report that you can run on the rsync server that shows the correct disk usage per rsync destination folder.

  3. I sometimes use a command line program for this. The program was written in German and is called ctTrueSize. You can find it at: http://www.heise.de/software/download/cttruesize/50272 .

    I don’t speak or read German so I had the command line “help” screen translated for me and was told it has the following translation:

    Ct True Size, Volume 1.1—denotes the true size of the folders. Copyright © 2007, 2008, “C’t Magazine for Computer Technology,” written by Hajo Schulz.
    (-n) without recursion
    (q) do not keep record of access mistakes
    (x) break off when access mistakes occur
    (-s n) make note of “between totals” (?) to the depth of n
    (-j) keep record of junctions (reparse points)
    (-1) keep record of multiple numerated files
    (-1a) keep record of all hard links
    (-c) keep record of compressed files
    (-p) keep record of sparse files
    (-a) keep record of alternate data streams files
    (Path) of the folder(s) to be examined. If none is indicated, the current record will be examined.
    -? -h notification of these references.

    Also, here’s an example of the output from program translated into English:

    Gross:
    50728 Files, 8541 folders
    12,9 GB
    Hardlinks:
    7993 files
    2,20 GB
    Condensed:
    6 files
    Uncondensed: 18,4 MB
    Condensed: 8,78 MB
    Medium compression rates: 52.5%
    Alternate Data Streams:
    2 streams in 1 file and 0 folders
    5,25 KB
    Total:
    42737 files and streams, 8541 folders
    10,7 GB
    Inclusive Cluster-Blend (?): 10,8 GB
