Re: [Hampshire] Open source network backup with de-dupe.

Author: Chris Dennis
Date:  
To: Hampshire LUG Discussion List
CC: James Courtier-Dutton
Subject: Re: [Hampshire] Open source network backup with de-dupe.
On 15/07/10 15:39, James Courtier-Dutton wrote:
>
> Take one central-site PC called "A".
> Take two remote-site PCs called "B" and "C".
>
> B has already sent a full backup to A.
> C wishes to send a full backup to A, but much of the data on C is the same as on B.
> C generates hashes of its files, and only sends the hashes to A.
> A responds to C, saying which hashes it has not already got from B.
> C then only sends a subset of the data, i.e. data that was not already
> sent from B.
>
> Thus, a lot of WAN bandwidth is saved.
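
In rough Python, that exchange might look something like this -- just a
sketch, where the function names and the transport are made up rather
than any real tool's API:

import hashlib
import os

def file_hash(path):
    # SHA-256 of the file contents, read in chunks to cope with big files
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

# On C: build a hash -> path table for everything to be backed up
def hashes_for_site(root):
    table = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            table[file_hash(path)] = path
    return table

# On A: given the list of hashes C offers, say which ones are new
def missing_hashes(offered, pool):
    return [h for h in offered if h not in pool]

# C sends only list(table) over the WAN; A replies with the result of
# missing_hashes(); C then uploads just those files' contents.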


The problem is that hash collisions can occur. Two files with the same
hash are /probably/ the same file, but probably isn't good enough -- a
backup system has to be 100% sure. And the only way to be certain is to
get both files and compare them byte by byte.

BackupPC uses hashes for file names, but also checks for hash collisions
and deals with them when they happen.

>
> There is also the possibility of doing this on a per-site basis. So, one
> machine at the site de-dupes all the data for that site, and then just
> sends the de-duped data over the WAN link.


That could work, but it needs more software at the client end. Does
rsync do anything like that?

cheers

Chris
-- 
Chris Dennis                                  cgdennis@???
Fordingbridge, Hampshire, UK