Re: [Hampshire] Open source network backup with de-dupe.

Author: James Courtier-Dutton
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] Open source network backup with de-dupe.
On 15 July 2010 15:14, Keith Edmunds <kae@???> wrote:
> Hi James
>
> You're being unrealistic.
>
>> The documentation gives no explanation of what WAN bandwidth it will use.
>
> How can it? It depends on how much data you back up; more accurately, it
> depends on how much data has changed since the last backup.
>
>> It reads as if it gets all the data into a central location, and then
>> de-dupes it.
>
> It does.
>
>> This is not good for WAN bandwidth at all. If the same file is on two
>> computers, I only want one computer to send the file once.
>
> Explain how the server can ascertain that the data is the same on both
> clients without getting a full copy of the data. Note: not ascertain that
> it may be the same, but that it IS the same.
>


Take one PC at a central site, called "A".
Take two PCs at remote sites, called "B" and "C".

B has already sent a full backup to A.
C wishes to send a full backup to A, but much of the data on C is the
same as on B.
C generates hashes of its files, and sends only the hashes to A.
A responds to C, saying which hashes it does not already have from B.
C then sends only a subset of the data, i.e. the data that was not
already sent from B.

Thus, a lot of WAN bandwidth is saved.
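
As a rough sketch of that exchange (Python, in-process; Server here
just stands in for A's end of the WAN link, and all the names are made
up rather than taken from any particular backup tool):

import hashlib

def file_hash(path):
    # Hash file contents in chunks so large files need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

class Server:                       # runs on A
    def __init__(self):
        self.store = {}             # hash -> contents; already holds B's backup

    def missing(self, hashes):
        # Step 2: tell the client which hashes A has never seen.
        return [h for h in hashes if h not in self.store]

    def receive(self, h, data):
        self.store[h] = data

def client_backup(paths, server):   # runs on C
    hashes = {file_hash(p): p for p in paths}
    # Step 1: send only the hashes (a few dozen bytes per file).
    wanted = server.missing(list(hashes))
    # Step 3: send file contents only for the hashes A lacks.
    for h in wanted:
        with open(hashes[h], "rb") as f:
            server.receive(h, f.read())

Only steps 1 and 3 would cross the WAN, so a file that already exists
on B costs C nothing but the few bytes of its hash.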

There is also the possibility of doing this on a per-site basis. So,
one machine at the site de-dupes all the data for that site, and then
sends just the de-duped data over the WAN link.
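
A minimal sketch of the per-site step, reusing file_hash from above
(the shape of files_by_machine is just an assumption for illustration):

def dedupe_site(files_by_machine):
    # files_by_machine: {machine_name: [paths]} as gathered by the site box.
    unique = {}
    for machine, paths in files_by_machine.items():
        for p in paths:
            # Keep only the first path seen for each hash.
            unique.setdefault(file_hash(p), p)
    # Only one copy per unique hash then crosses the WAN link.
    return unique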

Another way of viewing this: when C sends files to A, A compares the
hashes sent by C against its entire file store, not just against the
single file that C is sending.
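
That is what the store dictionary in the sketch above already does:
because it is keyed by hash across everything A holds, the "have I
seen this?" check is a single lookup against the whole store, rather
than a comparison of one file against its single counterpart on the
server.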

Kind Regards

James