Most efficient ways to download
by cyberorg,
Saturday, April 11th @ 9:37 am
...or downloads on steroids: how to update gigabytes of ISOs without downloading the whole thing again and again.
In India a good internet connection is quite expensive, and downloading a 2GB ISO takes me about 15 hours. Here is what I do to make the most of the download speed available.
Use metalinks:
Get the “metalink” from the download repository, for example the openSUSE-Edu repo.
aria2c http://download.opensuse.org/repositories/Education/images/iso/$IMAGE_NAME.iso.metalink
This will use multiple mirrors to download the ISO. Replace $IMAGE_NAME with the actual image name.
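To avoid retyping the long URL, the metalink address can be built from the image name. This is only a minimal sketch: the REPO_URL variable and the metalink_url helper are hypothetical names introduced for illustration, and the URL layout is simply the one from the command above.

```shell
#!/bin/sh
# Base URL of the openSUSE-Edu image directory, taken from the command above.
REPO_URL="http://download.opensuse.org/repositories/Education/images/iso"

# Hypothetical helper: print the metalink URL for an image.
# $1 is the image name without the .iso suffix.
metalink_url() {
    printf '%s/%s.iso.metalink\n' "$REPO_URL" "$1"
}

# Usage (needs network access and aria2c installed):
#   aria2c --continue=true "$(metalink_url "$IMAGE_NAME")"
# --continue=true asks aria2c to resume a partially downloaded file
# instead of starting over.
```

Resuming matters on a slow line: if a 15-hour download is interrupted, rerunning the same command in the same directory should pick up where it stopped.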
If you click on “mirrors” in any openSUSE repository, you will see a nice little tip:
Hint: For larger downloads, a Metalink client is best — easier, more reliable, self healing downloads.
aria2c is a CLI metalink client; install it from the network:utilities repo. You can also configure Firefox to use aria2c to download files.
Use Rsync:
In the open source world, things often get improved or updated before the download of a huge file has even finished here. Here is how an already downloaded ISO image can be updated without downloading the new one from scratch.
Use your favorite mirror that provides rsync; see the list of mirrors providing rsync access.
Check the availability of the image you want on the mirror by running:
rsync rsync://mirror.leaseweb.com/opensuse/repositories/Education/images/iso/
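The listing can be long, so it may help to filter it down to the ISO names. The sketch below assumes that rsync prints one entry per line with the file name as the last field, and that names contain no spaces; list_isos is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read an rsync directory listing on stdin and
# print only the entries whose name ends in .iso.
# Assumes the file name is the last whitespace-separated field.
list_isos() {
    awk '$NF ~ /\.iso$/ { print $NF }'
}

# Usage (needs network access):
#   rsync rsync://mirror.leaseweb.com/opensuse/repositories/Education/images/iso/ | list_isos
```

The printed name is the one to copy exactly in the next step.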
Copy the old image to a file with exactly the same name as the new image on the mirror:
cp oldimage.iso exact-name-new-image-is.iso
Run rsync again to patch it:
rsync -avP rsync://mirror.leaseweb.com/opensuse/repositories/Education/images/iso/exact-name-new-image-is.iso .
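The dot at the end, with a space before it, is part of the command: it tells rsync to write into the current directory.
Because a file with the same name already exists locally, rsync transfers only the parts that have changed, which in some cases is just a few MB, saving a few GB of download.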
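The copy and patch steps can be wrapped in a small script. This is a sketch under assumptions, not a tested tool: update_iso, MIRROR and DRY_RUN are hypothetical names, and the mirror URL is the one used in the commands above. With DRY_RUN set, the commands are only printed, which makes it easy to check them before starting a long transfer.

```shell
#!/bin/sh
# rsync URL of the image directory on the chosen mirror (from the commands above).
MIRROR="rsync://mirror.leaseweb.com/opensuse/repositories/Education/images/iso"

# Hypothetical wrapper around the cp + rsync steps.
# $1: old ISO already on disk
# $2: exact file name of the new ISO on the mirror
update_iso() {
    old_image=$1
    new_image=$2
    # Seed the new name with the old data, unless a (partial) copy
    # is already there from an earlier run.
    if [ ! -e "$new_image" ]; then
        ${DRY_RUN:+echo} cp "$old_image" "$new_image" || return 1
    fi
    # -a archive mode, -v verbose, -P keep partial files and show progress.
    ${DRY_RUN:+echo} rsync -avP "$MIRROR/$new_image" .
}

# Usage:
#   DRY_RUN=1 update_iso oldimage.iso exact-name-new-image-is.iso   # print only
#   update_iso oldimage.iso exact-name-new-image-is.iso             # really run
```

Skipping the copy when the target already exists means an interrupted run can be repeated without throwing away what rsync has already fetched.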
: roberto mannai, April 11 @ 2:04 pm
Hi cyberorg,
to the best of my knowledge, the best way to incrementally download only the diff of a binary file is the GDIFF format, which was submitted to the W3C ten years ago: http://www.w3.org/TR/NOTE-gdiff-19970901
I know for sure that a commercial configuration management product (Marimba, now bought by BMC, see http://www.marimba.com) uses it, implemented in Java. It is very useful on low-bandwidth networks, for example when downloading a service pack. I don’t know whether anyone could reuse that Java implementation, though, since it is a commercial application.
Other implementations are available in Perl and Ruby:
http://search.cpan.org/~geoffr/Algorithm-GDiffDelta-0.01/GDiffDelta.pm
http://webscripts.softpedia.com/script/Development-Scripts-js/gdiff-gpatch-18695.html
I cannot understand why that algorithm is not widely used, given its quality; it would be useful to have it available when downloading large files like ISOs or VM images! Maybe you could add a request on openFATE!
: robermann79, April 12 @ 9:03 am
An open source .NET (C#) implementation:
http://gdiff.codeplex.com/
released under the MPL license