Fast sync of billions of files. Rsync: Powerful utility for fast, flexible remote and local file copying

18.08.2019 news

The need to transfer files between servers and computers comes up often, especially when administering several machines. ssh and scp are usually convenient for this, but if a file is very large and only a small part of it has changed, or if you want to set up continuous automatic synchronization, scp is not an ideal option. There are dedicated utilities for these cases, and this article looks at one of them: rsync synchronization on Linux.

Rsync is open source software for synchronizing files and folders between a local computer and a remote one, in either direction. A notable feature of rsync is that it can send data over an encrypted SSH connection. In addition, all files are sent over a single connection as one stream, whereas some similar programs open a separate connection for each file. This improves speed and removes the per-file delays that become a problem when transferring a large number of small files.

Rsync can synchronize files and directories, optionally with compression and encryption. The program was first released in June 1996 and was developed by Andrew Tridgell and Paul Mackerras. Synchronization uses the rsync protocol, which is designed not merely to transfer files between two computers but to synchronize them: rather than sending a complete file, it sends only the parts that have changed.

This article covers rsync synchronization examples, rsync configuration, and the main features and options.

Features of Rsync

Let's take a look at the notable features of Rsync first:

Ability to keep entire directory trees in sync
Can preserve symbolic links, hard links, file owners and permissions, metadata, and modification times
Doesn't require special privileges
Single stream file transfer
Support for RSH, SSH as transport
Anonymous Rsync support

Rsync syntax

We will not dwell on the installation of this utility on the system in detail. It is very popular, so you can install it using your package manager from the official repositories. On Ubuntu, the installation command will look like this:

$ sudo apt-get install rsync

Now, as is traditional in articles like this one, consider the syntax of the rsync command:

$ rsync options source destination

Either the source or the destination may be remote: each is a local directory, a directory on a host reached over ssh, or a module on an rsync daemon. The options adjust how rsync behaves.

Rsync options

Now let's take a quick look at the rsync options. Not all options are listed here. For more details see man rsync:

-v - display detailed information about the copying process
-q - display minimal information
-c - compare files by checksum instead of size and modification time
-a - archive mode
-R - use relative paths
-b - create backups of files that would be overwritten
-u - do not overwrite files that are newer at the destination
-l - copy symbolic links as symbolic links
-L - copy the files that links point to instead of the links
-H - preserve hard links
-p - preserve file permissions
-g - preserve the group
-t - preserve modification times
-x - stay on one file system
-e - use a different remote shell as transport
-z - compress data during the transfer
--delete - delete files at the destination that are not in the source
--exclude - exclude files by pattern
--recursive - recurse into directories
--no-recursive - disable recursion
--progress - display the progress of the file transfer
--stats - show transfer statistics
--version - show the utility version

Rsync Server Configuration

Naturally, you cannot drop files onto an arbitrary machine that has no suitable software installed. The remote machine must either run an rsync daemon or accept SSH logins and have rsync installed, so that rsync can log in and transfer files there.

Let's look at a minimal rsync server configuration that makes copying possible. It lets us both send files to the machine and fetch files from it.

First, create a config file with the following content:

$ sudo nano /etc/rsyncd.conf

path = /tmp/share/
hosts allow = 192.168.1.*
hosts deny = *
list = true
uid = root
gid = root
read only = false

Here we set the path to the folder to be synchronized, allow access only from the home network (192.168.1.*), and deny all other connections. The uid and gid parameters specify the user and group that the daemon uses for transfers. It is better not to use root, but to specify the user nobody and give that user rights to the folder being synchronized.
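The advice about avoiding root can be sketched as follows. This is a minimal illustration rather than a tested production setup. The module name share and the group nogroup are assumptions: rsyncd.conf places per-directory parameters under a module header in square brackets, and the unprivileged group is called nobody on some systems. The file is written to a scratch location instead of /etc/rsyncd.conf.

```shell
# Scratch file for illustration; a real daemon reads /etc/rsyncd.conf by default.
conf="$(mktemp)"

cat > "$conf" <<'EOF'
# "share" is an assumed module name; clients would address it as host::share
[share]
path = /tmp/share/
hosts allow = 192.168.1.0/24
hosts deny = *
list = true
# run transfers as an unprivileged user instead of root
uid = nobody
gid = nogroup
read only = false
EOF

cat "$conf"
```

The user nobody needs write access to /tmp/share/ for uploads to succeed.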

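Then start the daemon and enable it at boot (on some distributions the unit is called rsync rather than rsyncd):

$ sudo systemctl start rsyncd

$ sudo systemctl enable rsyncd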

Rsync Synchronization Examples

Copy and sync files locally

Rsync can sync files and folders within the same machine. First, let's use rsync to copy a single file locally:

$ rsync -zvh file /tmp/backups/

Synchronizing folders on the local machine

Synchronizing folders with rsync is as easy as syncing files:

$ rsync -avzh /home/user/documents /tmp/backups/

Synchronization with a remote server

Synchronizing files with a remote server is not much harder. Let's copy the local documents folder to the remote server:

$ rsync -avz documents/ username@machine_address:/home/

You can also pull files from the remote server to the local machine:

$ rsync -avz username@machine_address:/home/ documents/

The address of the remote server is written in the following format:

username@machine_address:/folder/on/remote_machine

The port is not part of this address. By default rsync connects on the standard port; to use another one, pass it to the transport, for example -e 'ssh -p 2222' for SSH or --port for an rsync daemon.

Synchronizing files over SSH

The -e option sets the remote shell used for the connection. With SSH, all transmitted data is encrypted and sent over a secure channel, which protects it from interception.

To use SSH, you need the user's password on the remote system or an authorized SSH key.

Pulling a file from a remote server over ssh looks like this:

$ rsync -avzhe ssh username@machine_address:/root/install.log /tmp/

Now let's send data to the same server:

$ rsync -avzhe ssh backup.tar username@machine_address:/backups/

View sync progress

To watch the progress of a copy from one machine to another, use the --progress option:

$ rsync -avzhe ssh --progress /home/user/documents username@machine_address:/root/documents

Synchronizing only some files

The --include and --exclude options let you specify which files to sync and which to skip. They work with directories as well as files.

For example, let's copy only the files whose names start with the letter R:

$ rsync -avze ssh --include 'R*' --exclude '*' username@machine_address:/root/documents/ /root/documents
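The same include/exclude pair can be tried without a server. The sketch below is a local illustration only: the temporary directories and file names are made up and stand in for the remote and local documents folders.

```shell
# Throwaway directories stand in for the remote and local document folders.
src="$(mktemp -d)"
dst="$(mktemp -d)"
touch "$src/Report.txt" "$src/README" "$src/notes.txt"

# Rules are checked in order: keep names starting with R, drop everything else.
rsync -a --include 'R*' --exclude '*' "$src/" "$dst/"

ls "$dst"
```

Only Report.txt and README should appear in the listing, because notes.txt matches the catch-all exclude.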

Delete on sync

During synchronization you can delete files at the destination that do not exist on the machine being synchronized from; use the --delete option for this.

For example:

$ rsync -avz --delete username@machine_address:/documents/ /tmp/documents/

If, before running this command, you create a file in the local folder that does not exist on the remote server, that file will be deleted.

Maximum file size

You can specify the maximum size of files to sync with the --max-size option. For example, to sync only files smaller than 200 kilobytes:

$ rsync -avzhe ssh --max-size='200k' /user/documents/ username@machine_address:/root/documents
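To see the size limit in action without a remote host, a local run is enough. This is a small sketch under assumed names: the directories are temporary and the two files are filled with zero bytes purely to give them known sizes.

```shell
src="$(mktemp -d)"
dst="$(mktemp -d)"

# One file below the limit and one above it.
head -c 1024 /dev/zero > "$src/small.bin"      # 1 KiB
head -c 307200 /dev/zero > "$src/large.bin"    # 300 KiB

# Files larger than 200 KiB are skipped.
rsync -a --max-size=200k "$src/" "$dst/"

ls "$dst"
```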

Removing original files

It is possible to delete the source files after the synchronization is complete:

$ rsync --remove-source-files -zvh backup.tar /tmp/backups/

Thus, the backup.tar file will be deleted once it has been copied to the /tmp/backups folder.

The wonderful and very popular rsync program has many handy options and extraordinary abilities, but they are not easy to find. Well, unless you're ready to read the entire manual from cover to cover.

I will try to save you some of that reading time by covering the most important and useful things, in plain words.

Why is rsync needed?

Why use rsync when you already have cp and scp, you may ask.

First, rsync is more convenient than these tools because it lets you see everything it would do before any file is actually copied.

Second, by default rsync copies only new and changed files, and for changed files it can send just the parts that differ, which makes it ruthlessly efficient.

These two reasons alone are enough to replace cp and similar commands with rsync in your daily work.

Application principle

For a simple copy, always start with a test run (the -n switch) in verbose mode (-v):

rsync -avn source example.com:destination

In this mode rsync prints the list of files it would copy. Only new and changed files are listed. You can also check whether the directory itself or only its contents will be copied.

Once you are sure that exactly what you want will be copied, start the real copy:

rsync -av source example.com:destination

In this command, the -a switch requests a recursive copy of all files and directories together with attributes such as permissions and modification times. The -v switch prints a detailed report as the copy runs and when it finishes.

Directory copying rules

On the one hand, the rules are very simple.

If the path to the named source has no slash at the end, the directory itself is copied.

$ rsync -avn path/to/source example.com:destination
sending incremental file list
source/
source/example.html
...

If there is a trailing slash, or the source is given without a directory name (as in .), the contents of the directory are copied.

$ rsync -avn path/to/source/ example.com:destination
sending incremental file list
example.html
...

# Equivalent to this command:
$ cd path/to/source; rsync -avn . example.com:destination

On the other hand, in the heat of the moment it is easy to forget which is which and copy the contents of a directory instead of the directory itself, and then spend a lot of effort deleting unwanted files that appeared out of nowhere with dates in the past (the -a switch, remember?).

Therefore, it is best to always follow the usual routine and check the operation with a test run first.
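The difference between the two forms shows up clearly in a local experiment. The directory names below are illustrative only.

```shell
work="$(mktemp -d)"
mkdir -p "$work/source" "$work/with_dir" "$work/contents_only"
echo "hello" > "$work/source/example.html"

# No trailing slash: the directory itself is recreated under the destination.
rsync -a "$work/source" "$work/with_dir/"

# Trailing slash: only the directory's contents are copied.
rsync -a "$work/source/" "$work/contents_only/"

find "$work/with_dir" "$work/contents_only" -type f
```

The first copy ends up at with_dir/source/example.html, the second at contents_only/example.html.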

Some useful keys

First, let's talk about the options that are worth knowing by heart, without cheat sheets.

The -P switch combines several options (--partial and --progress). With it, rsync shows the progress of each file and keeps partially transferred files, so an interrupted copy can be resumed. This is especially useful for large files. Specify -P on every run; otherwise rsync deletes files that were not fully transferred.

If you are copying files from a very busy or weak server, you can avoid spending processor time on calculating the changed parts of files by copying them whole. This requires the -W switch.

If you want to know how much work rsync thinks is left, you need the --info=progress2 switch. If you are copying an entire file system, this switch on its own will disappoint you: the total will keep changing. This is because rsync does not scan the entire file system before it starts copying, but does both tasks at once.

But don't despair! If you want to know exactly how much work is left from the very beginning, you can disable incremental scanning with the --no-inc-recursive switch or, in short, --no-i-r.

$ rsync -ah --partial --info=progress2 --no-i-r source example.com:destination
        623.38M   0%   82.23MB/s    0:11:10

The switches above are available since version 3.1.0, so they already work in Debian stable.

If you need not just to copy files but to fully synchronize the contents of directories by deleting unnecessary files, and for some reason you cannot synchronize them with Git, then the --delete switch (or the related --del, short for --delete-during) will come in handy.

With this switch rsync removes unnecessary files from the destination directory.

$ rsync -avn --delete source example.com:destination
sending incremental file list
deleting source/bad.txt
source/
source/test.txt

The -n switch was left in the command above on purpose, so that the deletions can be reviewed before anything is removed.

Let's say a word about compression

Contrary to a popular misconception, compression inside rsync (the -z switch) can do more harm than good. OpenSSH is able to compress the data it transmits when compression is enabled, for example with the -C option or the Compression setting in ssh_config. Compressing already compressed data only burns processor time without reducing the amount of data transferred.

You can check whether compression is already in use when connecting to your server:

$ ssh -v user@example.com false 2>&1 | grep compression
debug1: Enabling compression at level 6.

If the output does not include a line about enabling compression like the one above, then rsync's own compression may be worth using. It is still worth checking that compression actually helps. This is especially true for low-power devices on fast connections: your NAS may copy data over a gigabit link faster without compression than its low-power processor can compress it.

Fortunately, rsync is smart enough not to use compression if you copy files locally, from directory to disk, etc.

Copying partially

Sooner or later you will need rsync to skip some files when copying.

In the simplest case, you want rsync not to copy the files of version control systems, including directories like .svn and .git. For this you need nothing more than the -C switch (or --cvs-exclude in full form). It ignores the files of most popular VCSs as if they were not there. Remember to use -n the first time you run it.

rsync -nC example.com:source destination

It can happen that you have already copied a bunch of such VCS files by mistake. In that case, to get a clean copy, use the --delete-excluded switch, which deletes all excluded files from the destination.

rsync -nC --delete-excluded example.com:source destination

We exclude via .rsync-filter

If you need more flexible rules, which matters especially when copying is done regularly, it is better not to waste time on one-off switches and to put all exceptions in a .rsync-filter file.

$ cat source/.rsync-filter
- test.bin
- *.tmp
- /.cache
- /example/
- /**/Trash/
- /.mozilla/firefox/*/Cache/
+ Projects/**/Trash/

To exclude something from the transfer list, add a line with a rule (- or + at the beginning of the line) to this file.

If you need to exclude a specific file wherever it appears in the directory hierarchy, simply give the file name.

# no test.bin file will be copied
- test.bin
# all .tmp files will be skipped
- *.tmp

If you need to exclude a file or directory relative to the directory that holds .rsync-filter, start the pattern with a slash:

# the .cache directory or file will not be copied, but foo/.cache and foo/bar/.cache will be
- /.cache
# the example directory will not be copied, but a file named example will be
- /example/

In the rules, an asterisk matches any sequence of characters except a slash, and two asterisks match any characters at all, including slashes:

# skips .local/share/Trash/ and Documents/example/Trash/
- /**/Trash/
# does not skip .mozilla/firefox/abcd.profile/ext/Cache/
# but skips .mozilla/firefox/abcd.profile/Cache/
- /.mozilla/firefox/*/Cache/

Finally, if you want some files to be copied despite the exclusion rules, mark them with a + rule at the beginning of the line. Note that rsync applies the first rule that matches, so in practice the + line has to be placed above the - line it overrides.

# the directory Projects/Example/layout/Trash/ will be copied
+ Projects/**/Trash/

When run with the -F option, rsync looks for .rsync-filter files throughout the directory tree.

If you do not want the filter files themselves to be copied, specify the switch twice, like this:

$ rsync -avFFn source example.com:destination
sending incremental file list
source/
source/example.html
source/tmp/
source/tmp/foo.bin

sent 174 bytes  received 30 bytes  408.00 bytes/sec
total size is 18,400  speedup is 90.20 (DRY RUN)

As you can see, the extra files were not copied:

$ ls source/.rsync-filter source/foo.tmp
source/foo.tmp  source/.rsync-filter
$ cat source/.rsync-filter
- *.tmp
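The run above can be reproduced locally. The sketch below recreates a similar layout in a temporary directory (the paths are assumptions for illustration) and performs a real copy instead of a test run, so the result can be inspected.

```shell
work="$(mktemp -d)"
mkdir -p "$work/source/tmp" "$work/destination"
touch "$work/source/example.html" "$work/source/foo.tmp" "$work/source/tmp/foo.bin"

# Per-directory filter file: skip every .tmp file below this directory.
printf '%s\n' '- *.tmp' > "$work/source/.rsync-filter"

# -F reads .rsync-filter files; the second F also keeps them out of the copy.
rsync -aFF "$work/source" "$work/destination/"

find "$work/destination" | sort
```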

Limiting rsync over ssh

Sometimes you need to allow rsync over ssh remotely and without a password, but only for a specific directory and host, so that nothing can be copied to or from anywhere else.

For example, you want files to be copied to the server backup.example.com only from the host server.example.com, only into the backup-example directory, and only with these options:

$ rsync -aW --del source/ backup.example.com:destination/backup-example/

First, find out which command rsync runs on the remote host when it invokes ssh:

$ rsync -e "ssh -t -v" -aW --del source/ backup.example.com:destination/backup-example/ 2>&1 | grep command
debug1: Sending command: rsync --server -lWogDtpre.iLsfxC --delete-during . destination/backup-example/

Then, in ~/.ssh/authorized_keys on backup.example.com, make this the forced command for the known ssh key:

from="server.example.com",command="rsync --server -lWogDtpre.iLsfxC --delete-during . destination/backup-example/",no-pty,no-port-forwarding ssh-rsa AAAA... # next is your key

This way, even if other options are given when rsync is started, the target server still runs the rsync command with the options and settings you originally specified.

If you do not want your backup to be overwritten or deleted on the destination server, replace the --del option with --ignore-existing.

Time Machine

macOS and OS X users who make backups will surely appreciate Time Machine. It lets you return to a previous version of any file in literally two clicks. For all its beauty, Time Machine doesn't do anything that we can't do with rsync.

#!/bin/bash

set -o nounset -o errexit

# Work in the directory that contains this script.
cd "$(dirname "$0")"
date=$(date --iso-8601=seconds)

# On the very first run there is no previous snapshot to link against.
test -L latest || ln -s "$date" latest

# Unchanged files become hard links into the previous snapshot.
rsync --delete-excluded --prune-empty-dirs --archive -F --link-dest=../latest "$@" "./$date"

# Point "latest" at the snapshot that was just created.
rm latest
ln -s "$date" latest

The script should be placed in the root of the disk or directory where the backups are to be stored.

Run it with a single argument, the source directory. For example:

/mnt/backups/backup /home

After several runs, we get the following directory structure:

2017-02-08T22:05:04+09:00
2017-02-08T22:10:05+09:00
2017-02-08T22:15:05+09:00
2017-02-08T22:20:06+09:00
2017-02-08T22:25:05+09:00
2017-02-08T22:30:04+09:00
latest -> 2017-02-08T22:30:04+09:00

Here latest points to the most recent backup.

Each directory contains a snapshot of the source directory as it was at the time of copying. You might think that disk usage grows in proportion to the number of copies, but it does not.

$ du -sh /mnt/backups
4.5M    /mnt/backups
$ du -sh /home
3.8M    /home

The entire set of copies takes up only slightly more space than the original directory. The extra space goes to the files that changed.

Even if nothing has changed, some space is still used for the directories themselves, which cannot be stored as hard links.

$ du -hs 2017-02-08T22:20:06+09:00 2017-02-08T22:25:05+09:00 2017-02-08T22:30:04+09:00
3.8M    2017-02-08T22:20:06+09:00
136K    2017-02-08T22:25:05+09:00
136K    2017-02-08T22:30:04+09:00

These significant savings come from the hard links mentioned above, which rsync creates for files that have not changed since the last copy.

$ stat -c "%i" 2017-02-08*/example.txt | uniq
31819810

Identical, unchanged files have the same inode.

Of course, in terms of the possible saving of disk space, this backup method is far from special programs such as

Martin Streicher
Published on 02/11/2010


Over the past 20 years, the use of computer networks has become extremely widespread. This was mainly due to the development of the Internet, investments in national and international network infrastructure and falling prices for network and computer equipment. Networks are ubiquitous today, and new applications are increasing demands for network scalability and speed. The Internet once started with a few small workstations, but now it and its private counterparts connect countless computers.

Frequently used abbreviations

FTP: File Transfer Protocol
WebDAV: Web-based Distributed Authoring and Versioning

During the same period, UNIX® also grew and offered more and more powerful software for use. FTP was one of the earliest tools for exchanging files between systems and is still widespread today. The rcp command (short for "remote copy") was a step up from FTP in that it not only provided the capabilities of the standard cp utility, but also copied files from one machine to another. rdist, based on rcp, automatically distributed files from one machine to multiple systems.

These tools are all outdated today; rcp and rdist, for example, provide no security when transferring files, and scp now takes their place. Although FTP is still widespread, SFTP (Secure FTP), the secure version of FTP, should be used wherever possible. There are other file sharing options as well, such as WebDAV and BitTorrent™. Of course, the more machines you have, the harder it is to keep them in sync, or at least in a known state. With scp and WebDAV, you have to write your own synchronization script to do this.

Rsync is the ideal tool for distributing files. It can resume file transfers after a disconnect, transfers only those portions of a file that differ in the source file and its destination copy, and can perform full or incremental backups. In addition, it is available on all flavors of UNIX, including Mac OS X, making it easy to link almost any version of UNIX with it.

To familiarize yourself with rsync, we'll first look at typical use cases and then move on to more advanced uses. To demonstrate how rsync works, I'll be using Mac OS X version 10.5, Leopard (a flavor of FreeBSD), and Ubuntu Linux® version 8. If you are using a different operating system, you can port most of the examples to that too; see the rsync man page on your machine to see if the operations used here are supported and if necessary try to find an equivalent.

Introducing rsync

Just like cp, rsync copies files from one location to another. Unlike cp, rsync can perform both local and remote copying. For example, the command in Listing 1 copies the /tmp/photos directory with all its contents to the home directory.

Listing 1. Copying the directory and its contents

$ rsync -n -av /tmp/photos ~
building file list ... done
photos/
photos/Photo 2.jpg
photos/Photo 3.jpg
photos/Photo 6.jpg
photos/Photo 9.jpg

sent 218 bytes  received 56 bytes  548.00 bytes/sec
total size is 375409  speedup is 1370.11

The -v option enables verbose messages. The -a option (a stands for archive) is shorthand for -rlptgoD, which requests a recursive copy (recurse), copies symbolic links as links (links), and preserves each file's permissions (perms), modification times (times), group (group), and owner (owner), as well as device and special files (devices). The -a switch normally produces a mirror copy of the files, unless the target system does not support some attribute of the files being copied. For example, copying a directory from UNIX to Windows® does not always map the attributes perfectly. Below are some suggestions for dealing with unusual situations.

rsync has many options. If you suspect that the command parameters, the source description, or the copy destination are incorrect, you can use -n to do a test run. During a test run, rsync shows what would be done with each file without actually moving a single byte. Once you are sure all the parameters are correct, remove -n and run the command for real.

Listing 7. Copying the files to the local machine

$ rsync --port=7777 mymachine.example.com::pickup/
Hello!
Welcome to Martin's rsync server.

drwxr-xr-x        4096 2009/08/23 08:56:19 .
-rw-r--r--           0 2009/08/23 08:56:19 article21.html
-rw-r--r--           0 2009/08/23 08:56:19 design.txt
-rw-r--r--           0 2009/08/23 08:56:19 figure1.png

By swapping the source and destination addresses, you can write files to the module from the local machine, as shown in Listing 8.

Listing 8. Swapping the source and destination directories

$ rsync -v --port=7777 application.js mymachine.example.com::dropbox
Hello!
Welcome to Martin's rsync server.

application.js

sent 245 bytes  received 38 bytes  113.20 bytes/sec
total size is 164  speedup is 0.58

This was a quick, but fairly complete overview of rsync's capabilities. Now let's see how you can apply this package to your day to day tasks. rsync is especially useful for backups. And since it knows how to synchronize local and remote files or even file systems, it is an ideal tool for managing large clusters of machines that must be (at least partially) identical.

Backing up your data using rsync

Keeping backups on a regular basis is an extremely important but usually overlooked routine. Neither the length of the backup procedure, nor the need for large external file storage, nor anything else can be an excuse; copying data to ensure its safety should be a daily routine.

To make this task painless, use rsync for backups and a remote server, possibly provided by your ISP. Each of your UNIX machines can use this mechanism, which is the ideal solution for storing your data securely.

Install SSH keys and an rsync daemon on the remote machine and create a writable backup module. After that, run rsync as shown in the script in Listing 9 to create backups that are unlikely to take up much space.

Listing 9. Create daily file backups

#!/bin/sh

# This script based on work by Michael Jakl (jakl.michael AT gmail DOTCOM) and used
# with express permission.

HOST=mymachine.example.com
SOURCE=$HOME
PATHTOBACKUP=home-backup

# Timestamp that names this backup, e.g. 2009-08-23T12:32:18
date=`date "+%Y-%m-%dT%H:%M:%S"`

# Files unchanged since the current backup are hard-linked instead of copied.
# Note: rsync resolves a relative --link-dest path against the destination directory.
rsync -az --link-dest=$PATHTOBACKUP/current "$SOURCE" "$HOST:$PATHTOBACKUP/back-$date"

# Make the new backup the current one.
ssh "$HOST" "rm $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"

Replace HOST with the name of your backup server and SOURCE with the directory you want to save. Replace PATHTOBACKUP with the name of the module. (Alternatively, the last three lines of the script can be put in a loop to back up several directories by changing the SOURCE variable.) The script works as follows.

First, a string like 2009-08-23T12:32:18, containing the current date and time, is placed in the date variable; this string uniquely identifies each backup.
The rsync command does the bulk of the work. The -az options preserve all information about the files and compress the data in transit, and the --link-dest=$PATHTOBACKUP/current option says that a file that has not changed should not be copied into the new backup but hard-linked to the same file in the existing archive. In other words, the new backup contains only the files that have changed; the rest are links.
Let's look at the script in more detail, substituting values for all the variables. The current archive is mymachine.example.com::home-backup/current. The new archive for the /home/strike directory will be located in mymachine.example.com::home-backup/back-2009-08-23T12:32:18. If a file in /home/strike has not changed, it is represented in the new archive by a hard link to the corresponding file in the current archive. Otherwise, the new file is copied to the new archive.
If you change only a small number of files and directories each day, the additional space needed for each new backup is negligible. Moreover, since all backups except the very first are quite small, you can keep a long history of your files.
In the last step, we reorganize the backups on the remote machine so that the newly created archive becomes the current one, which minimizes the differences that have to be recorded the next time the script runs. The last command deletes the current archive (which is just a symbolic link) and creates a symbolic link with the same name pointing to the new archive.

Once you get started with remote rsync for your day to day tasks, you probably need to keep the daemon up and running at all times. For Linux and UNIX machines, there is an rsync boot script, which is usually found at /etc/init.d/rsync. By using this script and your operating system's utility that controls how components are turned on and off, you can arrange for rsync to start at boot. If you are running the rsync daemon without superuser privileges, or you do not have access to boot scripts, you can run rsync using cron:

@reboot /usr/bin/rsync --daemon --port=7777 --config=/home/strike/rsyncd/rsyncd.conf

This command starts the daemon every time the machine is rebooted. Place this line in your crontab file and save it.

You've already seen how you can detect a problem early by using the -n preview. You can also monitor the status of rsync tasks using two parameters: --progress and --stats. The first of these options displays a progress bar for the task. The second shows the statistics of data compression and transmission. With --compress, you can speed up the transfer of data between machines. Instead of sending the data in its original form, the sender compresses it before sending it, and the receiver decompresses it, and as a result, fewer bytes are transferred in less time.

By default, rsync copies all files from the data source to the destination. This is called duplication. If you want to mirror data, that is, make the local and remote data match exactly, use the --delete option. For example, if the source contains files A, B, and C, then by default rsync creates copies of all three files on the remote machine. However, if you then delete file B from the source and duplicate again, file B remains on the remote machine, so the remote copy is no longer an exact copy of the local data. The --delete option mirrors the data by removing files from the remote copy that no longer exist in the original.

There are often files that you would not like to archive or back up. These can be auxiliary files created by editors (their names usually end with a tilde [~]) and other utilities, as well as many irrelevant files in your home directory, such as MP3 files, that can be restored if necessary. In this case, you can specify rsync templates by which it will exclude files from processing. You can specify a template on the command line, or you can specify a text file containing a list of templates. Templates can also be used in conjunction with the --delete-excluded command to remove such files from a deleted copy.

To exclude files that match a specific pattern, use the --exclude command. Remember that if any characters in the pattern have special meaning to the shell, such as *, then the pattern must be enclosed in single quotes:

$ rsync -a --exclude = "* ~" / home / strike / data example.com::data

Let's say the file /home/strike/excludes contains the following list of patterns, one per line:

*~
*.old
*.mp3
tmp

Then you can copy all files except those that match any of these patterns using the following command:

$ rsync -a --exclude-from=/home/strike/excludes /home/strike/data example.com::data
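To check an exclude list before using it for real, it can be combined with the -n preview. The sketch below assumes a scratch directory with a few made-up file names; only the four patterns are taken from the text.

```shell
#!/bin/sh
# Sketch: write an exclude list (one pattern per line) and preview its effect.
# The scratch directory and the file names in it are made up for illustration;
# the directory is left in place so that it can be inspected afterwards.
work=$(mktemp -d)
mkdir -p "$work/data/tmp"
touch "$work/data/report.txt" "$work/data/report.txt~" \
      "$work/data/song.mp3" "$work/data/notes.old"

cat > "$work/excludes" <<'EOF'
*~
*.old
*.mp3
tmp
EOF

# -n (--dry-run) only lists what would be copied; nothing is transferred.
if command -v rsync >/dev/null 2>&1; then
    rsync -anv --exclude-from="$work/excludes" "$work/data" "$work/copy"
else
    echo "rsync is not installed; skipping the preview" >&2
fi
```

With these patterns the preview should list the data directory and report.txt, while the editor backup, the .old and .mp3 files and the tmp directory are left out.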

Sync it up

Now that you are familiar with rsync, you have no reason not to perform regular backups. What happened? Did your dog chew on your hard drive? (It also happens!) Take action in advance, and then your data will remain in perfect order. After all, now all your valuable files are stored v

Rsync is a file synchronization and backup utility. It works on many *nix systems.

The limitation of rsync is that data cannot be copied between 2 remote systems. In this case, you would have to copy data from one remote system, and then transfer to another.

In aptosid you have various options to start the synchronization process. You can run rsync with a command in a terminal, or install additional packages from Debian Sid:

To install a deb package:

apt-get install luckybackup

Instructions for use in the terminal

In the next section, we present rsync, its features, and some examples of how rsync can be used with a custom backup script.

rsync is a fast backup program for directories and files. rsync determines which files and directories have changed using attributes such as size or date, which can make synchronization very fast. The data can be compressed before copying and decompressed at the destination.

rsync can copy data:
* from local system to local system,
* from the local system to a remote system,
* from a remote system to a local system.

For remote transfers, rsync uses either the ssh client (included in the basic installation) or the rsync daemon, which must be running on both the source and the target system. The rsync manpage says that if two systems can communicate over ssh, ssh can also be used for rsync.

The limitation of rsync is that data cannot be copied directly between 2 remote systems. In this case, you have to copy the data from one remote system to the local system first, and then transfer it with rsync to the other.

To clarify this, consider the following example with 3 computers:

* neo - local system
* morpheus - remote system
* trinity - remote system

Each user knows the name of another user, and rsync runs exclusively on neo, the local system:

* Username on neo is cuddles,
* Username on morpheus is tartie,
* Username on trinity is taylar.

The goal is to keep the /home/$user/data directories in sync:

* neo: /home/cuddles/data with morpheus and trinity,
* morpheus: /home/tartie/data with neo and trinity,
* trinity: /home/taylar/data with neo and morpheus.

Now the problem arises that rsync cannot be applied between 2 remote computers:

* neo -> morpheus: from local to remote, works
* neo -> trinity: from local to remote, works
* morpheus -> neo: from remote to local, works
* trinity -> neo: from remote to local, works
* morpheus -> trinity: from remote to remote, not possible
* trinity -> morpheus: from remote to remote, not possible

To get around this limitation, we proceed as follows:

* morpheus -> trinity becomes: morpheus -> neo and neo -> trinity
* trinity -> morpheus becomes: trinity -> neo and neo -> morpheus

This extra step does not change the end result. It does mean, however, that this limitation of rsync should be considered when planning the backup process.
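The two-hop workaround could be scripted on neo roughly as follows. This is a sketch, not a tested backup script: the staging directory, the run helper and the relay function are assumptions made for illustration, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the two-hop workaround, meant to run on neo.
# Host names, user names and paths come from the example above;
# the staging directory and both helper functions are hypothetical.
STAGE=${STAGE:-/tmp/rsync-relay}

# Print each command; execute it only when RUN=1 is set.
run() {
    printf '+ %s\n' "$*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# relay SRC_HOST:SRC_DIR DEST_HOST:DEST_DIR
# Hop 1 pulls the source into the staging directory on neo,
# hop 2 pushes the staging directory to the destination.
# Trailing slashes make rsync copy the directory contents on both hops;
# --delete makes each hop a mirror of its source.
relay() {
    run mkdir -p "$STAGE" &&
    run rsync -a --delete "$1/" "$STAGE/" &&
    run rsync -a --delete "$STAGE/" "$2/"
}

relay tartie@morpheus:/home/tartie/data taylar@trinity:/home/taylar/data
```

Note that this only mirrors in one direction, from morpheus to trinity; keeping all three directories in sync in both directions needs more care than this sketch shows.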

Using hostnames in rsync.

Using hostnames neo, morpheus, and trinity instead of IP addresses can make the copy process clearer and easier to understand.

To do this, you must edit /etc/hosts and insert the hostnames and their associated IP addresses. Thus, in our example, the /etc/hosts file will look like this:

192.168.1.15 neo
192.168.1.16 morpheus
192.168.1.17 trinity

The first line maps the IP address 192.168.1.15 to "neo", the second 192.168.1.16 to "morpheus" and the third 192.168.1.17 to "trinity". Once the file has been saved, the hostname can be used instead of the IP address. This is especially convenient if the assigned IP addresses change, for example for "neo" from 192.168.1.15 to 192.168.1.25. It also makes scripts easier to maintain: when an IP address changes, the scripts stay as they are and only the /etc/hosts file has to be updated.
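To see which address a name currently maps to, the hosts file can be queried directly. The hosts_lookup function below is a hypothetical helper written for this sketch; on glibc systems, getent hosts neo gives the same information.

```shell
#!/bin/sh
# hosts_lookup NAME [HOSTS_FILE]
# Prints the first address that HOSTS_FILE (default: /etc/hosts) assigns
# to NAME. Hypothetical helper, not part of rsync.
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                       # drop comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Demonstration with a scratch copy of the example file.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.1.15 neo
192.168.1.16 morpheus
192.168.1.17 trinity
EOF

hosts_lookup morpheus "$hosts_file"
```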

Two ways to use rsync.

The first way is to "push" data to the target machine; the other is to "pull" data from the source. Each method has pros and cons, which are discussed below. Our example uses local and remote systems to help explain the terminology more clearly.

"Push": the local system holds the source directories and files, and the target is the remote system. The rsync command runs on the local system and pushes data to the target system.

Advantages:
* More than one source system can be backed up to the target.
* The backup process on multiple computers can take place at the same time.
* If a system finishes its backup faster, its resources become available for other jobs.

Disadvantages:
* If the script is run from cron, a crontab must be set up on each system. When the script is modified, the change has to be made on each system; when the schedule changes, the crontab on each computer must change. As a result, administration becomes cumbersome and confusing.
* The backup process cannot check whether the target partition is mounted on the target system. If it is not mounted, no backup will take place.

"Pull": the remote system holds the source directories and files, and the target is the local system. The rsync command runs on the local system and pulls data from the source system.

Advantages:
* The local system becomes the server that manages the backup processes of all other systems. Backup processes are centralized.
* The script only has to exist on one system, which simplifies any modifications. Only one crontab has to change when the schedule changes.
* The script can check whether the target partition is mounted and mount it if necessary.
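The mount check mentioned in the last point could look roughly like this. It is a sketch that assumes a Linux system, where /proc/mounts lists the mounted file systems; the is_mounted helper is hypothetical, and /media/sda7 is the mount point behind the target paths used in the examples.

```shell
#!/bin/sh
# Sketch: refuse to back up unless the target partition is mounted.

# is_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR is listed as a mount point in MOUNTS_FILE
# (default: /proc/mounts, which exists on Linux).
# Mount points containing spaces are not handled by this sketch.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

TARGET=/media/sda7
if is_mounted "$TARGET"; then
    echo "$TARGET is mounted, the backup can start"
else
    echo "$TARGET is not mounted, skipping the backup" >&2
fi
```

Instead of skipping the backup, the else branch could try mount "$TARGET" first, which only works if the partition has an entry in /etc/fstab.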

The rsync syntax (part from "man rsync"):

rsync ... SRC ... DEST
rsync ... SRC ... HOST:DEST
rsync ... SRC ... HOST::DEST
rsync ... SRC ... rsync://HOST[:PORT]/DEST
rsync ... SRC
rsync ... HOST:SRC
rsync ... HOST::SRC
rsync ... rsync://HOST[:PORT]/SRC

Working examples of rsync commands:

rsync [...] morpheus:/home/tartie /media/sda7/SysBackups/morpheus/home

Explanation of the parts of this command:

The source (/path/file) is morpheus:/home/tartie, and the target is /media/sda7/SysBackups/morpheus/home

The /home/tartie directory (including subdirectories) will be saved to /media/sda7/SysBackups/morpheus/home, which will look like this after rsync:

/media/sda7/SysBackups/morpheus/home/tartie

Note that only the tartie directory itself is copied into the path given to rsync, /media/sda7/SysBackups/morpheus/home. The "source" only selects where the data comes from, and the "target" tells rsync where the data from the "source" should be copied. rsync no longer treats the directory as /home/tartie, but simply as tartie, which it places in /media/sda7/SysBackups/morpheus/home. Another example:

rsync [...] /home/user/data/files /media/sda7/SysBackups/neo

Here, the source directory files and all directories and files in it will be copied to the target folder as /media/sda7/SysBackups/neo/files, and not as /media/sda7/SysBackups/neo/home/user/data/files.

This is something to be aware of when using rsync backups.
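The rule can be written down as a small helper that predicts where the source directory lands. The landing_path function is hypothetical and only models the behavior described above, plus one more rsync rule that the text does not mention: a trailing slash on the source copies the contents of the directory instead of the directory itself.

```shell
#!/bin/sh
# landing_path SRC DEST
# Prints where the top-level source directory ends up under DEST.
# Hypothetical helper that only models the path rule; it runs no rsync.
landing_path() {
    src=$1
    dest=${2%/}
    case $src in
        */) # trailing slash: the contents go straight into DEST
            printf '%s/\n' "$dest" ;;
        *)  # no trailing slash: DEST/<last component of SRC>
            name=${src##*/}
            name=${name##*:}    # drop a "host:" prefix such as morpheus:dir
            printf '%s/%s\n' "$dest" "$name" ;;
    esac
}

landing_path morpheus:/home/tartie /media/sda7/SysBackups/morpheus/home
landing_path /home/user/data/files /media/sda7/SysBackups/neo
landing_path /home/user/data/files/ /media/sda7/SysBackups/neo
```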

Explanation of the options (rough translation from the English-language "man rsync"):

-a: archive mode. In short, the manpage describes this as a way to make recursive backups that preserve almost all attributes. Only hard links are not preserved, because handling them is expensive. The -a option is equivalent to -rlptgoD, which means:
* -r = recursive: subdirectories and the files in them are copied from their original location.
* -l = links: symbolic links are recreated at the destination.
* -p = permissions: the permissions are kept identical to those at the original location.
* -t = times: the timestamps are kept identical to those at the original location.
* -g = group: the groups of the original files are preserved.
* -o = owner: if rsync is run as root, the owners of the original files are preserved.
* -D = equivalent to both of these options: --devices --specials
* --devices = character and block device files are recreated on the target system for later recovery. Note that --devices has no effect unless the receiving rsync runs as the superuser (see also the --super option).
* --specials = rsync copies special files such as sockets and fifos.

-q = quiet: minimal output. This option is not part of -a. More information is printed when the -v option is given after the -a option; without -v, rsync runs without reporting what it does.
-E: the executable attribute of files is preserved.
-v: verbose output. If the details are not important, this option can be omitted. However, if you need to see what is happening, it is very useful.
-z: the data is compressed during transfer, which speeds up copying because fewer bytes have to be moved.
--delete-after: target directories or files that no longer exist in the source are deleted after the transfer, not before. "After" is a precaution in case of problems during the transfer; "delete" keeps files and directories that are no longer needed from accumulating at the destination.
--exclude: a pattern for files or directories to be excluded from the copy process. In the examples, --exclude="*~" excludes ALL files ending with "~" from the backup.
Each --exclude option takes only one pattern; to exclude several patterns, several --exclude options must be used.
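When the list of patterns grows, the repeated --exclude options can be generated instead of typed. The backup_cmd function below is a hypothetical helper in plain POSIX shell; it only prints the resulting command, and the paths are the ones from the local backup example.

```shell
#!/bin/sh
# backup_cmd SRC DEST PATTERN...
# Prints an rsync command with one --exclude option per pattern.
# Hypothetical helper; remove the "echo" to run the command instead.
backup_cmd() {
    src=$1
    dest=$2
    shift 2
    # Replace each remaining argument by its --exclude form: append the
    # new option at the end and drop the original pattern from the front.
    for pattern in "$@"; do
        set -- "$@" "--exclude=$pattern"
        shift
    done
    echo rsync -a "$@" "$src" "$dest"
}

# The patterns are quoted so that the shell does not expand them.
backup_cmd /home/cuddles /media/sda7/SysBackups/neo/home '*~' '*.c' '*.o'
```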

Additional options:

-c: compares files by checksum, which is time-consuming. Since rsync already compares files by size and modification time, this option is not included in -a, both because it is redundant and to save time. It is usually not needed.
--super: the receiving system will try to perform superuser (root) actions (see the manpage).
--dry-run: trial run that only shows what would be copied. No files are copied.

And finally, options for specifying source and target directories.

Example commands:

rsync -agEvz --delete-after --exclude="*~" morpheus:/home/tartie /media/sda7/SysBackups/morpheus/home

This command copies all directories and files below /home/tartie on the "morpheus" system and places them in the directory /media/sda7/SysBackups/morpheus/home. The tartie directory structure is preserved.

rsync -agEvz --delete-after --exclude="*~" /home/tartie neo:/media/sda7/SysBackups/morpheus/home

This is the reverse of the first example. It "moves" the directory /home/tartie and its contents to the specified directory on the "neo" system. Note that a system is treated as remote if its name, followed by a colon, is placed in front of the path.

rsync -agEvz --delete-after --exclude="*~" /home/cuddles /media/sda7/SysBackups/neo/home

This is a backup on the local computer. Note that no colon is used here. The local /home/cuddles directory is copied to /media/sda7/SysBackups/neo/home on the same machine.

rsync with many --exclude options:

rsync -agEvz --delete-after --exclude="*~" --exclude="*.c" --exclude="*.o" /* /media/sda7/SysBackups/neo

This command copies everything from the root directory of the local system (all directories and files) to /media/sda7/SysBackups/neo, except for all files and directories whose names end in "~", ".c" or ".o". The source /* is left unquoted so that the shell expands it. In practice you would also exclude the backup target itself and virtual file systems such as /proc and /sys.

Replacing a hostname with an IP address:

The first command uses the hostname, the second the IP address. Both commands do exactly the same thing:

rsync -agEvz --delete-after --exclude="*~" morpheus:/home/tartie /media/sda7/SysBackups/morpheus/home
rsync -agEvz --delete-after --exclude="*~" 192.168.1.16:/home/tartie /media/sda7/SysBackups/morpheus/home

The hostname method does not have to be used, but in our opinion it simplifies rsync backups over networks.

Impossible command:

rsync -agEvz --delete-after --exclude="*~" morpheus:/home/tartie trinity:/home

As already mentioned, the limitation of rsync is that data cannot be copied between 2 remote computers, so this command does not work. We would like to draw your attention to this once again.

We hope this little tutorial will make it easier for you to get started with rsync. It is a very capable backup program.

I found a directory synchronization script based on the rsync program. I figured that synchronization and backup are the same thing, just under different names.
The rsync program is popular enough that you can easily find it in every distribution.
Let's go straight to examples of rsync synchronization.
We want to have a copy of the /home/user/foto directory, and we will store the duplicate folder in /mnt/backup. Run the following (do not forget to check the permissions of the folder you are copying to, or run the command as root with sudo)

rsync -av /home/user/foto /mnt/backup

-a archive transfer (along with folders, subfolders, files, hidden files, etc.)
-v display command execution data

It's as simple as that! Our photos are already in two places. Of course they are not archived and take up as much space as the first folder, but you have to sacrifice something. And you will have to sacrifice your disk space. By the way, for photographs, archiving is ineffective, the compression ratio is too low. Naturally, you need to copy to another physical disk! Because if one hard drive dies, then all other partitions on this disk with all the backups will die too.
Let's continue, I got a bit distracted..
There is a small catch in the above command. For example, you renamed photos or sorted them into other folders, and renamed the old folders or deleted them altogether. What will happen? The script will honestly transfer the new files and directories that have appeared, and the old ones that you deleted from the /home/user/foto folder will remain in the backup folder. In fact, for frequently changed directories, running such a script will turn the backup folder into a trash heap. To avoid this, the command must be run with the --delete option

rsync -av --delete /home/user/foto /mnt/backup

And both folders will always be identical
Now you need to select your most important directories and write a simple list of backup commands to the file. It once looked something like this to me.

rsync -av --delete /home/mik/mail /mnt/backup/home/mik/
rsync -av --delete /home/mik/.mozilla /mnt/backup/home/mik/
rsync -av --delete /home/mik/.claws-mail /mnt/backup/home/mik/
rsync -av --delete /bin /mnt/backup/system
rsync -av --delete /boot /mnt/backup/system
rsync -av --delete /dev /mnt/backup/system
rsync -av --delete /etc /mnt/backup/system
rsync -av --delete /lib /mnt/backup/system
rsync -av --delete /netup /mnt/backup/system
rsync -av --delete /opt /mnt/backup/system
rsync -av --delete /root /mnt/backup/system
rsync -av --delete /sbin /mnt/backup/system
rsync -av --delete /var /mnt/backup/system

Note in particular that I do not back up everything. If, for example, you select your entire home directory for backup, then it may contain your movie collection, open source distributions, etc. All this takes up too much space, and in my opinion, it is not worth backing up.
Let's go back to our file with a list of directories for backup and make it executable

chmod +x ./file_name

The script can be added to crontab for daily execution, and in most distributions such as Ubuntu or Debian you can simply put it in the /etc/cron.daily directory. However, you may still need to make changes to the /etc/crontab file. Most computers are not running at night, so change the time of the /etc/cron.daily tasks to one when your computer is usually on.
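One detail is easy to miss here: on Debian and Ubuntu the scripts in /etc/cron.daily are started by run-parts, which by default only runs files whose names consist of letters, digits, underscores and hyphens. A file called backup.sh would therefore be silently skipped. The check below is a sketch; the function and the file names are hypothetical.

```shell
#!/bin/sh
# cron_daily_name_ok NAME
# Succeeds if NAME only contains characters that run-parts accepts by
# default (ASCII letters, digits, underscores and hyphens).
# Hypothetical helper for illustration.
cron_daily_name_ok() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in backup backup.sh; do
    if cron_daily_name_ok "$name"; then
        echo "$name: will be run"
    else
        echo "$name: will be skipped by run-parts"
    fi
done
```

On such systems, run-parts --test /etc/cron.daily lists the scripts that would actually be executed.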

And now some rsync command examples that are simply worth keeping in mind. After all, you never know what tasks you may face in the future.

Example 1.

rsync -av --delete --exclude="*.avi" --exclude="*.mpg" /home/user /mnt/backup

This command syncs your whole home directory, but does not copy files with the avi and mpg extensions. If there is something else you do not need, just add another --exclude="_what_to_exclude_"

Example 2.

rsync -avz -e "ssh -l ssh_user -p5623" --delete /home/user/foto 192.168.0.1:"./temp"

This one is a command for synchronization between your computer and a remote computer over an ssh connection.

-z additionally compresses the data
-e specifies the remote shell to use
ssh -l ssh_user -p5623 is the command used to connect to the remote computer: ssh_user is the user, and -p5623 is needed if the ssh port is non-standard (5623 in this example)
192.168.0.1:"./temp" is the IP address of the remote computer (you can also use its name), followed by the directory to which we transfer the data. Note that the directory is specified via ./ , so the path starts from the home directory of ssh_user, where we land when connecting via ssh

After entering this command, we must log in with the ssh_user password, and only then will directory synchronization begin. Likewise, you can synchronize the remote directory and the local one. It is enough just to swap the folders where we synchronize from and where.
If you want to automate the process and not enter a password every time, then read my article