The University of Massachusetts Amherst

Understanding the Rsync Utility

We’ve all had a file scare sometime in our computing careers. In many cases, the computer won’t show any signs of corruption until everything starts to fail at once. When things do go wrong, your first thought is always about the safety of your files – will you be able to recover them?

If you had backed up your files regularly with a backup utility, you wouldn’t have to worry about whether you would be able to recover all of your data. However, many people do not see the value in doing regular backups, because they think that it is a waste of time. Their rationale is that the probability that a computer will need to access the backup is small enough that waiting for the computer to copy over all of their files every time they do a backup is pointless.

Rsync is an intelligent backup utility. Instead of duplicating all of the data being copied (looking at you, cp), rsync compares the source with what already resides in the destination directory and copies over only what has changed. By default, if the modification time and size of a file have not changed, rsync moves on without copying it. This saves a lot of time that would otherwise be spent on costly I/O operations. Rsync will take about as long as cp the first time a backup is made, but subsequent backups can finish in minutes instead of hours, depending on how often you back up your system and how much has changed in between.
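To see the skipping behavior for yourself, here is a small sketch that runs the same backup twice and counts the files rsync actually transfers. The directory and file names are made up for the demonstration, and it leans on two flags: -a (so modification times are carried over to the copy) and -i, short for --itemize-changes, which prints one line per change.

```shell
# Sketch: back up a tiny directory twice and count file transfers each time.
# All paths here are hypothetical and live in a throwaway temp directory.
work=$(mktemp -d)
mkdir "$work/src" "$work/backup"
echo "first file"  > "$work/src/a.txt"
echo "second file" > "$work/src/b.txt"

# With -i, every transferred file shows up as a line starting with ">f".
first_run=$(rsync -ai "$work/src/" "$work/backup/" | grep -c '^>f')
second_run=$(rsync -ai "$work/src/" "$work/backup/" | grep -c '^>f')

echo "files transferred on first run:  $first_run"
echo "files transferred on second run: $second_run"

rm -rf "$work"
```

The first run should report 2 transfers and the second 0, since neither the size nor the modification time of either file changed between runs.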

Rsync also includes a lot of flags which can help with the backup process.

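--exclude is useful for ignoring directories you do not want in the backup. If a full Linux backup is being made, directories like /proc are usually excluded because they contain session-specific information rather than real files, and parts of /var (caches, for example) are often excluded because of their size.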

--delete removes anything in the backup directory that is not present in the source directory. This is mostly useful for creating snapshot copies of a system. If you would rather keep every file you have ever backed up, even after you delete it on your own system, leave this flag off.
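The difference is easiest to see side by side. This sketch (again with invented file names in a temp directory) removes a file from the source, then syncs once without --delete and once with it.

```shell
# Sketch: what happens to a stale file in the backup, with and without --delete.
work=$(mktemp -d)
mkdir "$work/src" "$work/backup"
echo "old" > "$work/src/old.txt"
echo "new" > "$work/src/keep.txt"
rsync -a "$work/src/" "$work/backup/"    # initial backup holds both files

rm "$work/src/old.txt"                   # delete a file on the "real" system

# A plain sync leaves the stale copy in the backup.
rsync -a "$work/src/" "$work/backup/"
if [ -e "$work/backup/old.txt" ]; then without_delete="still present"; else without_delete="gone"; fi

# Adding --delete makes the backup mirror the source exactly.
rsync -a --delete "$work/src/" "$work/backup/"
if [ -e "$work/backup/old.txt" ]; then with_delete="still present"; else with_delete="gone"; fi

echo "without --delete: old.txt is $without_delete"
echo "with --delete:    old.txt is $with_delete"

rm -rf "$work"
```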

--archive, also known as -a, is another useful flag. It is equivalent to the flags -rlptgoD: it copies a directory tree recursively while preserving nearly all of each file's metadata, which is what you want for a snapshot. I’ll go into the individual flags in more detail below.

  • The -r flag stands for recursion. It tells the program that you want it to descend into directories and copy their contents. If this flag is not set, rsync skips directories entirely, and you will have an unhappy TA on your hands, looking at a backup with none of your directories in it.
  • The -lptgo flags preserve information about each file. If these are not set, the copies get fresh metadata: the current time, the default permissions, and the owner and group of whoever ran the copy. To keep the information on the original file intact, -l copies symbolic links as links, -p preserves the permissions of the file, -t preserves the modification time of the file, -g preserves the group the file is associated with, and -o preserves the owner of the file (which generally requires running as root).
  • The -D flag is the most optional of the set. It preserves device files and special files such as named pipes and sockets.

Finally, rsync has flags which let the user know what is going on during the backup process. The -v flag stands for verbose. It prints each file as it is transferred, so the user will know how far into the backup they currently are. Since printing is itself an I/O operation, it slows the program down slightly, but many people believe that this kind of knowledge is worth the tradeoff. To make the output easier to read, you can also set the -h (human-readable) flag, which renders sizes with unit suffixes such as K, M, and G instead of full byte counts.

This is an example of an rsync command which will take a snapshot of whatever is in folder1 and store it in folder2, deleting anything in folder2 which is not in folder1 and telling you everything that it is working on along the way:

rsync -av --delete /home/folder1/ /home/folder2/