As I was saying at the end of the last post, locking myself out of the server really made me wish I had implemented my backup mechanism sooner, especially considering that at this point in the life of QWERTY (my VPS) there isn't that much to back up.

In short, my backup needs are simple:

  1. Archive the important data:
    • IRC bouncer logs
    • git repositories
    • configuration settings
  2. Upload the archive to a remote server;
  3. Repeat 1. and 2. daily.

Archiving the data

This is a really simple shell script which creates a .tar.gz archive of the folders I specify. I could further improve the script by making it do incremental backups instead of creating a new archive each day, but considering that this is a quick and dirty version of it, and that I don't have that much stuff to back up, it'll do for now.

#!/bin/sh
date
echo "############### Backing up files on the system... ###############"

backupfilename=server_file_backup_`date '+%Y-%m-%d'`

echo "----- Now tar, then zip up all files to be saved -----"
tar cvf "/home/backup/${backupfilename}.tar" /home/logs/* /home/stats/pisg.cfg /home/gugu/.znc/configs/znc.conf /home/repos/*
# gzip replaces the .tar with a .tar.gz on its own, so no separate rm is needed
gzip "/home/backup/${backupfilename}.tar"
chmod 755 /home/backup/${backupfilename}.tar.gz

echo "############### Completed backing up system... ###############"
date

The echo lines aren't really necessary, but I like having them there in case I run the script by hand for debugging purposes.

It creates an archive named server_file_backup_$current-date containing the folders and files specified on the line starting with tar cvf. More precisely, it first creates a tar archive and then gzips it, which replaces the .tar with a .tar.gz; this is another point where the script could be improved, but hey, it works OK right now.

The archive is saved under /home/backup with mode 755: the owner can read, write and execute it, while everyone else can read and execute it but not write to it.

Then all I have to do is make the script executable with:

chmod +x archive.sh
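
As for the improvement mentioned above, tar can compress while it archives, so the separate gzip step isn't strictly needed. Below is a minimal sketch of that idea rather than the script above: make_archive is a made-up helper name, and it assumes a tar that understands the z flag (GNU and BSD tar both do).

```shell
#!/bin/sh
# make_archive DEST_DIR NAME PATH...
# Hypothetical helper: writes DEST_DIR/NAME_YYYY-MM-DD.tar.gz in one step
# and prints the path of the archive it created.
make_archive() {
    dest_dir=$1
    name=$2
    shift 2
    archive="${dest_dir}/${name}_$(date '+%Y-%m-%d').tar.gz"
    # c = create, z = compress with gzip, f = write to the given file
    tar czf "$archive" "$@" || return 1
    # Same permissions as in the original script
    chmod 755 "$archive"
    echo "$archive"
}
```

Called as make_archive /home/backup server_file_backup /home/logs /home/repos, it would print the path of the archive it wrote, and no intermediate .tar is ever created.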

Uploading the archive(s)

A backup wouldn't really be a backup if it lived on the same system you're backing up, now would it? There are a couple of ways to move the data out, but I decided to simply FTP it to the webserver which hosts my main blog.

I initially tried to write the script around the built-in ftp utility but couldn't get it to work, so I looked around and settled on ncftp. Debugging the problem would've been more interesting and a better learning experience, but I wanted to get the backups up and running as soon as possible, so that will be a future exercise on my part.

The script is a simple one which logs in to the remote server using a provided username and password and uploads all archives to a pre-specified directory.

#!/bin/sh
USERNAME="ftp_user_name"
PASSWORD="ftp_password"
SERVER="ftp_server"

# local directory to pick up *.tar.gz file
FILE="/home/backup"

# remote directory to upload backup
BACKUPDIR="/back"

# upload file
ncftp -u"$USERNAME" -p"$PASSWORD" $SERVER<<EOF
cd $BACKUPDIR
mput $FILE/*.tar.gz
quit
EOF

And after running the same chmod +x on up.sh, ensuring it's executable, I've got a nice setup to

  1. Create the backup archive;
  2. Upload it to the remote server.
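
One thing to keep in mind is that mput with a wildcard sends every archive sitting in /home/backup on each run. If only the newest one is needed, a small helper could pick it out first. This is a sketch under assumptions: latest_archive is a hypothetical name, and it relies on file modification times and on archive names that contain no newlines.

```shell
#!/bin/sh
# latest_archive DIR
# Hypothetical helper: prints the most recently modified *.tar.gz in DIR,
# or returns 1 if DIR contains no archive at all.
latest_archive() {
    dir=$1
    # ls -t sorts by modification time, newest first
    newest=$(ls -t "$dir"/*.tar.gz 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$newest"
}
```

The printed path could then be handed to the upload step in place of the wildcard.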

The magic of cron

Now all I have to do is make sure that the scripts run daily, so that if something bad happens I don't lose a lot of data. Of course, nothing really stops me from running the scripts more often than that, but daily is a nice trade-off between the size of the archives and how much new data gets added.

Linux has something very similar to the Windows Task Scheduler called cron; it allows users to schedule jobs (commands or shell scripts) to run periodically at certain times or dates.

In order for my scripts to run periodically I had to add them to my crontab, the configuration file which tells cron when the jobs need to be executed. To edit it all I had to do was run:

crontab -e

Now I just needed to add the jobs. I wanted to run the archiving script every night at 1am, and then the upload script at 1:30am. Why the 30 minute gap between the jobs? I want a margin of error in case I add to the script which makes the archive, or in case the files get really large, so that the upload script won't try to run while the archive is still being created.
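
A different way to get the same guarantee, shown here only as a sketch and not as part of this setup, is a wrapper that starts the upload only once archiving has finished successfully. The function name run_in_order is an assumption made up for the example.

```shell
#!/bin/sh
# run_in_order ARCHIVE_CMD UPLOAD_CMD
# Hypothetical wrapper: runs ARCHIVE_CMD, and runs UPLOAD_CMD only if the
# first command exited with status 0.
run_in_order() {
    archive_cmd=$1
    upload_cmd=$2
    if "$archive_cmd"; then
        "$upload_cmd"
    else
        echo "archive step failed, skipping upload" >&2
        return 1
    fi
}
```

With something like run_in_order /home/backup.scripts/archive.sh /home/backup.scripts/up.sh in a single script, one cron job would be enough and the size of the gap would stop mattering.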

Anyway, the code for the jobs is:

0 1 * * * /home/backup.scripts/archive.sh
30 1 * * * /home/backup.scripts/up.sh

The first line tells cron to run archive.sh every day at 1am, and the second one to run up.sh every day at 1:30am.
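
For reference, every crontab line starts with five time fields (minute, hour, day of month, month, day of week) followed by the command. The minute field has to be an actual number for a once-a-day job, since a * there means every minute of the given hour. The helper below is purely illustrative (daily_cron_line is a made-up name) and just assembles such a line:

```shell
#!/bin/sh
# daily_cron_line MINUTE HOUR COMMAND
# Hypothetical helper: prints a crontab entry that runs COMMAND once a day.
# Field order in a crontab line:
#   minute (0-59)  hour (0-23)  day of month (1-31)  month (1-12)  day of week (0-6, Sunday is 0)
daily_cron_line() {
    minute=$1
    hour=$2
    job=$3
    # The three "*" fields mean: every day of the month, every month, every weekday
    printf '%s %s * * * %s\n' "$minute" "$hour" "$job"
}

daily_cron_line 0 1 /home/backup.scripts/archive.sh   # prints: 0 1 * * * /home/backup.scripts/archive.sh
daily_cron_line 30 1 /home/backup.scripts/up.sh       # prints: 30 1 * * * /home/backup.scripts/up.sh
```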

And with this I have a backup system in place.


To do

I know that things can be improved, and in no way do I say that this is perfect, but it suits my needs at this point in time.

What can (and will) be improved:

  1. Make incremental backups - This would mean that I'd set a specific day of the week to be the full backup day, when everything gets archived, and then on every other day only the files that changed get archived.
  2. Use the built-in ftp client - Go back and find out why I couldn't get ftp to work properly and had to use ncftp instead.
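
For the first item, one possible route is GNU tar's --listed-incremental option, which records what has already been archived in a snapshot file. The following is only a sketch of that approach, assuming GNU tar is the tar on the system; incremental_archive and the incr_ file prefix are names invented for the example.

```shell
#!/bin/sh
# incremental_archive DEST_DIR SNAPSHOT_FILE PATH...
# Hypothetical helper, assumes GNU tar. If SNAPSHOT_FILE does not exist yet,
# the archive is a full backup; on later runs only files that are new or
# changed since the snapshot was last updated are stored.
incremental_archive() {
    dest_dir=$1
    snapshot=$2
    shift 2
    archive="${dest_dir}/incr_$(date '+%Y-%m-%d_%H%M%S').tar.gz"
    tar --listed-incremental="$snapshot" -czf "$archive" "$@" || return 1
    echo "$archive"
}
```

Deleting the snapshot file on the chosen day of the week would turn that day's run back into a full backup.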

I guess that's about it. In all honesty, if I had ssh access to another server I'd just use rsync for the backups, but seeing that I don't, this is the best method that came to mind.

In the next post I'll most likely talk about git and how I've integrated it with Pelican.

- fuzzmz