CONTENTS
--------

- overview
- license
- important hint
- installation (for Linux, BSD and other Unixes)
- getting more information / getting help
- general functionality
- how does it work?
- mounting read only
- parameters
  - configuration file
  - command line parameters
- including / excluding files and directories
- strategies to delete old backups
- monitoring
- limitations


OVERVIEW
--------

- Copies directory hierarchies recursively into another location, by
  date (e.g. /home/ => /var/bkup/2002.12.13_04.27.56/). Permissions are
  preserved, so users with access to the backup directory can recover
  their files themselves.
- File comparisons are done with MD5 checksums, so no changes go unnoticed.
- Hard-links unchanged backed up files to old versions and identical
  files within the backed up tree.
- Can hard-link between independent backup series (from different machines)
- Compresses large files (that don't match exclusion patterns).
- Manages backups and removes old ones. 



LICENSE
-------

storeBackup is licensed under the GPL and is also "cardware" :-)
This means, if you like the program and have benefited from it, please
send me a postcard (with a picture of your city/country) to:
Heinz-Josef Claes
Ostheimerstrasse 27 b
61130 Nidderau
Germany


Copyright (C) Dr. Heinz-Josef Claes (2001-2004)
              hjclaes@web.de

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.
   
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.
   
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.



IMPORTANT HINT
--------------

storeBackup is a tool for backing up a file system tree on GNU/Linux
or other Unixes to a separate directory. For reasons of security you
have to mount the directory that is going to be backed up read-only.
This makes it impossible for storeBackup to destroy the original
data. No such case is known to the author after intensive testing on
large file systems (about 1.5 million files) over a period of half a
year (without mounting read-only). It is a safety precaution you
should take to protect yourself and your data, because this program is
distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See below for how to mount read-only.



INSTALLATION
------------

The tar file contains two directories, 'bin' and 'lib', with some files.
Simply unpack the archive wherever you want to install storeBackup:
cd 'wherever you want'
tar jxf .../storeBackup.tar.bz2
and add the resulting 'bin' directory to your $PATH (or call the
programs with their full path).

If you are a Debian user, you can copy the shell script cron-storebackup
to /etc/cron.daily/storebackup. Refer to the file for further instructions.

In order for storeBackup to function, you need:

- /usr/bin/env
and in your $PATH:
- perl (with Berkeley DB, which is part of the common perl distribution)
  It should run with perl5.6 or newer.
- md5sum
- bzip2
- cp
- mknod
- mount (for checking the file system type)
- chattr (only if you want to use `chattr +i`)
- any other compression program (e.g. gzip) you want to use


If you are using FreeBSD or another Unix, you need the program md5sum
in your $PATH. If it is not available on your system, a tar file
'md5sum.tar' is included. Unpack it and compile the program, for
example:
gcc -O6 -o md5sum md5.c md5sum.c
Then install md5sum into a directory that is in your $PATH variable.
If you have problems installing storeBackup on other Unix systems,
don't hesitate to contact me.

To my knowledge, storeBackup runs on Linux, FreeBSD, Solaris and AIX.


GETTING MORE INFORMATION / GETTING HELP
---------------------------------------

You can contact me via email (hjclaes@web.de).

There is also an EXAMPLES file in the distribution.

Debian users should look at
http://packages.debian.org/unstable/utils/storebackup.html
The maintainer of the Debian package is Arthur Korn (arthur@korn.ch).



GENERAL FUNCTIONALITY
---------------------

storeBackup is a disk-to-disk backup tool for GNU/Linux; it should
also run on other Unix-like systems. You can directly browse the
backed-up files (locally, via NFS, Samba or whatever), so users can
restore their files easily and quickly: they only have to copy (and
possibly uncompress) the file. There is also a tool that lets the
administrator easily restore (sub)trees, and an option that allows
single backups from specific times to be deleted without affecting the
other existing backups.

This package consists of the following tools:

storeBackup.pl		- performs the backups
storeBackupRecover.pl	- recovers files or (sub) trees from the backups
			  (especially for spanning multiple users)
storeBackupVersion.pl	- analyzes the versions of backed up files
storeBackupls.pl	- lists backed up directories (versions) with
			  information (week day, age of backup)
storeBackupConvertBackup.pl - converts (very) old backups to the new
			  format (see file _ATTENTION_)
storeBackupDel.pl	- deletes old backups (same algorithms as in
			  storeBackup.pl)
storeBackupMount.pl	- pings the server, mounts the file system(s),
			  calls storeBackup.pl, unmounts the file
			  system(s); it writes a log file and has
			  detailed error handling

For your convenience I have added the following scripts:

llt			- shows atime, ctime and mtime; 'llt -h' gives
			  usage information
multitail.pl		- more robust than `tail -f` for n files



HOW DOES IT WORK?
-----------------

storeBackup backs up a directory to another location; it does not care
where this location is (same disk, different disk, or via NFS over the
network). You should use another disk, or better another computer, to
store the backup. The target directory must be on a Unix virtual file
system which supports hard links; backing up to a Samba share is not
possible. Naturally, you can also mount the source directory via NFS
and back up to a local file system. In this case, it's good to have a
fast network.  The backups appear below the target directory in
directories of the form date_time (yyyy.mm.dd_hh.mm.ss) which
storeBackup creates.
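The directory name pattern is exactly what date(1) prints with the
following format string; running it shows the kind of name a backup
started now would get:

```shell
# print the current time in storeBackup's backup-directory format
date +%Y.%m.%d_%H.%M.%S     # for example: 2002.12.13_04.27.56
```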

There are several optimizations that have been implemented to reduce
disk usage:

- The files to be backed up are compressed (default: bzip2) as
  individual files in the backup. Files with definable suffixes (like
  .gz, which is part of the default list) will not be compressed. It is
  also possible to configure storeBackup so that it does not compress
  anything.

- If a file with the same contents exists in the previous backup, the
  new backup will only be a hard link to the other one. (This
  mechanism depends on the contents, not on a file name or path!) If you
  rename a file or directory or move sub trees around, it will not cost
  you additional space in the backup.

- You can also check older backups than the last one for files with
  the same contents. But this is normally not worth the effort. You
  can also check backups from *other* machines for files with the same
  contents, which can be very efficient.

- If a file with the same contents exists elsewhere in the backup
  currently being performed, the copies will be hard linked (and
  naturally also to the older versions in the existing backups).

As a result, only changes resulting in different file contents are
stored (compressed) and require disk space. Normally, the required
disk space is less than the space required by the original. But this
depends on the number of backups and changes.
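The hard-link mechanism itself can be observed with plain coreutils;
the sketch below only illustrates the idea (all paths are invented,
GNU stat assumed) and does not involve storeBackup:

```shell
# two "backup" trees sharing one file's data via a hard link
demo=$(mktemp -d)
mkdir -p "$demo/backup1" "$demo/backup2"
echo "unchanged content" > "$demo/backup1/file"
ln "$demo/backup1/file" "$demo/backup2/file"             # hard link: no extra data blocks
stat -c '%i' "$demo/backup1/file" "$demo/backup2/file"   # same inode number twice
```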

There are several optimizations to improve performance. The first
backup is much slower than the following ones, because all the data
has to be compressed and/or copied. storeBackup has the ability to
take advantage of multiprocessor machines.

storeBackup creates special files in the root of the backup called
.md5CheckSums.info and .md5CheckSums or .md5CheckSums.bz2
(default). Do not delete these files! They contain all the
information about the original files. You can use this information to
write your own tools to restore or to analyze the
backups.

When started, storeBackup reads .md5CheckSums and creates its own
databases (dbm files) in $TMPDIR or --tmpdir (default is /tmp). If you
back up a large number of files, the required space can be several
dozen megabytes. If you do not have enough memory to cache the dbm
files, I recommend using a separate hard disk (if available) for
better performance.


MOUNTING READ ONLY
------------------

If you want to mount read-only, follow the instructions of the tools
you use. If you want to mount a tree of your local file system
read-only for storeBackup, you can use NFS. Make sure you do not
generate an infinite loop :-)
It's a good idea to set the option noatime and to use NFSv3.
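A sketch of that loopback-NFS setup (all paths are invented; it
requires root and a running NFS server, so treat it as a template
rather than a recipe):

```
# /etc/exports on the same machine: export the tree to localhost, read only
#   /home  localhost(ro)
# then re-export and mount it read only with noatime, via NFSv3:
exportfs -ra
mount -t nfs -o ro,noatime,nfsvers=3 localhost:/home /mnt/backup-src
# point --sourceDir at /mnt/backup-src (not /home), and make sure the
# backup target is *not* below the exported tree (the infinite loop)
```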


PARAMETERS
----------

There are two possibilities to call storeBackup.pl:
. with command line parameters
. with a configuration file.


. CONFIGURATION FILE
--------------------

storeBackup.pl -f configFile [-g | --print]

--file		-f  configuration file (instead of parameters)
--generate	-g  generate a template of the configuration file;
		    inside the template, the possibilities are explained
--print		    print the configuration read from the configuration
		    file and stop
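The generated template is a plain key = value listing of the options
described in the next section (the template itself documents the exact
syntax). Purely as an invented illustration, a minimal hand-written
file might look like:

```
# hypothetical minimal configuration file -- generate the real,
# fully commented template with: storeBackup.pl -f mybackup.conf -g
sourceDir = /home/bob/toSave
targetDir = /vol1/savePlace
lockFile  = /tmp/storeBackup.lock
keepAll   = 30d
```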



. COMMAND LINE PARAMETERS
-------------------------

storeBackup has a lot of parameters to tailor it to individual
needs. Minimum parameters are:

storeBackup.pl -s sourceDir -t targetDir

Here they all are explained in detail:

--sourceDir	-s  source directory (must exist)
		    e.g. /home/bob/toSave
--targetDir	-t  target directory (must exist)
		    e.g. /vol1/savePlace
		    The above examples will *not* create a directory 'toSave'
		    in /vol1/savePlace. If you want a separate directory in
		    the target directory, you have to specify
		    /vol1/savePlace/toSave
--tmpdir	-T  directory for temporary files
		    This is used for two temporary databases. They can be
		    quite large if you back up a lot of files, and will then
		    generate a lot of i/o on the disk. So it is a very good
		    idea to locate --tmpdir on a separate disk (and not over
		    NFS). The default for the temporary directory is '/tmp',
		    or $TMPDIR, if set.
--lockFile      -L  lock file; if it exists, a new instance will exit if
		    an old one is already running
		    This prevents running two storeBackups on the same
		    contents at the same time, which is disastrous, because
		    it would compress everything over and over again.
--exceptDirs	-e  directories to exclude from the backup (relative
		    paths), separated by --exceptDirsSep (default is ',')
		    If you use --followLinks (see below), you have to specify
		    the symbolic links like a normal path.
		    It is possible to use shell type wildcards. If you do so,
		    it's better to quote the strings, e.g.:
	            -e '/home/*/.netscape/cache'
		    If you do not quote, the shell will possibly expand
		    your string.
--includeDirs	-i  directories to include in the backup (relative
		    path), wildcards are possible and have to be
		    quoted, the directories have to be separated with
		    --exceptDirsSep
--exceptDirsSep     Separator for --exceptDirs, default is <,>
--exceptPattern     Files to exclude from backing up. You can define a rule
		    with pattern. See the config or (better) below
		    'INCLUDING / EXCLUDING FILES AND DIRECTORIES' for
		    a detailed description.
--includePattern    files to include in the backup - like --exceptPattern
--contExceptDirsErr normally, storeBackup.pl will stop if one of the paths
		    specified with --exceptDirs does not exist. If you use
		    this option, it will continue. Useful with wildcards;
		    use with care (or you will save a lot of stuff you do
		    not want)
--precommand	    execute a job before starting the backup; checks the
		    lockFile (-L) before starting (e.g. can be used for rsync)
		    Stops execution if the job returns an exit status != 0.
		    This can be used to start a script with rsync to get the
		    data from another machine. So it is possible to embed a
		    synchronisation into storeBackup and get the messages
		    of that tool in --logFile.
--followLinks	    follow symbolic links like directories up to the given
		    depth; default = 0 -> do not follow links
		    You can create a special directory for backing up which
		    contains symbolic links to the trees that actually should
		    be backed up. Normally, storeBackup will *not* follow
		    symbolic links; with this option you can force it to do
		    so. In the described example, you would set
		    --followLinks to 1
--compress	-c  compress command (with options), default is 'bzip2'
		    You can use any command to compress the files, but it
		    has to support stdin/stdout. The compression program
		    must be in $PATH.
--uncompress	-u  uncompress command (with options), default is 'bzip2 -d'
		    You can use any command to uncompress the files (used by
		    storeBackupRecover), but it has to support stdin/stdout.
		    The uncompression program must be in $PATH
--postfix	-p  postfix to add after compression, default is '.bz2'.
		    This postfix should match with your compression program.
		    storeBackup can handle a file 'x' and 'x.bz2' in the
		    source directory without problems in the target directory,
		    so you can use normal suffixes.
--postcommand	    execute a job after finishing the backup,
		    but before deleting old backups.
		    Reports if the job returns an exit status != 0.
		    (If you want to unmount a file system mounted for
		    storeBackup, you should write a shell script.)
--noCompress	    maximal number of parallel compress operations,
		    default = 4
		    If you have a multiprocessor machine, it makes sense to
		    play with this value.
--queueCompress	    length of the queue of files waiting to be compressed,
		    default = 1000
		    storeBackup tries to use maximal i/o and cpu resources
		    at the same time. This queue is built to keep a stock
		    of cpu-intensive tasks ready.
		    Look in the statistical output of the log file to
		    see the maximal used length of this queue.
--noCopy	    maximal number of parallel copy operations,
		    default = 1
		    Normally, it doesn't make sense to copy more than one
		    file at the same time.
--queueCopy	    length of the queue of files waiting to be copied,
		    default = 1000
		    Look in the statistical output of the log file to
		    see the maximal used length of this queue.
		    This queue is built to keep a stock of i/o-intensive
		    tasks ready.
--withUserGroupStat write statistics about used space to the log file
--userGroupStatFile write statistics about used space to the named file
		    (the file will be overwritten each time)
			format is:
			identifier uid userName value
			identifier gid groupName value
--exceptSuffix	    do not compress or copy files with the following
		    suffix (uppercase included): (.zip, .bz2, .png, ....)
		    Call 'storeBackup.pl -h' to see the actual defaults.
		    With this option, you can replace the defaults. If you
		    choose '.*', no file will be compressed in the backup.
		    This makes it very easy for users to e.g. get files out of
		    the backup via a file browser (perhaps with samba).
--addExceptSuffix   like --exceptSuffix, but do not replace defaults, add
		    This option allows you to specify additional suffixes to
		    the exception list for compression
--compressMD5File   default is 'yes', if you do not want this, say 'no'
		    storeBackup stores an internal information file called
		    .md5CheckSums . Normally it's a good idea to compress it,
		    because it can grow relatively large and costs disk space.
		    If you have a lot of disk space, a slow cpu and often use
		    storeBackupVersion, you should not compress it.
--verbose	-v  verbose messages about --exceptPattern and --includePattern
--debug		-d  generate debug messages, levels are 0 (none, default),
		     1 (some), 2 (many) messages
--resetAtime	    reset access time in the source directory - but this will
		    change ctime (which is the time of the last modification
		     of file status information)
--doNotDelete	    test only, do not delete any backup
--keepAll	    keep backups which are not older than the specified amount
		    of time. This is like a default value for all days in
		    --keepWeekday. Deleting begins at the end of the script.
		    The time range has to be specified in the format 'dhms',
		    e.g.
		       10d4h means 10 days and 4 hours
		    default = 30d
--keepWeekday	    keep backups of the specified weekdays for the specified
		    amount of time. Overrides the default value chosen with
		    --keepAll. 'Mon,Wed:40d5m Sat:60d10m' means:
			keep backups of Mon and Wed 40 days + 5 mins
			keep backups of Sat 60 days + 10 mins
			keep backups of the remaining days as specified in
				--keepAll (default 30d)
		    If you also use the 'archive flag', the affected
		    directories will not be deleted via --keepMaxNumber:
		       a10d4h means 10 days and 4 hours and 'archive flag'
		    e.g. 'Mon,Wed:a40d5m Sat:60d10m' means:
			keep backups of Mon and Wed 40 days + 5 mins + 'archive'
			keep backups of Sat 60 days + 10 mins
			keep backups of the remaining days as specified in
				--keepAll (default 30d)
--keepFirstOfYear   do not delete the first backup of a year
		    format is timePeriod with possible 'archive flag'
--keepLastOfYear    do not delete the last backup of a year
		    format is timePeriod with possible 'archive flag'
--keepFirstOfMonth  do not delete the first backup of a month
		    format is timePeriod with possible 'archive flag'
--keepLastOfMonth   do not delete the last backup of a month
		    format is timePeriod with possible 'archive flag'
--firstDayOfWeek    default: 'Sun'. This value is used for calculating
		    --keepFirstOfWeek and --keepLastOfWeek
--keepFirstOfWeek   do not delete the first backup of a week
		    format is timePeriod with possible 'archive flag'
--keepLastOfWeek    do not delete the last backup of a week
		    format is timePeriod with possible 'archive flag'
--keepDuplicate     keep multiple backups of one day up to timePeriod
		    format is timePeriod, 'archive flag' is not possible
--keepMinNumber	    keep at least that minimum number of backups. Multiple
		    backups of one day are counted as one backup.
--keepMaxNumber	    try to keep at most that maximum number of backups. If
		    you have more backups, the following deletion sequence
		    will happen:
		    - delete all duplicates of a day, beginning with the
		      oldest ones, except the oldest of every day
		    - if this is not enough, delete the rest of the backups
		      beginning with the oldest, but *never* a backup with
		      the 'archive flag' or the last backup
--progressReport     print a progress report after each 'number' files
--ignoreReadError    ignore read errors in the source directory; unreadable
		     directories do not cause storeBackup.pl
		     to stop processing
		     Normally, you should not use this option.
--logFile	-l   log file (default is STDOUT)
--withTime	-w   output in logfile with time: 'yes' or 'no'
		     default = 'yes'
--maxFilelen	-m   maximal length of the log file, default = 1e6
--noOfOldFiles	-n   number of old log files, default = 5
--saveLogs	     save log files with date and time instead of deleting
		     the old ones (see --noOfOldFiles): 'yes' or 'no',
		     default = 'no'
--compressWith	     compress saved log files (e.g. with 'gzip -9')
		     default is 'bzip2'
--logInBackupDir     write log file (also) in the backup directory:
		     'yes' or 'no', default is 'no'
		     Be aware that this log does not contain all error
		     messages of the one specified with --logFile!
		     Some errors are possible before the backup directory
		     is created.
--compressLogInBackupDir
		     compress the log file in the backup directory:
		     'yes' or 'no', default is 'yes'
--logInBackupDirFileName
		     filename to use for writing the above log file,
		     default is '$logInBackupDirFileName'


otherBackupDirs      Is a list parameter (like *.c for gcc).
		     List of other backup directories to consider for
		     hard linking. Format (examples):
		     /backupDir/2002.08.29_08.25.28 -> consider this backupDir
		     or
		     0:/backupDir    -> last (youngest) backup in /backupDir
		     1:/backupDir    -> first before last backup in /backupDir
		     n:/backupDir    -> n'th before last backup in /backupDir
		     3-5:/backupDir  -> 3rd, 4th and 5th in /backupDir
		     all:/backupDir  -> all in /backupDir
		     This option is especially useful if you want to
		     hard link to backup trees from different
		     backups or if you want to be sure that a file really
		     exists only one time in the backup ('all').
		     If you make these backups in order (never in
		     parallel), you should use 0:backupDir for the other
		     backup dirs. If it is possible that they run
		     in parallel, you should use 1:backupDir to avoid
		     useless copies in the backup. This option should
		     be used for *all* backups which should share hard
		     links (vice versa). Naturally, all the backups
		     with joined hard links have to be in the same file
		     system!



INCLUDING / EXCLUDING FILES AND DIRECTORIES
-------------------------------------------

storeBackup has five parameters (besides --sourceDir) to control
which files go into the backup.

With followLinks you can control which (sub)directories will be saved
via symbolic links. See the EXAMPLES file for some explanations.

The other four parameters are _all_ examined _if_set_:

A file which is

not in 'exceptDirs' and
in 'includeDirs' and
does not match 'exceptPattern' (with full relative path) and
matches 'includePattern' (with full relative path)

will be saved! In all cases you have to define _relative_ paths from your
sourceDir! If you additionally use 'followLinks', the specified
symbolic links are interpreted as directories.

---
The parameters of exceptDirs and includeDirs are lists of
directories. You can use shell type wildcards (like home/*/.mozilla)
which will be expanded via a subshell from perl. If the result of your
wildcard is very long, you might run into a limitation. If you have
many thousands of include directories, the performance of storeBackup
will decrease. (This will not happen with exceptDirs.) In such a case
you should think about using includePattern.

---
The parameters of exceptPattern and includePattern are rules with
perl style patterns (regular expressions).

'exceptPattern' gives you the possibility to exclude a combination of
patterns. These patterns have to describe a file name with its
relative path in the backup. You have to escape a '/' with '\/' inside
the pattern. If you are not familiar with perl pattern matching, you
should type `man perlretut` and read some documentation. The
combination of patterns can be made with 'and', 'or', 'not', '(' or
')'. If you want to use one of the keywords as a pattern, it has to be
written differently, e.g. write 'and' as 'a[n]d'. !!! '(' and ')'
_have_to_be_separated_ by white space!!!

You can say:

exceptPattern = ( \/opt\/ or \/optional\/ ) and not
\/[^/]+\/myproc\/

This means: exclude the directories '/opt/' and '/optional/', but do
not exclude directories matching /*/myproc/ (this is the same with
wildcards). If a rule is defined here and it matches, the file will
not be saved.

If you simply want to exclude all .doc files from the backup, define
exceptPattern = \.doc\Z


With 'exceptPattern' and 'includePattern' you can define rules as
complicated as you want. Because it can be complicated to understand
these definitions, storeBackup gives you the possibility of debugging:

debug = 0	no debugging output
debug = 1	will log for every file if it matches or not
debug = 2	will write _detailed_ logs for _every_ file.
		You will get lots of logging output. This debug level
		is useful if you have complicated pattern rules.



STRATEGIES TO DELETE OLD BACKUPS
--------------------------------

storeBackup gives you a lot of possibilities to delete or not delete
your old backups. If you have a backup which should never be deleted,
the simplest way to achieve this is to rename it, e.g.:

$ mv 2003.07.28_06.12.41 archive_2003.07.28_06.12.41

This is possible because storeBackup and storeBackupDel only delete
directories which match exactly the pattern YYYY.MM.DD_hh.mm.ss .

The most simple way to delete a specific directory is to use `rm -rf`.
If you want to delete backups which are too old depending on rules,
there are several options you can choose. You can specify the time to
keep old backups on the basis of weekdays (with a default value for
all weekdays in --keepAll which can be overwritten with
--keepWeekday). You can also specify to keep them with
--keepFirstOfYear, --keepLastOfYear, --keepFirstOfMonth and
--keepLastOfMonth, or with --keepFirstOfWeek and
--keepLastOfWeek, where you can define the first weekday of your
definition of a week. In all of these cases, you have to specify a
time period. How to specify a time period is described in the
parameters section of this file.
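To double-check how a 'dhms' time period adds up, here is a small
helper sketch (not part of storeBackup; it ignores the 'archive flag'
prefix 'a'):

```shell
# illustrative helper: expand a time period like '10d4h'
# (d=days, h=hours, m=minutes, s=seconds) into seconds
period_to_seconds() {
    local expr
    expr=$(echo "$1" | sed -e 's/d/*86400+/g' -e 's/h/*3600+/g' \
                           -e 's/m/*60+/g'    -e 's/s/*1+/g' -e 's/+$//')
    echo $(( expr ))
}
period_to_seconds 10d4h   # 10 days and 4 hours -> 878400
period_to_seconds 30d     # the --keepAll default -> 2592000
```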

Now imagine you are making your backups on an irregular basis, perhaps
from a laptop to a server or you make your backups when you think you
have finished an important step of your work. In such cases, it is
useful to say "only keep the last backup of a day in a long time
range" (with --keepDuplicate). If you were on holiday for a month
and have set --keepAll to '30d' (30 days), then you probably do not
want storeBackup to delete all of your old backups when you start
it for the first time after you're back. You can avoid this with the
parameter --keepMinNumber. On the other hand, if you have limited
space on your backup disk, you may want to limit the total number of
backups; for this, you can use --keepMaxNumber.

With --keepDuplicate you specify a time period in which storeBackup
keeps duplicate backups of a day. After this time period only the last
backup of a day will survive.

With --keepMinNumber you specify the minimal number of backups
storeBackup (or storeBackupDel) will *not* delete. The logic is as
follows:
- Do not delete backups specified with --keepAll ... --keepLastOfWeek and
  --keepDuplicate.
- If this is not enough, do not delete other ones beginning with the newest
  backups. Duplicates of a day are not affected by this parameter.

With --keepMaxNumber you specify the maximal number of
backups. storeBackup will then delete the oldest backups if
necessary. To prevent special backups from deletion, you can specify
an "archive flag" with --keepAll ... --keepLastOfWeek. Backups
matching an archive flag will never be deleted by --keepMaxNumber. In
this way it is possible that more backups remain than specified
with this parameter, but the archive flag is useful to prevent special
backups like "last backup of a month" or "last backup of a week" from
being deleted.


If you are backing up to a filesystem via NFS, the speed will
obviously depend on your network connection when deleting old backups.
In this case, it can be useful to start storeBackupDel locally as a
daemon on that host.



MONITORING
----------

If you want to monitor storeBackup, simply grep for '^ERROR' and
possibly '^WARNING' in the log file.
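For example (the sample log lines are invented; only the leading
keywords matter):

```shell
# a tiny sample log and the grep a monitoring job would run on the real one
log=$(mktemp)
printf '%s\n' 'INFO  backup started' 'ERROR cannot read a file' \
              'WARNING hard link limit reached' > "$log"
grep -E '^(ERROR|WARNING)' "$log"    # prints the two lines worth alerting on
```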



LIMITATIONS
-----------

- storeBackup can back up normal files, directories, symbolic links and
  named pipes. Other file types are not supported and will generate a
  warning.

- The permissions in the backup tree(s) are equal to the permissions
  in the original directory. Under special, rare conditions it is
  possible that a user cannot read one or more of his/her own files
  in the backup. With the restore tool - storeBackupRecover.pl -
  everything is restored with the original permissions.

- storeBackup uses hard links to save disk space. GNU/Linux with the
  ext2 file system supports up to 32000 hard links, reiserfs up to
  64535. If
  storeBackup needs more hard links, it will write a warning and store
  a new (compressed) copy of the file. If you use ext2 for the backup,
  you have to reserve enough (static) inodes! (You will need one inode
  for each different file in the backup, *not* for every single hard link.)
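The link count of a file can be watched with GNU stat; a small
coreutils-only demonstration of what that limit counts:

```shell
# the link count (%h in GNU stat) is the number of directory entries
# pointing at one inode -- this is what the 32000/64535 limits apply to
d=$(mktemp -d)
echo data > "$d/f"
ln "$d/f" "$d/f2"
ln "$d/f" "$d/f3"
stat -c '%h' "$d/f"    # prints 3: one original name plus two hard links
```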
