[Infrastructures] a tapeless backups system for a workgroup (drafty proposal v1)
Will Partain
partain@dcs.gla.ac.uk
Fri, 04 Jan 2002 15:19:53 +0000
Greetings, infrastructural folks! Great to meet some of
you at LISA.
I have been musing slightly about doing backups without
tapes, and pass along my current notes (below) for your
comment and criticism.
(Once we get beyond the high-level issues that kinda make
sense for the infrastructures list, we should probably chat
elsewhere -- I'd suggest ark-dev
[http://lists.sf.net/mailman/listinfo/ark-dev], as that's
where I'll probably do any detailed implementation chit-chat
(if it happens).)
Will
=========================
A tapeless backups system for a workgroup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At a LISA 2001 session on "recovery-oriented computing" [1],
Dave Patterson (Berkeley) commented that "tape is dead".
Reason: if tape doesn't cost more per byte than disk yet, it
soon will. Quick check (2002.01): 120GB EIDE disk for $208
(Pricewatch) = $1.73/GB, 40GB DLT IV tape for $53
(Pricescan) = $1.32/GB. Close enough for me!
And, if you haven't noticed lately, tape is a *pain*.
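For anyone who wants to redo the quick check with their own numbers, here is the arithmetic as a small Python sketch. The prices and capacities are the ones quoted above; the function name is just made up for illustration.

```python
# Prices are the ones quoted above (January 2002); capacities are in GB.
def dollars_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Return the cost per gigabyte in US dollars."""
    return price_usd / capacity_gb


disk = dollars_per_gb(208, 120)  # 120GB EIDE disk
tape = dollars_per_gb(53, 40)    # 40GB DLT IV tape

print(f"disk ${disk:.2f}/GB, tape ${tape:.2f}/GB, disk/tape ratio {disk / tape:.2f}")
```

Tape still comes out a bit cheaper per GB on these figures, which is why the claim is "soon will" rather than "already does".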
So, what can we do backup-wise [2] (NB: not
`archiving'... [3]) using only cheap disks? Assumptions:
* I have a place "far away" (a fire won't get there...) on
my intranet where I can park a machine with disks hanging
off of it.
* My "workgroup" world isn't *that* big... Nor is it a
nothing-must-ever-be-down-ever-no-matter-what environment.
(Even in such an environment, where you will probably
RAIDify every disk-bit, you still need backups...)
Terminology:
`disk chunk' [4] - a logically-coherent chunk of stuff on
disk "in one piece" (i.e. under a single directory).
Examples: "Fred's home directory" (/home/fred); "where we
keep our source tarballs" (/system/open-src); ...
[If you have a record of all of your `disk chunks', you
can generate your automount maps (for example).]
`replica' - a copy of a disk chunk. The `primary' replica
is the one you really use (perhaps read-write); all other
(`secondary') replicas of a disk chunk are read-only (if
made available at all).
So, the idea is very simple: rsync [5] copies of your disk
chunks to a remote diskful machine in an organized way.
Those remote chunks (replicas) are then available themselves
(read-only). Details:
* A five-nights-a-week thing rsyncs all live `disk chunks'
to a remote diskful box. This set of (secondary) replicas
is available (read-only) through a /yesterday automount
map, using the following mapping:

    primary replica called...    secondary called...
    /home/partain                /yesterday/home--partain
    /system/open-src             /yesterday/system--open-src
    (and so on)

* A weekly (every Sunday?) thing rsyncs all live disk chunks
to a remote diskful box. These sets of (secondary)
replicas are available (read-only) through automount maps
/weekend-1, /weekend-2, /weekend-3, and /weekend-4. (If
there's a fifth Sunday, I think I just won't bother.)
Thus, my home directory as of the first weekend of the
month is /weekend-1/home--partain.
* System-y bits (/usr, /, /opt) -- stuff that came straight
from the vendor -- are not backed up. (You could do
differently, but we're inclined to just rebuild the
machine.)
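
To make the naming scheme and the weekend numbering above concrete, here is a minimal Python sketch. It is not an implementation of anything that exists: the function names, the server name and the export root are all hypothetical, and the map line simply follows the usual "key -options location" layout of an indirect automount map.

```python
import datetime
from typing import Optional


def replica_name(primary_path: str) -> str:
    """Map a primary path to its flat replica name: /home/partain -> home--partain."""
    parts = [p for p in primary_path.split("/") if p]
    if not parts:
        raise ValueError("the root directory is not a disk chunk")
    return "--".join(parts)


def automount_entry(primary_path: str, server: str, export_root: str) -> str:
    """Build one read-only line for an indirect automount map.

    `server` and `export_root` are hypothetical: wherever the diskful box
    keeps (and exports) this set of replicas.
    """
    name = replica_name(primary_path)
    return f"{name} -ro {server}:{export_root}/{name}"


def weekend_slot(day: datetime.date) -> Optional[int]:
    """Return N for /weekend-N if `day` is one of the month's first four
    Sundays; return None otherwise (including on a fifth Sunday)."""
    if day.weekday() != 6:      # Monday is 0, Sunday is 6
        return None
    n = (day.day - 1) // 7 + 1  # 1st..7th -> 1, 8th..14th -> 2, ...
    return n if n <= 4 else None
```

For example, automount_entry("/home/partain", "backup1", "/export/yesterday") gives the line "home--partain -ro backup1:/export/yesterday/home--partain". One thing the sketch glosses over: a primary path that itself contains "--" could collide with another chunk's replica name.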
That's, er, um, *it*! Use cases:
* User loses a file? They can go find it themselves, in
/yesterday and/or /weekend-{1,2,3,4}.
* Data disk dies? Either: (a) Copy the stuff from
/yesterday to some other disk, creating a new primary
replica; adjust automount maps accordingly; or (b) Switch
affected people's machines' automount maps to point to the
/yesterday replica (now read-write :-). Replace disk at
leisure and shuffle things about.
* System disk dies? (machine now dead) (a) [unlikely]
Physically move the data disk to another machine; adjust,
season, enjoy; or (b) Treat every primary disk-chunk
replica that was on this machine as in the data-disk case
above. Fix machine at leisure. Note: for critical
machines, it may be worth having a spare system disk ready
to go.
* Building burns down? You've got all the bits far away.
Finer details:
* rsync over ssh: yes
* It's better that the "diskful boxes" *pull* the rsync
copies, rather than having every client push to them.
* Yes, we'll use the rsync 'excludes' mechanism to avoid
backing up large recreatable files.
* Database-y disk chunks (e.g. Oracle databases, ClearCase
VOBs, ...): the backup job needs enough smarts to wrap the
rsync in a `lock'/`unlock' pair, so that the copy is taken
from a consistent state.
* Can 'cut over' from a traditional tape-based backup scheme
gradually (a disk or machine at a time...).
* Need some kind of tidy-up mechanism: e.g. if a /weekend-1
replica didn't get made correctly on the Sunday, perhaps
it should be retried on the Monday? (Or just live with a
not-quite-right /weekend-1 replica?)
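
A few of the finer details above (pull rather than push, rsync over ssh, an excludes file, and the lock/unlock wrapper) might look roughly like this in Python. This is a sketch under assumptions, not a design: the host names, paths and lock commands are hypothetical, and the choice of `--delete` (so that the replica mirrors the primary exactly) is my guess at what a /yesterday copy would want.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, List, Optional, Sequence


def pull_command(client: str, primary_path: str, dest_dir: str,
                 exclude_file: Optional[str] = None) -> List[str]:
    """Build an rsync-over-ssh command, meant to be run on the diskful box."""
    cmd = ["rsync", "-a", "--delete", "-e", "ssh"]
    if exclude_file:
        # One pattern per line, e.g. large recreatable files.
        cmd.append(f"--exclude-from={exclude_file}")
    # Trailing slashes: copy the directory's contents, not the directory itself.
    cmd += [f"{client}:{primary_path.rstrip('/')}/", f"{dest_dir.rstrip('/')}/"]
    return cmd


@contextmanager
def locked(lock_cmd: Sequence[str], unlock_cmd: Sequence[str]) -> Iterator[None]:
    """Run lock_cmd, yield, then always run unlock_cmd, even if the copy fails."""
    subprocess.run(list(lock_cmd), check=True)
    try:
        yield
    finally:
        subprocess.run(list(unlock_cmd), check=True)


cmd = pull_command("fileserver1", "/home/partain",
                   "/backups/yesterday/home--partain",
                   exclude_file="/etc/backup-excludes")
```

A database-y chunk would then be copied with something like "with locked(lock_cmd, unlock_cmd): subprocess.run(cmd, check=True)", where the two commands are whatever the database in question provides for quiescing itself. A non-zero exit from rsync raises an exception, which is one possible hook for the "retry on the Monday" tidy-up.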
Variants:
* Have the nightly rsync go to a *local* diskful box; it is
then really quite painless to switch a user/system to that
copy (no WAN delays). The remote one could then be
/day-before-yesterday, taken from /yesterday before it
gets refreshed... (Synchronization issues there...)
* If you just can't get over that tape feeling, attach a
tape drive to the remote diskful boxes, and take/keep
copies of the /weekend-1 replicas.
Comments?? What have I forgotten/overlooked?
Notes:
[1] http://roc.cs.berkeley.edu/
[2] `Backing up' solves problems ranging from `I deleted my
cookie recipe' to `The building burned down'.
[3] `Archiving' takes copies of logical entities (`the
project that just finished', `the month-end accounts')
to preserve "forever", perhaps for legal reasons.
[4] `Disk chunk' (or 'dchunk' [pronounced "duh-*chunk*"]) is
an Arusha Project (Sidai team) term.
[5] http://rsync.samba.org/
== end