[Infrastructures] isconf deprecates infrastructures.org?

Steve Traugott stevegt@TerraLuna.Org
Mon, 18 Sep 2006 19:00:11 -0700


On Sun, Aug 13, 2006 at 12:48:20AM -0700, Mark Ferlatte wrote:
> Daniel Hagerty said on Sun, Aug 13, 2006 at 03:36:55AM -0400:
> >     It's a pretty standard problem for sysadmin tools in this space.
> > You'd have to detect what was done behind the tool's back and either
> > pretend the missing delta was performed by the tool, or undo what was
> > done outside the tool.  You're not going to get this kind of behavior
> > from the isconf model of how you do things.
> 
> Dang.  That's too bad.  I'd kind of like to be able to use isconf
> instead of the in-house system I'm using now (basically, systemimager's
> updateclient + cvsup to overlay configurations), but we use the "reset
> the system back to known baseline" functionality a lot.

Running systemimager from a miniroot during reboot is always going to
be a reliable rollback -- it's what I use with isconf.  But
systemimager's updateclient script is a different animal, since it
runs in the context of the machine it's modifying.  Updateclient is
not going to be reliable in those cases where it (or anything else)
modifies systemimager or any of its prereqs, such as rsync, perl,
libc, init scripts, the kernel, etc.  While this might be fine in
development environments, I don't use it in production.  Since I
always manage production and development machines the same way, this
means I don't use updateclient at all.

By the way, I just noticed a serious bug in updateclient: the rsync
command is missing the -H (preserve hard links) and -S (handle sparse
files efficiently) flags.  I had the rsync guys fix this
in getimage and the miniroot years ago; looks like they never fixed it
in updateclient.  

So, when running updateclient, you are in fact guaranteed to *not* get
the same image that you started with -- hard links will be replaced
with copies of files, and sparse files will be bloated.  This is a
wonderful example of what I'm going on about -- you *can't* rely on
rollback code running within the context of the root filesystem to
somehow get you back to a prior disk state, since it will never be the
same code path that got you there in the first place.  In this case,
you installed via the miniroot, but you're rolling back via
updateclient, and they are not the same code at all.
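
One way to see this kind of drift is to compare hard-link and sparse-file
counts between the original image and the tree you get after a rollback.
The following is a minimal sketch, not part of systemimager or isconf;
the function name image_fingerprint is hypothetical, and the sparse-file
check is only a heuristic (allocated blocks smaller than apparent size),
which some filesystems will also trigger for other reasons.

```python
import os
import stat


def image_fingerprint(root):
    """Count hard-link groups and apparently sparse files under root.

    Hypothetical helper for illustration only.
    """
    links = {}   # (device, inode) -> number of paths seen inside this tree
    sparse = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, ...
            key = (st.st_dev, st.st_ino)
            links[key] = links.get(key, 0) + 1
            # st_blocks is counted in 512-byte units.  Fewer allocated
            # bytes than the apparent size suggests holes (heuristic).
            if st.st_blocks * 512 < st.st_size:
                sparse += 1
    groups = sum(1 for count in links.values() if count > 1)
    return {"hardlink_groups": groups, "sparse_files": sparse}
```

Running it against the golden image and against a client after a rollback,
and comparing the two results, would show a drop in hardlink_groups if
hard links were replaced with copies.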

> From reading more, it also seems like isconf assumes that your
> environment never changes?  

Ouch.  Isconf is all about change; continuous, rolling, chaotic
change.  It's also about testing, and reproducibility, and all of the
other things that people need for reliable builds, rebuilds, and
recoveries in a high-pressure, chaotic environment.  What do I need to
fix, in what documentation?  

> At least, there doesn't seem to be any way to "collapse" the journal
> into a new base image so that you don't have to replay the whole
> thing every time you image a new host.  

The journal is on the disk.  Install a machine, let the journal play,
then take a new systemimager snapshot of the resulting disk.  Use that
image for any new machines.  Since those transactions were *already*
played on that image, they won't play again.  
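
The idea can be sketched roughly as follows.  This is an illustration
under stated assumptions, not isconf's actual implementation or on-disk
format: the replay function, the JSON state file, and the (id, action)
journal entries are all hypothetical.  The point is only that the record
of applied transactions lives on the disk, so a snapshot of that disk
carries the record along with it.

```python
import json
import os


def replay(journal, applied_path, apply):
    """Apply journal entries in order, skipping those already recorded.

    journal      -- list of (entry_id, action) pairs, oldest first
    applied_path -- file on the local disk listing applied entry ids
    apply        -- callable that performs one action
    Returns the ids that were applied during this run.
    """
    applied = []
    if os.path.exists(applied_path):
        with open(applied_path) as fh:
            applied = json.load(fh)
    done = set(applied)
    ran = []
    for entry_id, action in journal:
        if entry_id in done:
            continue  # already played on this disk (or on its snapshot)
        apply(action)
        applied.append(entry_id)
        done.add(entry_id)
        # Persist after every entry so an interrupted run can resume.
        with open(applied_path, "w") as fh:
            json.dump(applied, fh)
        ran.append(entry_id)
    return ran
```

A machine installed from a fresh snapshot starts with the state file
already populated, so only entries added after the snapshot get played.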

> In my case, our current images are 2+ years old (Debian sarge), and
> there have been a lot of things done to them in that period; having
> to replay 2 years of changes (security patching apache multiple
> times, etc) every time I want to install another rack of hosts
> doesn't seem like a good idea

We're going on 5+ years in our current infrastructure, hundreds of
changes in the journal; the image contains a mix of packages from
Debian potato, woody, sarge, sid, and many tarballs.  It's a complex
image, including AFS, kerberos, Xen, heartbeat, and EVMS.  At the
moment it takes a few minutes after install to replay the changes
since the last systemimager snapshot, including a Xen and kernel
upgrade, several initrd rebuilds, and a few upgrades of isconf itself.
I'll probably be taking a new systemimager snapshot pretty soon
regardless.

> especially if someone removes a CNAME that hasn't been used for two
> years but that an early step in the journal depends on.

Ahhh.  If you run into that problem then you're doing something wrong;
either you're letting things leak into the journal that are really
environmental, like IP addresses or user names, or you're not using
CNAMEs right (e.g. you're deleting ones that something still depends
on).  But it's true that the relationship between environmental and
journalable data is a hard
problem, and no tool can keep you from shooting yourself in the
environmental foot.  Taking frequent systemimager snapshots does help
minimize this risk though.
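
As a rough illustration of keeping environmental data out of the journal,
a journaled step can refer to a symbolic name that is resolved only at
play time.  This is a sketch, not how isconf does it; render_step, the
$fileserver placeholder, and the example hostname are all made up for
the example.

```python
from string import Template


def render_step(step, environment):
    """Fill symbolic names in a journaled step from the local environment.

    Raises KeyError if the step refers to a name the environment no
    longer provides, which is the deleted-CNAME failure in miniature.
    """
    return Template(step).substitute(environment)


# The journal stores only the symbolic form of the command.
step = "mount $fileserver:/export /mnt"
command = render_step(step, {"fileserver": "fs.example.com"})
```

The journal entry itself never changes; only the mapping supplied at
play time does, and a missing name fails loudly instead of silently.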

Steve
-- 
Stephen G. Traugott  (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org