From tubaman@fattuba.com Mon Sep 11 23:37:27 2006
From: tubaman@fattuba.com (Ryan Nowakowski)
Date: Mon, 11 Sep 2006 17:37:27 -0500
Subject: [Infrastructures] isconf v4 problem
Message-ID: <20060911223727.GE27570@fattuba.com>

Hey Folks,

I'm trying to get isconf v4 up and running. Following the directions in the README, I install and then do an "isconf start; isconf up" and get this error:

isconf: info: myhost is on generic branch
isconf: info: checking for updates
isconf: warning: clierr: Connection reset by peer

Here's the /tmp/isconf.stderr:

kernel: Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/isconf/Kernel.py", line 427, in step
    argv = obj.next()
  File "/usr/lib/python2.3/site-packages/isconf/Cache.py", line 191, in puller
    self.resend()
  File "/usr/lib/python2.3/site-packages/isconf/Cache.py", line 217, in resend
    self.bcast(str(req))
  File "/usr/lib/python2.3/site-packages/isconf/Cache.py", line 116, in bcast
    self.sock.sendto(msg,0,(addr,self.udpport))
error: (101, 'Network is unreachable')

...and the last few lines of /tmp/isconf.log:

1158013812.363447 debug: journal abshist /var/is/conf/history
1158013812.363475 debug: lock abspath /var/is/fs/cache/layern.com/volume/generic/lock
1158013812.363502 debug: blockabs /var/is/fs/cache/layern.com/block
1158013812.363530 info: thing0-1 is on generic branch
1158013812.363558 debug: process calling up
1158013812.363585 info: checking for updates

Let me know if you guys need to see more.
Thanks,
Ryan

From stevegt@TerraLuna.Org Tue Sep 19 03:00:11 2006
From: stevegt@TerraLuna.Org (Steve Traugott)
Date: Mon, 18 Sep 2006 19:00:11 -0700
Subject: [Infrastructures] isconf deprecates infrastructures.org?
In-Reply-To: <20060813074820.GA19364@cryptio.net>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060813074820.GA19364@cryptio.net>
Message-ID: <20060919020011.GD29633@terraluna.org>

On Sun, Aug 13, 2006 at 12:48:20AM -0700, Mark Ferlatte wrote:
> Daniel Hagerty said on Sun, Aug 13, 2006 at 03:36:55AM -0400:
> > It's a pretty standard problem for sysadmin tools in this space.
> > You'd have to detect what was done behind the tool's back and either
> > pretend the missing delta was performed by the tool, or undo what was
> > done outside the tool. You're not going to get this kind of behavior
> > from the isconf model of how you do things.
>
> Dang. That's too bad. I'd kind of like to be able to use isconf
> instead of the in-house system I'm using now (basically, systemimager's
> updateclient + cvsup to overlay configurations), but we use the "reset
> the system back to known baseline" functionality a lot.

Systemimager during reboot running from a miniroot is always going to be a reliable rollback -- it's what I use with isconf. But systemimager's updateclient script is a different animal, since it runs in the context of the machine it's modifying.
Updateclient is not going to be reliable in those cases where it (or anything else) modifies systemimager or any of its prereqs, such as rsync, perl, libc, init scripts, the kernel, etc. While this might be fine in development environments, I don't use it in production. Since I always manage production and development machines the same way, this means I don't use updateclient at all.

By the way, I just noticed a serious bug in updateclient; the rsync command is missing the -H and -S flags. I had the rsync guys fix this in getimage and the miniroot years ago; looks like they never fixed it in updateclient. So, when running updateclient, you are in fact guaranteed to *not* get the same image that you started with -- hard links will be replaced with copies of files, and sparse files will be bloated.

This is a wonderful example of what I'm going on about -- you *can't* rely on rollback code running within the context of the root filesystem to somehow get you back to a prior disk state, since it will never be the same code path that got you there in the first place. In this case, you installed via the miniroot, but you're rolling back via updateclient, and they are not the same code at all.

> From reading more, it also seems like isconf assumes that your
> environment never changes?

Ouch. Isconf is all about change; continuous, rolling, chaotic change. It's also about testing, and reproducibility, and all of the other things that people need for reliable builds, rebuilds, and recoveries in a high-pressure, chaotic environment. What do I need to fix, in what documentation?

> At least, there doesn't seem to be any way to "collapse" the journal
> into a new base image so that you don't have to replay the whole
> thing every time you image a new host.

The journal is on the disk. Install a machine, let the journal play, then take a new systemimager snapshot of the resulting disk. Use that image for any new machines.
Since those transactions were *already* played on that image, they won't play again.

> In my case, our current images are 2+ years old (Debian sarge), and
> there have been a lot of things done to them in that period; having
> to replay 2 years of changes (security patching apache multiple
> times, etc) every time I want to install another rack of hosts
> doesn't seem like a good idea

We're going on 5+ years in our current infrastructure, hundreds of changes in the journal; the image contains a mix of packages from Debian potato, woody, sarge, sid, and many tarballs. It's a complex image, including AFS, kerberos, Xen, heartbeat, and EVMS. At the moment it takes a few minutes after install to replay the changes since the last systemimager snapshot, including a Xen and kernel upgrade, several initrd rebuilds, and a few upgrades of isconf itself. I'll probably be taking a new systemimager snapshot pretty soon regardless.

> especially if someone removes a CNAME that hasn't been used for two
> years but an early step in the journal depends on.

Ahhh. If you run into that problem then you're doing something wrong; either you're letting things leak into the journal that are really environmental like IP addresses or user names, or you're not using CNAMEs right (e.g. don't delete them). But it's true that the relationship between environmental and journalable data is a hard problem, and no tool can keep you from shooting yourself in the environmental foot. Taking frequent systemimager snapshots does help minimize this risk though.

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org

From stevegt@TerraLuna.Org Tue Sep 19 00:03:49 2006
From: stevegt@TerraLuna.Org (Steve Traugott)
Date: Mon, 18 Sep 2006 16:03:49 -0700
Subject: [Infrastructures] state machines
In-Reply-To: <17630.54935.362858.867900@perdition.linnaean.org>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org>
Message-ID: <20060918230349.GC29633@terraluna.org>

On Sun, Aug 13, 2006 at 03:36:55AM -0400, Daniel Hagerty wrote:
> > In fact, it seems that isconf will blow up if anybody forgets to make
> > changes using isconf at all (vs. restoring the machine to the known good
> > state).
> >
> > Am I missing something?
>
> No, you aren't.
>
> It's a pretty standard problem for sysadmin tools in this space.

Dan, there's got to be some general way of saying this; I think while either lambda calculus or Turing machines can *illustrate* it, they still don't say *why*. I'm starting to think that a closer explanation might be something like:

A machine can be described as a directed graph. The disk states are nodes, the changes are edges. You can't go backwards along an edge -- there can be no "undo". You might theoretically be able to reach a prior node by some other path, but there is no general solution for generating the code that implements those reverse edges. Each edge transition -- in any direction -- must be individually tested to verify that it has reached the desired node. In the case of normal transitions, this is known as "testing before production rollout". In the case of reverse transitions, the resulting disk state must be inspected to ensure that it is indeed the prior state, and not some new node in the directed graph.
Creating the transition code for each reverse edge, and performing the inspection to ensure that the code re-creates the prior state, will always be more expensive than just hitting the big "reset" button and rebuilding the machine back to the starting state, then replaying the forward edges until the desired node is reached.

There --- the entire Turing Equivalence paper in one long paragraph. ;-)

And I like the fact that this is starting to sound a lot like plain old ordinary state machines.

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org

From stevegt@TerraLuna.Org Sat Sep 16 07:11:32 2006
From: stevegt@TerraLuna.Org (Steve Traugott)
Date: Fri, 15 Sep 2006 23:11:32 -0700
Subject: [Infrastructures] isconf v4 problem
In-Reply-To: <20060911223727.GE27570@fattuba.com>
References: <20060911223727.GE27570@fattuba.com>
Message-ID: <20060916061132.GA29633@terraluna.org>

Hi Ryan! Sorry for the delay...

On Mon, Sep 11, 2006 at 05:37:27PM -0500, Ryan Nowakowski wrote:
> kernel: Traceback (most recent call last):
>   File "/usr/lib/python2.3/site-packages/isconf/Kernel.py", line 427, in step
>     argv = obj.next()
>   File "/usr/lib/python2.3/site-packages/isconf/Cache.py", line 191, in puller
>     self.resend()
>   File "/usr/lib/python2.3/site-packages/isconf/Cache.py", line 217, in resend
>     self.bcast(str(req))
>   File "/usr/lib/python2.3/site-packages/isconf/Cache.py", line 116, in bcast
>     self.sock.sendto(msg,0,(addr,self.udpport))
> error: (101, 'Network is unreachable')

What does main.cf look like, and what's in your nets file, if any?

If there's nothing funny in your main.cf or nets files, then I'd be curious to know if you get the same error using python 2.4. Otherwise, this looks like another in a class of problems related to bugs #63 and 66. That UDP broadcast code in Cache.py was a temporary workaround which has way outlived its usefulness.
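On Linux, errno 101 is ENETUNREACH. It means the kernel had no route for the destination address, so sendto() failed right away and the exception propagated out of bcast into the kernel loop. The sketch below is not isconf's actual Cache.py. It is a minimal illustration, written for Python 3 and using hypothetical helper names bcast and make_bcast_socket, of a broadcast loop that records unreachable networks and keeps going instead of aborting the whole update.

```python
import errno
import socket
from typing import List, Sequence

# errno values meaning "no route to this network right now" (assumed set).
UNREACHABLE = (errno.ENETUNREACH, errno.EHOSTUNREACH)


def make_bcast_socket() -> socket.socket:
    """UDP socket that is permitted to send to broadcast addresses."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s


def bcast(sock, msg: bytes, addrs: Sequence[str], port: int) -> List[str]:
    """Send msg to each address; return the addresses that were unreachable.

    Any other socket error is re-raised, since it more likely points to a
    real problem (bad address, permissions) than to a missing route.
    """
    unreachable: List[str] = []
    for addr in addrs:
        try:
            # Same call shape as the sendto() in the traceback: (msg, flags, addr).
            sock.sendto(msg, 0, (addr, port))
        except OSError as e:  # socket.error is an alias of OSError in Python 3
            if e.errno in UNREACHABLE:
                unreachable.append(addr)
            else:
                raise
    return unreachable
```

The caller can log the returned list. A single interface with no route then becomes a warning instead of a crashed update.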
I'm thinking of replacing the UDP broadcasts with an ssh-based mesh -- this would require that people manage ssh keys, authorized_keys, and a local isconf user in /etc/passwd on each machine. This is instead of the TCP mesh I haven't had time to write; in the latter, people would have had to manage PGP keys and/or HMAC secrets anyway -- in retrospect, ssh is probably simpler and better understood both operationally and security-wise.

I'm probably going to hit my limit of frustration and rewrite Cache.py within the next month, probably using ssh, and I'd be interested in what people think one way or the other.

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org

From stevegt@TerraLuna.Org Tue Sep 19 04:42:30 2006
From: stevegt@TerraLuna.Org (Steve Traugott)
Date: Mon, 18 Sep 2006 20:42:30 -0700
Subject: [Infrastructures] Re: Re: isconf deprecates infrastructures.org?
In-Reply-To: <20060817214218.GE15711@hezmatt.org>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060814060808.GL14094@terraluna.org> <77abe410608170316s64232738w27c5d19e5a02819d@mail.gmail.com> <9d03aa20608170738m3cecdee4jae8c8a6b846089b@mail.gmail.com> <20060817214218.GE15711@hezmatt.org>
Message-ID: <20060919034230.GF29633@terraluna.org>

On Fri, Aug 18, 2006 at 07:42:18AM +1000, Matthew Palmer wrote:
> On Thu, Aug 17, 2006 at 08:38:17AM -0600, Jordan Curzon wrote:
> > This might be interesting for you, although it is not isconf stuff.
> >
> > These are three scripts that I use to build my xen guests with Ubuntu.
> > create-rootfs.sh makes a base image and prepare-xen.sh copies that to
> > an LVM disk and gives the machine its own identity (SSH and
> > hostname). setup-dev.sh is an example of the setup script that gets
> > run to provision a server for a specific role. setup-xen.sh is a
> > script that will setup ubuntu as a xen dom0 host.
>
> There's Steve Kemp's xen-tools package, too, which does the same thing, in
> what is probably a more generalised manner. It's cross-distro, now, too.

Kris Buytaert also did a writeup on using systemimager for Xen guest images:

http://howto.x-tend.be/AutomatingVirtualMachineDeployment/

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org

From stevegt@TerraLuna.Org Mon Sep 18 23:31:01 2006
From: stevegt@TerraLuna.Org (Steve Traugott)
Date: Mon, 18 Sep 2006 15:31:01 -0700
Subject: [Infrastructures] isconf deprecates infrastructures.org?
In-Reply-To: <20060813050737.GA1618@cryptio.net>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net>
Message-ID: <20060918223100.GB29633@terraluna.org>

(*Way* behind on my mail...)

Looks like everyone else covered this thread pretty well; just wanted to inject a few more points:

On Sat, Aug 12, 2006 at 10:07:37PM -0700, Mark Ferlatte wrote:
> Let's say that I have an environment with a bunch of developers, and those
> developers, for example, have root on a set of machines. Being
> developers, they want to be able to install things temporarily "as
> needed", and I want to be able to restore the machines back to baseline
> quickly and easily.

People always want to be able to roll back to a previous baseline while running within the context of the root filesystem that they're modifying. As a sysadmin, you quite frequently do rollbacks, for instance by running a package uninstaller. This works most of the time, and this leads you to believe that a general rollback tool is possible. It's a myth. The changes you've made are irreversible -- you can't run the code backwards; you can only run other code that claims to be able to undo those changes. The "undo" outcome can't be predicted computationally, so a general-purpose tool that can do this reliably cannot be written.
Any general rollback tool has to run outside the context of the root filesystem being modified. Use systemimager, running in a miniroot during reboot, if you want to get outside the root filesystem and do clean rollbacks. Then use isconf to add the last few deltas on since the last image snapshot.

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org

From stevegt@TerraLuna.Org Tue Sep 19 03:20:04 2006
From: stevegt@TerraLuna.Org (Steve Traugott)
Date: Mon, 18 Sep 2006 19:20:04 -0700
Subject: [Infrastructures] isconf deprecates infrastructures.org?
In-Reply-To: <1155497959.9416.32.camel@spartacus.nakedape.priv>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <1155497959.9416.32.camel@spartacus.nakedape.priv>
Message-ID: <20060919022004.GE29633@terraluna.org>

On Sun, Aug 13, 2006 at 12:39:19PM -0700, Wil Cooley wrote:
> This has long been a complaint of mine with the tools I've looked at.
> I'd really like to be able to inform the tool, make local changes, then
> check my changes back into the central repository, easily and without a
> lot of fuss. Because, invariably, it takes more than one try to get a
> particular configuration right and the iteration of "change in repo,
> manually run tool to update host, reload server, see if it worked" is
> frustratingly long--even if it's only 30 seconds or so.

The whole point of isconf 4 is to get rid of this cycle. There is no longer any gold server; no central repository.
You just do something like this on one of the machines you want to change:

    # lock isconf on all hosts so nobody else can make changes
    isconf -m "upgrade mutt" lock
    # take a snapshot of the new mutt package
    isconf snap /tmp/mutt_1.5.9-2_i386.deb
    # install it
    isconf exec dpkg -i /tmp/mutt_1.5.9-2_i386.deb
    # check it in, unlocking other hosts
    isconf ci

...then do this on other hosts to update them (I also put this in rc, and sometimes in cron):

    # replay the above 'snap' and 'exec', as well as anything else queued up
    isconf up

More details in the man page:

http://trac.t7a.org/isconf/pub/doc/latest/isconf.html

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org

From Daniel Hagerty Tue Sep 19 10:53:07 2006
From: Daniel Hagerty (Daniel Hagerty)
Date: Tue, 19 Sep 2006 05:53:07 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <20060918230349.GC29633@terraluna.org>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org>
Message-ID: <17679.48643.527431.326403@perdition.linnaean.org>

I'm supposed to be avoiding thought at present. Shame on you!

> Dan, there's got to be some general way of saying this; I think while
> either lambda calculus or Turing machines can *illustrate* it, they
> still don't say *why*.

Turing machines aren't actually used for anything outside of "such and such is provably Turing-equivalent, and this, that, and the other theorem have been proven w.r.t. Turing machines". Lambda calculus is provably Turing-equivalent (surprise), and more expressive to the point that several programming languages (e.g. scheme, ML, haskell) are thinly veiled lambda calculus. It's what's most commonly used for real work of this sort.

As to the question at hand, you probably aren't going to get the illustration you want.
The problem is one of practicality, rather than actual mathematical intractability. The pie in the sky "what we want" won't be practical for some time, as opposed to being impossible.

Counterexamples exist, if you look for the right thing. Haskell is the most direct citation that leaps to mind. It's a purely functional language (no first order side effects), and yet:

 * It can express non-terminating programs
 * It's compilable in sub-infinite time
 * Programs can behave in a non-functional fashion, even though they can't be written in anything other than a functional form.

(none of these are surprising properties)

In particular, it's worth noting the resemblance of their problems to what we want:

 * A haskell compiler has to process a declaration of a program, and produce imperative instructions that mutate the state of a machine forward in time so that the machine behaves as the program states. As usual, the real problem here is having the programmer understand the meaning of what they wrote.
 * Haskell debuggers tend to work backwards in time; you often work backwards from the result that you don't like to discover how behavior disagrees with what you meant.

This forward and backward in time thing should sound familiar.

I could probably come up with other related bits, but hopefully the example above is a good one without going too far afield.

As for system administration configuration management itself:

If all we want to do is declare the state of the system and produce the correct statements to do and undo things in known state, that's not terribly hard -- the testing isn't really any different than the more general problem that any software developer faces (and yes, this is a handwave, in that most software testing isn't up to the level of required rigor). People are able to do this now when they cut the problem up and declare certain things out of scope.
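A toy model can make Dan's "known state" case concrete. This sketch is an illustration, not the design of any real tool. A system is just a dict that maps paths to contents. plan() derives both the forward operations and an undo list from the exactly known current state, and apply_ops() executes either list. The undo list reaches the prior node only because every value in it was recorded from that known state. If you apply it to a machine that drifted, the stray change survives, which is the unknown-state problem in miniature.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, str]            # toy system: path -> file contents
Op = Tuple[str, Optional[str]]    # (path, new contents); None means delete


def plan(current: State, desired: State) -> Tuple[List[Op], List[Op]]:
    """Return (forward, undo) ops that move current -> desired and back.

    Valid only because `current` is known exactly: each undo op restores a
    value recorded from `current`; nothing is ever run backwards.
    """
    forward: List[Op] = []
    undo: List[Op] = []
    for path in sorted(set(current) | set(desired)):
        old, new = current.get(path), desired.get(path)
        if old != new:
            forward.append((path, new))
            undo.append((path, old))
    return forward, undo


def apply_ops(state: State, ops: List[Op]) -> State:
    """Return a new state with ops applied; the input is left unmodified."""
    result = dict(state)
    for path, contents in ops:
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result
```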
If we're looking to work from unknown state (in other words, step 1 being discover the current state), the problem varies between obscenely hard, and intractable. There's a lot more area on the "obscenely hard (to the point of impractical)" side of the curve than the intractable side as far as any "practical" use is concerned. "rm -rf /; reboot; rollback" falls closer to the intractable side.

There's more handwaving in here that requires proper qualifications.

Anyway, discuss. I know you disagree with me on the substance of this. Varying degrees of proof take more work and are more likely to come out of being called out on my BS.

It's past bedtime for bozo.

From wes@umich.edu Tue Sep 19 14:44:12 2006
From: wes@umich.edu (Wesley Craig)
Date: Tue, 19 Sep 2006 09:44:12 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <20060918230349.GC29633@terraluna.org>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org>
Message-ID: <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu>

Stated so generally, I think I can come up with counter examples. For instance, if you make a full disk image before pushing out a change, you can in fact "undo" the change, by restoring the backup.

:wes

On 18 Sep 2006, at 19:03, Steve Traugott wrote:
> A machine can be described as a directed graph. The disk states
> are nodes, the changes are edges. You can't go backwards along an
> edge -- there can be no "undo". You might theoretically be able
> to reach a prior node by some other path, but there is no general
> solution for generating the code that implements those reverse
> edges. Each edge transition -- in any direction -- must be
> individually tested to verify that it has reached the desired
> node. In the case of normal transitions, this is known as
> "testing before production rollout".
> In the case of reverse
> transitions, the resulting disk state must be inspected to ensure
> that it is indeed the prior state, and not some new node in the
> directed graph. Creating the transition code for each reverse
> edge, and performing the inspection to ensure that the code
> re-creates the prior state, will always be more expensive than
> just hitting the big "reset" button and rebuilding the machine
> back to the starting state, then replaying the forward edges until
> the desired node is reached.

From allbery@ece.cmu.edu Tue Sep 19 16:18:24 2006
From: allbery@ece.cmu.edu (Brandon S. Allbery KF8NH)
Date: Tue, 19 Sep 2006 11:18:24 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu>
Message-ID:

On Sep 19, 2006, at 9:44 AM, Wesley Craig wrote:

> Stated so generally, I think I can come up with counter examples.
> For instance, if you make a full disk image before pushing out a
> change, you can in fact "undo" the change, by restoring the backup.

...unless said change affects something other than disk --- consider PC BIOS, or more significantly the SPARC/PPC "eeprom" command.

--
brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

From u+infra-terraluna-jmto@chalmers.se Tue Sep 19 16:36:02 2006
From: u+infra-terraluna-jmto@chalmers.se (u+infra-terraluna-jmto@chalmers.se)
Date: Tue, 19 Sep 2006 17:36:02 +0200
Subject: [Infrastructures] state machines
In-Reply-To: <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu>
Message-ID: <20060919153602.GB7721@hambo.tekno.chalmers.se>

Wesley,

On Tue, Sep 19, 2006 at 09:44:12AM -0400, Wesley Craig wrote:
> Stated so generally, I think I can come up with counter examples.
> For instance, if you make a full disk image before pushing out a
> change, you can in fact "undo" the change, by restoring the backup.

what you propose is reserving a lot of space for rollbacks, which is going to be expensive, and yet a change which erases the code for reading the saved image will be irreversible.

Of course testing and debugging the reversal for each change makes it _possible_ but that does not seem to be the point, if we look at the original statement, which you happened to cite:

> On 18 Sep 2006, at 19:03, Steve Traugott wrote:
> > ... Creating the transition code for each reverse
> > edge, and performing the inspection to ensure that the code
> > re-creates the prior state, will always be more expensive than
> > just hitting the big "reset" button and rebuilding the machine

Rune.
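Steve's "inspection" step checks that a reverse transition really reached the prior node. In practice it amounts to comparing full before and after descriptions of the disk. The following is a minimal sketch that assumes a POSIX system and looks only at regular files; manifest() and diff() are hypothetical helpers, not part of systemimager or isconf. For each file, manifest() records mode, size, link count, allocated blocks, and a SHA-256 hash. diff() then lists the paths whose entries differ. Recording st_nlink and st_blocks is what would expose the updateclient problem described earlier in this thread: hard links come back as separate copies and sparse files come back fully allocated, even though every file's contents hash the same. The sketch ignores ownership, xattrs, symlinks, and anything off-disk, such as the eeprom state Brandon raises.

```python
import hashlib
import os
import stat
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    mode: int
    size: int
    nlink: int    # a hard link replaced by a copy drops this to 1
    blocks: int   # a sparse file rewritten densely grows this
    sha256: str


def manifest(root: str) -> Dict[str, Entry]:
    """Describe every regular file under root, keyed by relative path."""
    result: Dict[str, Entry] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks, devices, sockets: out of scope here
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    digest.update(chunk)
            result[os.path.relpath(full, root)] = Entry(
                st.st_mode, st.st_size, st.st_nlink,
                getattr(st, "st_blocks", -1),  # st_blocks is POSIX-only
                digest.hexdigest())
    return result


def diff(before: Dict[str, Entry], after: Dict[str, Entry]) -> List[str]:
    """Paths added, removed, or changed between two manifests."""
    return sorted(p for p in set(before) | set(after)
                  if before.get(p) != after.get(p))
```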
From wes@umich.edu Tue Sep 19 16:40:01 2006
From: wes@umich.edu (Wesley Craig)
Date: Tue, 19 Sep 2006 11:40:01 -0400
Subject: [Infrastructures] state machines
In-Reply-To:
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu>
Message-ID: <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu>

On 19 Sep 2006, at 11:18, Brandon S. Allbery KF8NH wrote:
> On Sep 19, 2006, at 9:44 AM, Wesley Craig wrote:
>> Stated so generally, I think I can come up with counter examples.
>> For instance, if you make a full disk image before pushing out a
>> change, you can in fact "undo" the change, by restoring the backup.
>
> ...unless said change affects something other than disk ---
> consider PC BIOS, or more significantly the SPARC/PPC "eeprom"
> command.

Sure. Think you can come up with a solution for that situation? State machines are just that. If you are able to record the state, you can restart. It's that simple.

:wes

From allbery@ece.cmu.edu Tue Sep 19 16:44:58 2006
From: allbery@ece.cmu.edu (Brandon S. Allbery KF8NH)
Date: Tue, 19 Sep 2006 11:44:58 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu>
Message-ID: <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu>

On Sep 19, 2006, at 11:40 AM, Wesley Craig wrote:
> On 19 Sep 2006, at 11:18, Brandon S. Allbery KF8NH wrote:
>> On Sep 19, 2006, at 9:44 AM, Wesley Craig wrote:
>>> Stated so generally, I think I can come up with counter
>>> examples.
>>> For instance, if you make a full disk image before
>>> pushing out a change, you can in fact "undo" the change, by
>>> restoring the backup.
>>
>> ...unless said change affects something other than disk ---
>> consider PC BIOS, or more significantly the SPARC/PPC "eeprom"
>> command.
>
> Sure. Think you can come up with a solution for that situation?
> State machines are just that. If you are able to record the state,
> you can restart. It's that simple.

Sure --- assuming you know all of the state that is ever affected by any change. Which is in some sense the fundamental issue here; I do *not* reliably know everything that e.g. Cadence installs will affect, and once or twice we've been caught by surprise as a result. State machines are only useful when *all* possible states are known beforehand.

--
brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

From wes@umich.edu Tue Sep 19 16:59:36 2006
From: wes@umich.edu (Wesley Craig)
Date: Tue, 19 Sep 2006 11:59:36 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu>
Message-ID: <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu>

On 19 Sep 2006, at 11:44, Brandon S. Allbery KF8NH wrote:
> Sure --- assuming you know all of the state that is ever affected
> by any change. Which is in some sense the fundamental issue here;
> I do *not* reliably know everything that e.g. Cadence installs will
> affect, and once or twice we've been caught by surprise as a
> result.
> State machines are only useful when *all* possible states
> are known beforehand.

Oh, you want to talk about practicalities? I thought we were talking about theorems & proofs. If you want to talk about practicalities, do you think Cadence is reprogramming the firmware? Let's assume for the moment that it only installs files in the filesystem. Do you agree that it's *possible* for you to know everything that the Cadence install has affected?

:wes

From allbery@ece.cmu.edu Tue Sep 19 17:14:44 2006
From: allbery@ece.cmu.edu (Brandon S. Allbery KF8NH)
Date: Tue, 19 Sep 2006 12:14:44 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu>
Message-ID: <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu>

On Sep 19, 2006, at 11:59 AM, Wesley Craig wrote:
> On 19 Sep 2006, at 11:44, Brandon S. Allbery KF8NH wrote:
>> Sure --- assuming you know all of the state that is ever affected
>> by any change. Which is in some sense the fundamental issue here;
>> I do *not* reliably know everything that e.g. Cadence installs
>> will affect, and once or twice we've been caught by surprise as a
>> result. State machines are only useful when *all* possible states
>> are known beforehand.
>
> Oh, you want to talk about practicalities? I thought we were
> talking about theorems & proofs.

If practicalities disagree with the theory, then something's wrong with the theory. In this case, the theory is that we can know every modification made to a system by any given action --- to which I must first ask "at what level?"
Clearly it's false at the quantum level, question being whether that is relevant. Unfortunately, I can imagine cases where it *is* at least in part relevant, and in those cases you have a significant problem.

When it comes down to it, your thesis relies on the answers to:
(a) do you know all the levels at which any possible action can modify the system?
(b) can you reliably record *and later restore* the state at *all* of those levels? (keeping in mind that this may require actions to be performed in a particular order, so simply thwacking the eeprom after doing your disk restore might not completely restore the state if the eeprom controls something that can affect the restore....)

You can do this with full machine virtualization, and perhaps someday that will be a best practice. Otherwise, unless you've carefully inspected and dissected *everything* that touches your system, it's not clear to me that you can say yes to both of the above questions.

--
brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

From Daniel Hagerty Tue Sep 19 20:00:11 2006
From: Daniel Hagerty (Daniel Hagerty)
Date: Tue, 19 Sep 2006 15:00:11 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu>
Message-ID: <17680.15931.272562.219959@perdition.linnaean.org>

> If practicalities disagree with the theory, then something's wrong
> with the
theory. In this case, the theory is that we can know every Investigating the converse is also useful -- if you can't practically do what the theory says you can, what are you doing that creates the practical obstacles? "doctor, it hurts when I do this..." > When it comes down to it, your thesis relies on the answers to: > (a) do you know all the levels at which any possible action can > modify the system? > (b) can you reliably record *and later restore* the state at *all* of > those levels? (keeping in mind that this may require actions to be > performed in a particular order, so simply thwacking the eeprom after > doing your disk restore might not completely restore the state if the > eeprom controls something that can affect the restore....) There are more questions. Note that one of the problems a user had was deleting a CNAME and discovering that his journal produced different results depending on the existence of the CNAME. > You can do this with full machine virtualization, and perhaps someday > that will be a best practice. Otherwise, unless you've carefully > inspected and dissected *everything* that touches your system, it's > not clear to me that you can say yes to both of the above questions. In the particular Cadence example, you don't even need full machine virtualization to see what's being done -- the program's only means of interacting with the outside world is through the syscall interface, which can be instrumented. Concerned that it's touching the eeprom? You can prove that it isn't directly doing so by showing that it never uses the interface to do so. If you're concerned that it's doing it through an intermediary, the communication with the intermediary will show up, and you should perform the same operation on the suspect intermediary.
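A minimal sketch of the syscall-instrumentation idea above, in Python (the language isconf itself is written in). It assumes the installer was run under something like `strace -f -e trace=open,openat -o install.trace <installer>`, and scans the resulting trace for files that were successfully opened with a write-capable flag. The regex, the flag set, and the sample paths are illustrative assumptions, not a complete audit: a real one would also have to follow unlink, rename, chmod, mkdir and the like, plus ioctls for the eeprom case.

```python
import re
from typing import Iterable, Set

# Flags that allow an open() to create or modify the file.
WRITE_FLAGS = {"O_WRONLY", "O_RDWR", "O_CREAT", "O_TRUNC", "O_APPEND"}

# Matches strace lines such as:
#   open("/etc/foo.conf", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
#   openat(AT_FDCWD, "/etc/foo.conf", O_RDWR) = 4
# with an optional "[pid N] " (strace -f) or "N  " (strace -f -o file) prefix.
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'(?:open|openat)\('
    r'(?:(?:AT_FDCWD|\d+),\s*)?'          # openat's directory-fd argument
    r'"(?P<path>[^"]*)",\s*'
    r'(?P<flags>[A-Za-z0-9_|]+)'
    r'(?:,\s*\d+)?\)\s*=\s*(?P<ret>-?\d+)'
)


def written_paths(strace_lines: Iterable[str]) -> Set[str]:
    """Paths successfully opened with a write-capable flag in an strace log."""
    touched: Set[str] = set()
    for line in strace_lines:
        m = OPEN_RE.match(line.strip())
        if not m:
            continue  # another syscall, or a line format this sketch ignores
        if int(m.group("ret")) < 0:
            continue  # the open failed, so nothing was changed through it
        if set(m.group("flags").split("|")) & WRITE_FLAGS:
            touched.add(m.group("path"))
    return touched


if __name__ == "__main__":
    # Hypothetical trace excerpt; the /opt/cadence paths are made up.
    sample = [
        'open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
        'openat(AT_FDCWD, "/opt/cadence/etc/site.conf", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 4',
        '[pid  4242] open("/etc/profile.d/cadence.sh", O_WRONLY|O_CREAT, 0755) = 5',
        'open("/etc/missing", O_WRONLY) = -1 ENOENT (No such file or directory)',
    ]
    print(sorted(written_paths(sample)))
```

The read-only open of the loader cache and the failed open are both ignored, so only the two files the installer actually could have written are reported.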
From wes@umich.edu Tue Sep 19 21:50:38 2006 From: wes@umich.edu (Wesley Craig) Date: Tue, 19 Sep 2006 16:50:38 -0400 Subject: [Infrastructures] state machines In-Reply-To: <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> Message-ID: On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote: > (b) can you reliably record *and later restore* the state at *all* > of those levels? So, you're arguing that backup and restore don't work? Is that because of the quantum effects you mention? It's like the entire history of computing is wrong. How do you practically deploy a few hundred machines, given that the theory more or less says that it's impossible? :wes From infrastructures@trout.me.uk Tue Sep 19 22:36:03 2006 From: infrastructures@trout.me.uk (Matt S Trout) Date: Tue, 19 Sep 2006 22:36:03 +0100 Subject: [Infrastructures] state machines In-Reply-To: <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> Message-ID: <451062C3.8010908@trout.me.uk> Wesley Craig wrote: > Stated so generally, I think I can come up with counter examples. For > instance, if you make a full disk image before pushing out a change, you > can in fact "undo" the change, by restoring the backup. This is precisely what Steve proposes as the *one* way that you can do a reliable undo. 
This does, of course, assume that only the contents of the disk ever vary between machines of the same type, and that everything else (BIOS etc.) is configured identically before deployment and never changed afterwards. -- Matt S Trout Offering custom development, consultancy and support Technical Director contracts for Catalyst, DBIx::Class and BAST. Contact Shadowcat Systems Ltd. mst (at) shadowcatsystems.co.uk for more information + Help us build a better perl ORM: http://dbix-class.shadowcatsystems.co.uk/ + From wes@umich.edu Tue Sep 19 22:52:13 2006 From: wes@umich.edu (Wesley Craig) Date: Tue, 19 Sep 2006 17:52:13 -0400 Subject: [Infrastructures] state machines In-Reply-To: <451062C3.8010908@trout.me.uk> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <451062C3.8010908@trout.me.uk> Message-ID: <0899D29F-9A81-4392-B302-8A0E94981A4B@umich.edu> On 19 Sep 2006, at 17:36, Matt S Trout wrote: > This is precisely what Steve proposes as the *one* way that you can > do a reliable undo. And if you believe that (speaking practically, I certainly do), there's a lot more you can do. If you're able to efficiently snapshot systems, capture changes, etc, then all (or maybe only most) of the rest is unnecessary. Take, for example, the idea of installing Cadence. If you knew the state of the system before Cadence was installed, you could capture the changes that installing Cadence made. > This does, of course, assume that only the contents of the disk > ever vary between machines of the same type, and that everything > else (BIOS etc.) is configured identically before deployment and > never changed afterwards. Sure. In fact, "machines of the same type" is often roulette. Those sorts of problems make any solution less than 100% reliable.
I know I've certainly swapped out a machine that was "of the same type" as a failed machine, only to find the small underlying differences impacted the services the machine was meant to provide. :wes From treed@ultraviolet.org Tue Sep 19 23:31:15 2006 From: treed@ultraviolet.org (Tracy R Reed) Date: Tue, 19 Sep 2006 15:31:15 -0700 Subject: [Infrastructures] state machines In-Reply-To: References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> Message-ID: <45106FB3.1030902@ultraviolet.org> I have been following this thread (and this list) for a few months trying to glean wisdom from the gurus and have managed to keep quiet thus far but this message tickled me. Wesley Craig wrote: > On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote: >> (b) can you reliably record *and later restore* the state at *all* >> of those levels? > > So, you're arguing that backup and restore don't work? Is that > because of the quantum effects you mention? I'm not sure if you are joking here or not. If a photon hits your computer the quantum state of the computer is changed. I, for one, don't care about that as far as my software goes. Unless it gets to the point where my computer melts. But then I have problems other than software. > It's like the entire history of computing is wrong. How do you > practically deploy a few hundred machines, given that the theory more > or less says that it's impossible? Theory may say it is impossible to do it *perfectly* but in practice all most of us need is "good enough" and that is the only thing that allows any of us to actually get any real work done. 
-- Tracy R Reed http://ultraviolet.org A: Because we read from top to bottom, left to right Q: Why should I start my reply below the quoted text From allbery@ece.cmu.edu Tue Sep 19 23:47:43 2006 From: allbery@ece.cmu.edu (Brandon S. Allbery KF8NH) Date: Tue, 19 Sep 2006 18:47:43 -0400 Subject: [Infrastructures] state machines In-Reply-To: References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> Message-ID: <9B130E64-50D8-46C4-A390-A39B5B537670@ece.cmu.edu> On Sep 19, 2006, at 16:50 , Wesley Craig wrote: > On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote: >> (b) can you reliably record *and later restore* the state at *all* >> of those levels? > > So, you're arguing that backup and restore don't work? Is that > because of the quantum effects you mention? I'm saying in (hopefully) rare cases it could matter. But even in the general case your machine's state can involve more than just the disk, and unless you're doing extra work the backups only save the state of the disk.
From wes@umich.edu Wed Sep 20 03:42:45 2006 From: wes@umich.edu (Wesley Craig) Date: Tue, 19 Sep 2006 22:42:45 -0400 Subject: [Infrastructures] state machines In-Reply-To: <45106FB3.1030902@ultraviolet.org> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> Message-ID: <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> On 19 Sep 2006, at 18:31, Tracy R Reed wrote: > Wesley Craig wrote: >> On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote: >>> (b) can you reliably record *and later restore* the state at >>> *all* of those levels? >> So, you're arguing that backup and restore don't work? Is that >> because of the quantum effects you mention? > > I'm not sure if you are joking here or not. If a photon hits your > computer the quantum state of the computer is changed. I, for one, > don't care about that as far as my software goes. Unless it gets to > the point where my computer melts. But then I have problems other > than software. I guess you can say I'm joking. I'm pretty sure that backup & restore work in most cases. Brandon seemed to be saying that backup & restore don't work. I'm perfectly willing to discuss the edge cases where backup & restore are imperfect, just so long as we can acknowledge that we're doing backups *so that we can restore*.
And that we're going through the pain of backing up because restores *do work*, where "work" is defined as accomplishing something useful in the realm of managing systems. > Theory may say it is impossible to do it *perfectly* but in > practice all most of us need is "good enough" and that is the only > thing that allows any of us to actually get any real work done. Amen. :wes From allbery@ece.cmu.edu Wed Sep 20 03:46:00 2006 From: allbery@ece.cmu.edu (Brandon S. Allbery KF8NH) Date: Tue, 19 Sep 2006 22:46:00 -0400 Subject: [Infrastructures] state machines In-Reply-To: <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> Message-ID: On Sep 19, 2006, at 22:42 , Wesley Craig wrote: > On 19 Sep 2006, at 18:31, Tracy R Reed wrote: >> Wesley Craig wrote: >>> On 19 Sep 2006, at 12:14, Brandon S. Allbery KF8NH wrote: >>>> (b) can you reliably record *and later restore* the state at >>>> *all* of those levels? >>> So, you're arguing that backup and restore don't work? Is that >>> because of the quantum effects you mention? >> >> I'm not sure if you are joking here or not. If a photon hits your >> computer the quantum state of the computer is changed. I, for one, >> don't care about that as far as my software goes. Unless it gets >> to the point where my computer melts. But then I have problems >> other than software. > > I guess you can say I'm joking. I'm pretty sure that backup & > restore work in most cases. 
Brandon seemed to be saying that > backup & restore don't work. You are misunderstanding; they work for what they do, but you'd best be aware of the parts of your infrastructure that aren't represented by local disk. The recent mention of dependency on CNAMEs was a better example of what I was getting at. From wes@umich.edu Wed Sep 20 03:51:43 2006 From: wes@umich.edu (Wesley Craig) Date: Tue, 19 Sep 2006 22:51:43 -0400 Subject: [Infrastructures] state machines In-Reply-To: References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> Message-ID: <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> On 19 Sep 2006, at 22:46, Brandon S. Allbery KF8NH wrote: > You are misunderstanding; they work for what they do, but you'd > best be aware of the parts of your infrastructure that aren't > represented by local disk. The recent mention of dependency on > CNAMEs was a better example of what I was getting at. Oh sure, I agree with that. But let me ask you: What is more likely to have the CNAME dependency problem? A recent backup restored, or an old system image with months or years worth of changes applied? Perhaps you see what I'm getting at.
:wes From Daniel Hagerty Wed Sep 20 04:52:13 2006 From: Daniel Hagerty (Daniel Hagerty) Date: Tue, 19 Sep 2006 23:52:13 -0400 Subject: [Infrastructures] state machines In-Reply-To: <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> Message-ID: <17680.47853.634335.262628@perdition.linnaean.org> > Oh sure, I agree with that. But let me ask you: What is more likely > to have the CNAME dependency problem? A recent backup restored, or > an old system image with months or years worth of changes applied? > Perhaps you see what I'm getting at. You're presumably suggesting that the backup (since it's just data at this level of abstraction) is more robust than a series of hand crafted imperative statements that you execute in proper order. The two both have their places. A backup is a large relatively opaque blob; code that supposedly reproduces the backup is introspectable in a way the backup is not. One being better than the other is dependent on the context of use.
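A toy illustration of the image-versus-journal distinction, offered only as a sketch: the "disk" is modeled as a Python dict from path to contents, and a journal is a list of named steps. Replaying the journal yields the same end state an image would hold, plus a record of which step last changed each path, which is exactly the information a bare image no longer carries. The step names, paths, and file contents are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]                     # toy "disk": path -> contents
Step = Tuple[str, Callable[[State], None]]


def replay(journal: List[Step]) -> Tuple[State, Dict[str, str]]:
    """Apply each step in order; also note which step last changed each path."""
    state: State = {}
    provenance: Dict[str, str] = {}
    for name, action in journal:
        before = dict(state)
        action(state)
        for path, contents in state.items():
            if before.get(path) != contents:
                provenance[path] = name            # created or modified here
        for path in before.keys() - state.keys():
            provenance[path] = name + " (deleted)"
    return state, provenance


# Hypothetical journal steps.
def install_resolver(disk: State) -> None:
    disk["/etc/resolv.conf"] = "search example.com\n"


def add_vhost(disk: State) -> None:
    disk["/etc/httpd/vhost.conf"] = "ServerName www.example.com\n"


journal: List[Step] = [
    ("install resolver config", install_resolver),
    ("add vhost", add_vhost),
]

image, why = replay(journal)
print(why["/etc/httpd/vhost.conf"])  # prints: add vhost
```

Keeping only `image` is the backup case; keeping `journal` (and so being able to regenerate `why`) is the introspectable case, at the cost of having to execute it correctly.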
From wes@umich.edu Wed Sep 20 05:11:00 2006 From: wes@umich.edu (Wesley Craig) Date: Wed, 20 Sep 2006 00:11:00 -0400 Subject: [Infrastructures] state machines In-Reply-To: <17680.47853.634335.262628@perdition.linnaean.org> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> Message-ID: On 19 Sep 2006, at 23:52, Daniel Hagerty wrote: >> Oh sure, I agree with that. But let me ask you: What is more likely >> to have the CNAME dependency problem? A recent backup restored, or >> an old system image with months or years worth of changes applied? >> Perhaps you see what I'm getting at. > > You're presumably suggesting that the backup (since it's just data > at this level of abstraction) is more robust than a series of hand > crafted imperative statements that you execute in proper order. Without getting into "which is better," which is more likely to have the CNAME dependency problem? > The two both have their places. A backup is a large relatively > opaque blob; code that supposedly reproduces the backup is > introspectable in a way the backup is not. One being better than the > other is dependent on the context of use. On the one hand, it's hard to disagree that different solutions are useful in different situations. Your opacity statement is just hand waving, tho. Discussing which specific situations are more amenable to which specific solutions would be useful.
:wes From Daniel Hagerty Wed Sep 20 07:32:53 2006 From: Daniel Hagerty (Daniel Hagerty) Date: Wed, 20 Sep 2006 02:32:53 -0400 Subject: [Infrastructures] state machines In-Reply-To: References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> Message-ID: <17680.57493.846253.829063@perdition.linnaean.org> > Without getting into "which is better," which is more likely to have > the CNAME dependency problem? I believe we've covered this already. Either can demonstrate the problem in the end result. Any path involving execution has an additional peril of demonstrating this flavor of problem during execution. The math that demonstrates the relative complexity of the two is trivial, should we need to see it. In truth, even the straight up restore image has an execution phase subject to all the usual perils, but I think we can take it as a given that it works in practice. > On the one hand, it's hard to disagree that different solutions are > useful in different situations. Your opacity statement is just hand > waving, tho. Discussing which specific situations are more amenable > to which specific solutions would be useful. That is not a handwave, I just write coming from a highly abstracted thought process. An image backup is a representation of "what" where the "how" that produced the what is lost. This is both its strength, and its weakness.
By contrast, an execution method produces that "what" from the "how", leaving both to be inspected, debugged, etc. Do you disagree that the removal of potentially essential information increases an image's opacity while also making it simpler? If you want specific situations to show how the difference between "what" and "how" matters, you should have enough on hand to generate them, rather than relying on my poorly written examples that lend themselves to being misconstrued. From brendan@cs.uchicago.edu Wed Sep 20 14:39:14 2006 From: brendan@cs.uchicago.edu (Brendan Strejcek) Date: Wed, 20 Sep 2006 09:39:14 -0400 Subject: [Infrastructures] state machines In-Reply-To: <17680.47853.634335.262628@perdition.linnaean.org> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> Message-ID: <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> On 9/19/06, Daniel Hagerty wrote: > > Oh sure, I agree with that. But let me ask you: What is more likely > > to have the CNAME dependency problem? A recent backup restored, or > > an old system image with months or years worth of changes applied? > > Perhaps you see what I'm getting at. > > You're presumably suggesting that the backup (since it's just data > at this level of abstraction) is more robust than a series of hand > crafted imperative statements that you execute in proper order. > > The two both have their places. A backup is a large relatively > opaque blob; code that supposedly reproduces the backup is > introspectable in a way the backup is not. One being better than the > other is dependent on the context of use.
Actually, I think I can make this a little more concrete with another example, which incorporates the backup scenario and brings the discussion back to the original presentation of the state machine model of management. You have a deterministic backup (data and code that can reinstantiate it) which behaves exactly as expected. However, between the time the backup was taken and the time restored, some UID mappings were changed on an external NIS or LDAP server, so files no longer have the correct ownership. Note that this is not a problem with NIS or LDAP: if you store the usernames instead, you could still have the same problem, since the canonical data, by definition, will always be that given by the directory server. Both the previous DNS alias example and the above UID example have a similar nature: data is stored in an external directory which is critical to expected functionality. The problem is a hard one because it incorporates aspects of federation (you may not control the directory servers; thus, the directory servers may be unreliable and/or malicious). The essential problem is defining the boundaries of the system to be managed. If you can't do that, you can't construct the state machine digraph. The state of the network is not an edge case. 
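One way to make the UID example concrete, sketched under stated assumptions: record each file's owner as both a username and a numeric UID at backup time, then, before trusting a restore, compare those against what the directory (local files, NIS, or LDAP, via NSS) says now. This only detects the drift; as noted above, storing usernames does not settle which side is canonical. The manifest format and the `ownership_drift` helper are hypothetical; `pwd.getpwnam` is the standard Unix lookup, and the lookup function is injectable so the check can be exercised without a live directory server.

```python
import pwd  # Unix-only standard module
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical manifest written at backup time: path -> (owner name, owner uid).
Manifest = Dict[str, Tuple[str, int]]
Drift = Tuple[str, str, int, Optional[int]]


def current_uid(name: str) -> Optional[int]:
    """Resolve a username through the running system's NSS (files, NIS, LDAP...)."""
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        return None  # the user no longer exists anywhere NSS looks


def ownership_drift(manifest: Manifest,
                    lookup: Callable[[str], Optional[int]] = current_uid) -> List[Drift]:
    """List files whose recorded owner no longer maps to the recorded uid."""
    drift: List[Drift] = []
    for path, (name, recorded_uid) in sorted(manifest.items()):
        now = lookup(name)
        if now != recorded_uid:
            drift.append((path, name, recorded_uid, now))
    return drift


# Hypothetical example: "www" was renumbered on the directory server after the backup.
manifest: Manifest = {
    "/home/alice/notes.txt": ("alice", 1001),
    "/srv/www/index.html": ("www", 1050),
}
directory_now = {"alice": 1001, "www": 2050}
print(ownership_drift(manifest, directory_now.get))
# [('/srv/www/index.html', 'www', 1050, 2050)]
```

Whether a non-empty result should abort the restore, chown the files, or flag the directory entry is a policy decision this sketch deliberately leaves open.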
Best, Brendan -- http://praksys.blogspot.com http://people.cs.uchicago.edu/~brendan/ From Daniel Hagerty Wed Sep 20 19:50:27 2006 From: Daniel Hagerty (Daniel Hagerty) Date: Wed, 20 Sep 2006 14:50:27 -0400 Subject: [Infrastructures] state machines In-Reply-To: <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> Message-ID: <17681.36211.642948.456804@perdition.linnaean.org> > Both the previous DNS alias example and the above UID example have a > similar nature: data is stored in an external directory which is > critical to expected functionality. The problem is a hard one because > it incorporates aspects of federation (you may not control the > directory servers; thus, the directory servers may be unreliable > and/or malicious). > > The essential problem is defining the boundaries of the system to be > managed. If you can't do that, you can't construct the state machine > digraph. Well, that's not strictly true, but it is harder. The above examples, and the network in general, fall under the following math problem: eval(expression, context) yields a value. If you change the context (a CNAME, the LDAP server, etc.), it's quite possible that you're changing the value you produce. If you can capture the way that expression relies on its context, you can possibly separate its dependence on the context so that simple, obvious changes will cause you to generate the same value given different contexts.
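The eval(expression, context) framing can be sketched with Python's standard `string.Formatter`, under the assumption that an "expression" is a config template and the context is a mapping of site-specific values. Declaring the template's dependencies explicitly lets the same template produce the right output for different contexts (here the hypothetical hosts "foo" and "bar"), and lets it refuse to produce anything when a dependency is unmet, instead of silently yielding a different value. The template text, key names, and helper functions are illustrative only.

```python
from string import Formatter
from typing import Mapping, Set


def dependencies(template: str) -> Set[str]:
    """Names of the context values a template refers to, e.g. {server_name}."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def render(template: str, context: Mapping[str, str]) -> str:
    """eval(expression, context), but refuse to run if a dependency is unmet."""
    missing = dependencies(template) - set(context)
    if missing:
        # Fail loudly rather than silently produce a different value.
        raise KeyError(f"unmet context dependencies: {sorted(missing)}")
    return template.format(**context)


# Hypothetical Apache-style fragment; only the context differs per host.
VHOST = "ServerName {server_name}\nDocumentRoot {docroot}\n"

for host in ("foo", "bar"):
    print(render(VHOST, {"server_name": host, "docroot": "/srv/www"}), end="")
```

The template itself is identical on both hosts; the host-specific part is isolated in the context, which is the separation described above. Where that context comes from (DNS, LDAP, a per-site file) is left open here.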
An ex-employer does something effectively like this for managing the development/testing/production installation of their in house product. That's exactly the sort of situation where some amount of the context (e.g. what's the name of the front end webserver?) is an immutable given that changes between test and production, but you don't want to get mired in these immaterial differences. From wes@umich.edu Wed Sep 20 20:01:32 2006 From: wes@umich.edu (Wesley Craig) Date: Wed, 20 Sep 2006 15:01:32 -0400 Subject: [Infrastructures] state machines In-Reply-To: <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> Message-ID: <6420F52E-AA40-4FC0-B361-3F8B845E8051@umich.edu> On 20 Sep 2006, at 09:39, Brendan Strejcek wrote: > You have a deterministic backup (data and code that can reinstantiate > it) which behaves exactly as expected. However, between the time the > backup was taken and the time restored, some UID mappings were changed > on an external NIS or LDAP server, so files no longer have the correct > ownership. This is a great example problem. Thank you for a positive contribution to the discussion. If the backed up system had been running when the UID mapping changed, how would that have been handled? > The essential problem is defining the boundaries of the system to be > managed. Indubitably. > The state of the network is not an edge case. I guess you're revealing where you place the boundary. 
:) :wes From wes@umich.edu Wed Sep 20 20:13:39 2006 From: wes@umich.edu (Wesley Craig) Date: Wed, 20 Sep 2006 15:13:39 -0400 Subject: [Infrastructures] state machines In-Reply-To: <17680.57493.846253.829063@perdition.linnaean.org> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <17680.57493.846253.829063@perdition.linnaean.org> Message-ID: <9E55DE19-1F26-43E4-BE96-5113C1531E30@umich.edu> On 20 Sep 2006, at 02:32, Daniel Hagerty wrote: >> Without getting into "which is better," which is more likely to have >> the CNAME dependency problem? > > I believe we've covered this already. Either can demonstrate the > problem in the end result. > > Any path involving execution has an additional peril of > demonstrating this flavor of problem during execution. The math that > demonstrates the relative complexity of the two is trivial, should > we need to see it. > > In truth, even the straight up restore image has an execution > phase subject to all the usual perils, but I think we can take it as a > given that it works in practice. I've read the three paragraphs about three times now. I get "running the log is more likely to produce the CNAME problem." Please do correct me if I've read that wrong. > That is not a handwave, I just write coming from a highly > abstracted thought process. > > An image backup is a representation of "what" where the "how" that > produced the what is lost.
This is both its strength, and its > weakness. By contrast, an execution method produces that "what" from > the "how", leaving both to be inspected, debugged, etc. The execution method starts with "what-1", executes "how" to produce "what-2". Without being able to more or less fully inspect "what-1", you're not going to have too much idea about "what-2", despite the relative clarity of "how". What if "how" were a simple patch to "what-1"? I will grant you that having "how" as a clear delta between "what-1" and "what-2" can be handy for deeper analysis, particularly if it's reversible. > Do you disagree that the removal of potentially essential > information increases an image's opacity while also making it simpler? If I were removing essential information, I would probably be increasing an image's opacity. But it's a presumption that essential information must be destroyed. :wes From allbery@ece.cmu.edu Wed Sep 20 20:37:32 2006 From: allbery@ece.cmu.edu (Brandon S. Allbery KF8NH) Date: Wed, 20 Sep 2006 15:37:32 -0400 Subject: [Infrastructures] state machines In-Reply-To: <9E55DE19-1F26-43E4-BE96-5113C1531E30@umich.edu> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <17680.57493.846253.829063@perdition.linnaean.org> <9E55DE19-1F26-43E4-BE96-5113C1531E30@umich.edu> Message-ID: <823BDA9B-8E78-449A-A331-2A405DAD272D@ece.cmu.edu> On Sep 20, 2006, at 15:13 , Wesley Craig wrote: > I've read the 
three paragraphs about three times now. I get > "running the log is more likely to produce the CNAME problem." > Please do correct me if I've read that wrong. They're both going to have it if nothing records information about the CNAME dependency. If whatever runs the log is aware of the dependency, it can (at minimum) raise an exception when the dependency is found not to be met. There is no way to do this with a simple backup. (If it's not a simple backup, then it's a degenerate case.) (In a fully integrated system, the dependency would also be recognized by the DNS control subsystem, and would either be satisfied or would raise an exception due to a conflict; and the restore procedure on the restored system itself would wait until the external dependency was met.) From Daniel Hagerty Wed Sep 20 20:53:42 2006 From: Daniel Hagerty (Daniel Hagerty) Date: Wed, 20 Sep 2006 15:53:42 -0400 Subject: [Infrastructures] state machines In-Reply-To: <6420F52E-AA40-4FC0-B361-3F8B845E8051@umich.edu> References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> <6420F52E-AA40-4FC0-B361-3F8B845E8051@umich.edu> Message-ID: <17681.40006.356966.682887@perdition.linnaean.org> > > The state of the network is not an edge case. > > I guess you're revealing where you place the boundary. :) Let's turn the question back at you then.
Suppose we have two apache web servers that are front ends to two different systems with "identical" behavior. These servers, by nature, have different names; one is "foo", the other is "bar". Both are hiding behind a NAT system so that no address of either machine will reveal that the correct names for generating redirects are "foo" and "bar".

Draw a boundary such that apache still generates correct redirects for http 1.0 without having different configuration files that expressly mention the correct names on "foo" and "bar".

From Daniel Hagerty Wed Sep 20 21:03:36 2006
From: Daniel Hagerty (Daniel Hagerty)
Date: Wed, 20 Sep 2006 16:03:36 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <9E55DE19-1F26-43E4-BE96-5113C1531E30@umich.edu>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <20060813050737.GA1618@cryptio.net> <17630.54935.362858.867900@perdition.linnaean.org> <20060918230349.GC29633@terraluna.org> <78A6BEFD-D26F-4BED-8257-5CB44CB100FF@umich.edu> <1C5934EE-7AA1-4BC9-9CE1-C0FF0394F221@umich.edu> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <17680.57493.846253.829063@perdition.linnaean.org> <9E55DE19-1F26-43E4-BE96-5113C1531E30@umich.edu>
Message-ID: <17681.40600.570759.784099@perdition.linnaean.org>

> The execution method starts with "what-1", executes "how" to produce
> "what-2". Without being able to more or less fully inspect "what-1",
> you're not going to have too much idea about "what-2", despite the
> relative clarity of "how". What if "how" were a simple patch to
> "what-1"? I will grant you that having "how" as a clear delta
> between "what-1" and "what-2" can be handy for deeper analysis,
> particularly if it's reversible.
You're assuming that I'm of the isconf school, and that being in state what-1 is a strict precondition. Really I prefer to avoid that, or at least have the delta function perform enough introspection to recognize when it can and when it can't do the right thing. The problem is obviously undecidable from far enough out, but in practice, you won't arrive there often.

I don't think we need to go down this relatively off-topic route further. We're coming from very different backgrounds, and illuminating the differences between them is probably better done by another path.

From tubaman@fattuba.com Wed Sep 20 22:51:38 2006
From: tubaman@fattuba.com (Ryan Nowakowski)
Date: Wed, 20 Sep 2006 16:51:38 -0500
Subject: [Infrastructures] isconf v4 problem
In-Reply-To: <20060916061132.GA29633@terraluna.org>
References: <20060911223727.GE27570@fattuba.com> <20060916061132.GA29633@terraluna.org>
Message-ID: <20060920215138.GK8538@polishwonder.fattuba.com>

On Fri, Sep 15, 2006 at 11:11:32PM -0700, Steve Traugott wrote:
> I'm thinking of replacing the UDP broadcasts with an ssh-based mesh --
> this would require that people manage ssh keys, authorized_keys, and a
> local isconf user in /etc/passwd on each machine. This is instead
> of the TCP mesh I haven't had time to write; in the latter, people
> would have had to manage PGP keys and/or HMAC secrets anyway -- in
> retrospect, ssh is probably simpler and better understood both
> operationally and security-wise.

SSH is definitely the way to go. I've been struggling with rolling my own inter-node communication mechanism and I have ended up back at SSH.
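Daniel's reply earlier in this stretch of the thread (the one beginning "You're assuming that I'm of the isconf school") prefers a delta function that inspects the current state instead of treating what-1 as a strict precondition. A minimal Python sketch of that idea follows, assuming a simple "key value" configuration file such as sshd_config; the helper `set_option` and the `DeltaConflict` exception are hypothetical and are not part of isconf or any other tool. The delta is a no-op when the target value is already present, applies the change when it finds the expected old value (or no value at all), and refuses rather than guessing when it finds something else.

```python
class DeltaConflict(Exception):
    """Hypothetical: raised when the current state is one the delta can't handle."""


def set_option(text, key, old, new):
    """Hypothetical delta: move `key` from value `old` to `new` in 'key value' text.

    Rather than requiring the file to be exactly in state what-1, inspect it:
      - key already set to `new`   -> nothing to do (idempotent)
      - key set to `old` or absent -> apply the change
      - key set to anything else   -> refuse instead of guessing
    Only the first occurrence of `key` is considered.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split(None, 1)
        if fields and fields[0] == key:
            current = fields[1].strip() if len(fields) > 1 else ""
            if current == new:
                return text  # already converged; leave the file untouched
            if current != old:
                raise DeltaConflict(
                    f"{key} is {current!r}; expected {old!r} or {new!r}")
            lines[i] = f"{key} {new}"
            return "\n".join(lines) + "\n"
    # Key not present at all: adding it cannot clobber anything.
    lines.append(f"{key} {new}")
    return "\n".join(lines) + "\n"


before = "Port 22\nPermitRootLogin yes\n"
after = set_option(before, "PermitRootLogin", "yes", "no")
print(after, end="")
```

Running the same delta a second time on `after` returns it unchanged, which is the property that lets such a change be replayed safely; a file containing, say, `PermitRootLogin without-password` raises `DeltaConflict` instead of being silently overwritten.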
From Daniel Hagerty Wed Sep 20 23:20:57 2006
From: Daniel Hagerty (Daniel Hagerty)
Date: Wed, 20 Sep 2006 18:20:57 -0400
Subject: [Infrastructures] isconf v4 problem
In-Reply-To: <20060920215138.GK8538@polishwonder.fattuba.com>
References: <20060911223727.GE27570@fattuba.com> <20060916061132.GA29633@terraluna.org> <20060920215138.GK8538@polishwonder.fattuba.com>
Message-ID: <17681.48841.305016.947640@perdition.linnaean.org>

> SSH is definitely the way to go. I've been struggling with rolling my
> own inter-node communication mechanism and I have ended up back at SSH.

Seconded, at least as far as my experience goes, which is narrower than general inter-node communication. At a previous employer we used rsync over ssh for file movements, with a bunch of glue in the middle to prevent some of the more obvious potential abuses. It worked well.
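Daniel doesn't describe the glue his former employer used. One common way to build such glue, sketched here as an assumption rather than a reconstruction, is an OpenSSH forced command: a `command="..."` option on the key's line in authorized_keys makes sshd run a fixed program, which can read the client's requested command from the `SSH_ORIGINAL_COMMAND` environment variable and refuse anything that isn't an expected rsync server invocation. The root directory, the function names, and the read-only policy below are hypothetical, and the exact argument layout of `rsync --server` varies between rsync versions; rsync's own `rrsync` helper script is a more complete version of the same idea.

```python
import os
import shlex
import sys

ALLOWED_ROOT = "/srv/configs"  # hypothetical: the only tree this key may pull from


def check_rsync_command(command, root=ALLOWED_ROOT):
    """Return argv if `command` is a read-only rsync server call under `root`, else None."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    # A client-side "rsync host:/path dest" typically arrives here as
    # "rsync --server --sender <options> . /path".
    if len(argv) < 4 or argv[0] != "rsync" or argv[1] != "--server":
        return None
    if "--sender" not in argv:  # no --sender: the client is trying to push files to us
        return None
    try:
        dot = argv.index(".")  # paths follow the "." placeholder argument
    except ValueError:
        return None
    paths = argv[dot + 1:]
    if not paths:
        return None
    for path in paths:
        if not os.path.isabs(path):  # require absolute paths to keep the check simple
            return None
        norm = os.path.normpath(path)  # collapses "..", but does NOT resolve symlinks
        if norm != root and not norm.startswith(root + os.sep):
            return None
    # Note: rsync options are not filtered here; a real deployment should
    # restrict them too (rrsync does).
    return argv


def main():
    """Entry point for the forced command: exec rsync only if the request passes."""
    argv = check_rsync_command(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        sys.stderr.write("request rejected\n")
        sys.exit(1)
    os.execvp(argv[0], argv)  # no shell involved, so metacharacters are inert
```

A wrapper script that imports this and calls `main()` would be named in the key's `command="..."` option. Because the check is a pure function, it can be tested without an ssh connection.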
From wes@umich.edu Thu Sep 21 01:37:27 2006
From: wes@umich.edu (Wesley Craig)
Date: Wed, 20 Sep 2006 20:37:27 -0400
Subject: [Infrastructures] state machines
In-Reply-To: <17681.40006.356966.682887@perdition.linnaean.org>
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> <6420F52E-AA40-4FC0-B361-3F8B845E8051@umich.edu> <17681.40006.356966.682887@perdition.linnaean.org>
Message-ID:

This is a configuration management / infrastructure question, or an apache configuration question?

:wes

On 20 Sep 2006, at 15:53, Daniel Hagerty wrote:

> Suppose we have two apache web servers that are front ends to two
> different systems with "identical" behavior. These servers, by
> nature, have different names; one is "foo", the other is "bar". Both
> are hiding behind a NAT system so that no address of either machine
> will reveal that the correct names for generating redirects are "foo"
> and "bar".
>
> Draw a boundary such that apache still generates correct redirects
> for http 1.0 without having different configuration files that
> expressly mention the correct names on "foo" and "bar".
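The puzzle quoted above turns on where a redirect's host name comes from. The following is a minimal sketch, assuming a server that builds absolute `Location` URLs itself; the function and parameter names are hypothetical. An HTTP/1.1 client is required to send a `Host` header, so the server can echo back whichever name the client used. An HTTP/1.0 client may omit it, and behind NAT nothing else the server can see reveals "foo" or "bar", so the only remaining source is a per-host configured name. In Apache, the choice between the client-supplied name and the configured `ServerName` is governed by the `UseCanonicalName` directive.

```python
def redirect_location(path, headers, configured_name):
    """Build an absolute URL for a redirect to `path`.

    `headers` is the request's header mapping; `configured_name` stands in for
    the name an administrator would have to put in each host's own config.
    """
    # Header names are case-insensitive, so normalise before looking up Host.
    lowered = {name.lower(): value for name, value in headers.items()}
    host = lowered.get("host", "").strip()
    if host:
        # The client told us which name it used (mandatory in HTTP/1.1).
        return "http://" + host + path
    # Bare HTTP/1.0 request: only per-host configuration can supply the name.
    return "http://" + configured_name + path


# Identical code and configuration on both machines; only the request differs.
print(redirect_location("/docs/", {"Host": "foo"}, "localhost"))  # http://foo/docs/
print(redirect_location("/docs/", {}, "localhost"))               # http://localhost/docs/
```

The second call shows the problem in miniature: without a `Host` header, a shared configuration yields the wrong name on both "foo" and "bar".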
From Daniel Hagerty Thu Sep 21 06:20:52 2006
From: Daniel Hagerty (Daniel Hagerty)
Date: Thu, 21 Sep 2006 01:20:52 -0400
Subject: [Infrastructures] state machines
In-Reply-To:
References: <2aa4b130607310657g67ce2472ke609f3b995531c74@mail.gmail.com> <7BDB6B4D-D3E0-48E0-AE84-6557E6C77F43@ece.cmu.edu> <56842C9F-A4E0-4B63-9EA8-CD26B9C94C8F@umich.edu> <4A7F651A-1C82-4BE0-B034-5028AE0077F4@ece.cmu.edu> <45106FB3.1030902@ultraviolet.org> <52D041FA-70B1-4C2F-8186-025A22425B73@umich.edu> <0255A611-22FE-4C8E-A1B8-F5AD4741F57D@umich.edu> <17680.47853.634335.262628@perdition.linnaean.org> <816b88240609200639ve813f77p4922d04124271ec0@mail.gmail.com> <6420F52E-AA40-4FC0-B361-3F8B845E8051@umich.edu> <17681.40006.356966.682887@perdition.linnaean.org>
Message-ID: <17682.8500.612923.735556@perdition.linnaean.org>

> From: Wesley Craig
> Date: Wed, 20 Sep 2006 20:37:27 -0400
>
> This is a configuration management / infrastructure question, or an
> apache configuration question?

The former, of course. The particular example isn't the best, but I had hoped you'd see what I was driving at. The pattern of a boundary that you don't control imposing constraints on you that force you to do "unreasonable" things in configuration is a general one. HTTP 1.1 has the client give the server enough information (the Host header) for generating proper redirects without the administrator configuring the server with it, but it's hardly the first time a protocol, piece of software, etc. has demonstrated issues of this sort.

From Menno.Willemse@johnguest.co.uk Fri Sep 22 08:28:40 2006
From: Menno.Willemse@johnguest.co.uk (Willemse, Menno)
Date: Fri, 22 Sep 2006 08:28:40 +0100
Subject: [infrastructures] state machines
Message-ID:

Hello World,

From: Wesley Craig
> On 20 Sep 2006, at 09:39, Brendan Strejcek wrote:
> > You have a deterministic backup (data and code that can reinstantiate
> > it) which behaves exactly as expected.
> > However, between the time the
> > backup was taken and the time restored, some UID mappings were changed
> > on an external NIS or LDAP server, so files no longer have the correct
> > ownership.
>
> This is a great example problem. Thank you for a positive
> contribution to the discussion.
>
> If the backed up system had been running when the UID mapping
> changed, how would that have been handled?

Maybe we should draw an analogy to database technology. An image backup (mksysb or some non-AIX equivalent) can be seen as a checkpoint in DB terminology. A restore will also include the machine state at the time it was run. Isn't it then simply a matter of rolling forward the actions between the time the backup was taken and Now? Of course, that does require that the change in ownership that accompanies the change in the LDAP server is also a logged action.

Without the occasional image backup, as far as I can tell, you are stuck replaying the whole history since time began. If you simply restore a backup, you won't have the latest changes. To get the whole thing, you need to do both.

Cheers,
Menno

From Daniel Hagerty Fri Sep 22 16:33:24 2006
From: Daniel Hagerty (Daniel Hagerty)
Date: Fri, 22 Sep 2006 11:33:24 -0400
Subject: [infrastructures] state machines
In-Reply-To:
References:
Message-ID: <17684.580.587635.264889@perdition.linnaean.org>

> Maybe we should draw an analogy to database technology. An image
> backup (mksysb or some non-AIX equivalent) can be seen as a
> checkpoint in DB terminology. A restore will also include the
> machine state at the time it was run. Isn't it then simply a matter
> of rolling forward the actions between the time the backup was
> taken and Now?
> Of course, that does require that the change in
> ownership that accompanies the change in the LDAP server is also a
> logged action.
>
> Without the occasional image backup, as far as I can tell, you are
> stuck replaying the whole history since time began. If you simply
> restore a backup, you won't have the latest changes. To get the
> whole thing, you need to do both.

It depends on what you're really doing. The database analogy holds well if imaging and mutation is what you're doing (it's an apt description of what one does with isconf), but there are other ways of breaking the problem up.

Consider /etc/resolv.conf as it appears on a typical system. You don't have to play back every mutation that's ever been made to resolv.conf to reproduce a resolv.conf that's functionally identical to what you had at a given point in time. You can even arrive at a textually identical one if you want.

Suppose, for example, that I use CVS to store a piece of XML like

    baz.foobar.com
    foobar.com
    127.0.0.1

perhaps with some additional complexity to reflect that one's DNS system probably isn't global as this implies -- different parts of the system might use different DNS servers, or what have you, and it might be better to reflect this in the data, rather than using multiple files.

Check the XML into CVS, write a program that renders a resolv.conf from the XML, version the generator program, and then you can check out the XML and program as of the point in time you want to regenerate a resolv.conf for. You can also integrate the same XML into your dhcpd.conf generation, and use it for configuring Windows machines. The concepts of DNS are similar across many contexts, even if their expression differs.

This kind of approach has been taken before (at larger scales than the example I cite here), and it has some benefits.
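The following is a minimal Python sketch of the generator Daniel describes. The archived message lost the XML markup around the values baz.foobar.com, foobar.com and 127.0.0.1, so the element names used here (`dns`, `search`, `nameserver`) are assumptions, as is reading the two domains as a search list. The same parsed data feeds both a resolv.conf renderer and a fragment of ISC dhcpd.conf options, which is the kind of reuse the message points at.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the original message's XML tags did not survive archiving.
DNS_XML = """\
<dns>
  <search>baz.foobar.com</search>
  <search>foobar.com</search>
  <nameserver>127.0.0.1</nameserver>
</dns>
"""

MAXNS = 3  # the glibc resolver uses at most three nameserver lines


def load_dns(xml_text):
    """Parse the DNS description into (search domains, nameserver addresses)."""
    root = ET.fromstring(xml_text)
    search = [e.text.strip() for e in root.findall("search")]
    servers = [e.text.strip() for e in root.findall("nameserver")]
    if len(servers) > MAXNS:
        raise ValueError(f"{len(servers)} nameservers; resolv.conf uses only {MAXNS}")
    return search, servers


def render_resolv_conf(xml_text):
    """Render /etc/resolv.conf text from the versioned XML."""
    search, servers = load_dns(xml_text)
    lines = ["# generated from the DNS XML; do not edit by hand"]
    if search:
        lines.append("search " + " ".join(search))
    lines += ["nameserver " + addr for addr in servers]
    return "\n".join(lines) + "\n"


def render_dhcpd_options(xml_text):
    """Render the matching ISC dhcpd.conf option statements from the same XML."""
    search, servers = load_dns(xml_text)
    lines = []
    if servers:
        lines.append("option domain-name-servers " + ", ".join(servers) + ";")
    if search:
        quoted = ", ".join('"%s"' % d for d in search)
        lines.append("option domain-search " + quoted + ";")
    return "\n".join(lines) + "\n"


print(render_resolv_conf(DNS_XML), end="")
print(render_dhcpd_options(DNS_XML), end="")
```

Because both the XML and the generator live in version control, checking out both as of a given date and rerunning the generator reproduces that date's resolv.conf without replaying the history of edits to it.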