[Infrastructures] email to root?
Adam S. Moskowitz
adamm@menlo.com
Mon, 21 Mar 2005 12:59:46 -0500 (EST)
Vincent McIntyre <Vince.McIntyre@atnf.csiro.au> asked:
> What, if anything, are people doing with mail that traditionally goes
> to the 'root' account on the box?
and John Borwick <borwicjh@wfu.edu> replied:
> You must read (or parse) all this mail. . . .
Yes, BUT . . .
Vincent McIntyre <Vince.McIntyre@atnf.csiro.au> again:
> Is it our doom to die of boredom from reading endless near-identical
> emails from cron jobs? Say it ain't so!
I think a better answer is to reduce the amount of mail being delivered.
Why are routine cron jobs sending email if the job works as expected?
What benefit does this provide?
I have implemented schemes something like this: All routine cron jobs
dump their output in a file in a directory somewhere; the first line of
the file contains an easily-parsed status line indicating whether the
job worked as expected, worked with minor problems, failed, etc. A
single cron job runs after the expected completion time of all other
cron jobs, parses the various crontab files just enough to know how many
jobs should have run, looks for and parses the output files, and sends a
*SINGLE* message (per machine if there aren't very many, or per cluster
if there are lots of identical nodes) with a subject line something like
this:
Subject: cron jobs: # ok, # not ok, # missing
The body of the message contains some additional information, maybe even
the full output file from each job that failed (or that could be sent as
a separate message by the "aggregator" job). The mail program is
configured to make it obvious which node sent any given message ("From:
root @ nodeN").
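
A rough sketch of the aggregator step in Python, under assumptions that
are mine and not from any real installation: each job writes its output
to <job>.out in one directory, the first line is exactly "STATUS: ok"
on success, and the list of expected job names has already been worked
out from the crontab files. The names read_status and summarize are
made up for illustration, and "worked with minor problems" is simply
lumped in with "not ok" here.

```python
from pathlib import Path


def read_status(path):
    """Classify a job from the first line of its output file."""
    with open(path) as fh:
        first_line = fh.readline().strip().lower()
    # Assumed convention: the first line is exactly "STATUS: ok" on success.
    return "ok" if first_line == "status: ok" else "not ok"


def summarize(expected_jobs, output_dir):
    """Return (subject, body) for the single summary message."""
    ok, not_ok, missing = [], [], []
    for job in expected_jobs:
        path = Path(output_dir) / f"{job}.out"
        if not path.is_file():
            missing.append(job)
        elif read_status(path) == "ok":
            ok.append(job)
        else:
            not_ok.append(job)

    subject = (f"cron jobs: {len(ok)} ok, {len(not_ok)} not ok, "
               f"{len(missing)} missing")

    body = []
    for job in missing:
        body.append(f"MISSING: {job}")
    for job in not_ok:
        # Include the full output of each job that did not report "ok".
        text = (Path(output_dir) / f"{job}.out").read_text()
        body.append(f"--- {job} ---\n{text}")
    return subject, "\n".join(body)
```

Handing the subject and body to the local mailer is left out, since
that part depends entirely on the site.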
When you read your mail in the morning, you sort by sender and then scan
for any messages that are worth reading.
I could have used a scheme so that no messages would be sent if
everything worked, but I like this better because it lets me see that
things are running as expected. If you cared you could post-process your
inbox and send yourself a message warning you if you didn't get the
expected number of summary messages on any given day.
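
The inbox check could be as small as the following Python sketch. It
assumes you have already pulled (From, Subject) pairs for the day out
of your mailbox by whatever means, that summary subjects start with
"cron jobs:", and that the node name is whatever follows the "@" in
the sender; missing_summaries is a hypothetical name.

```python
def missing_summaries(expected_nodes, messages):
    """Return the expected nodes that sent no summary message.

    messages is an iterable of (from_header, subject) pairs.
    """
    seen = set()
    for from_header, subject in messages:
        if not subject.startswith("cron jobs:"):
            continue  # not a summary message
        # Assumed sender form: "root@nodeN" or "root @ nodeN".
        node = from_header.rsplit("@", 1)[-1].strip(" >")
        seen.add(node)
    return sorted(node for node in expected_nodes if node not in seen)
```

Any node in the returned list is one you did not hear from that day.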
So yes, you still need to read every message (or at least look at the
subject line) -- but when this is down to ~10 messages a day no matter
how many machines you have, well, that's really not a big deal.
No, I can't give you the code to do this: It was highly-customized for
each place I did it, and I didn't even think to save it when I left --
but it didn't take me more than a day or two to knock out the framework,
and then just the odd tweak now and then as outputs changed, etc.
(Yes, I've more-or-less described an asynchronous, email-based version
of Big Brother. :-)
AdamM