     nocol-users mozbot talks to nocol

Well, mozbot listens to nocol anyway. ;)

Good timing on releasing the SQL status database, JUST as I finish coding,
d'oh!

At any rate, my company hangs out in irc a lot, so that people don't have to
yell from office to office, and when people are at remote sites or
telecommuting, we can maintain conversations without having people
interrupting each other with phone calls. We even have some savvy customers
trained to bother us in a client channel too. ;) Of course we have our own irc
servers.

We sometimes don't notice that some nocol state is changing before we get a
huge number of pages. Even then, our paging system (the telco) can be
unreliable. I want to ensure we do NOT miss any alert statuses in nocol. And
in fact, I want a subset of nocol states to be flagged in some way without
resorting to the cumbersome and intrusive pager system. If something is in
error or warning state and someone is around on irc, they can go fix it;
it's not worth paging someone who's out visiting relatives on the
weekend, though it should eventually be fixed. Email is also unreliable in
terms of WHEN people read it, as well as their powers of ignoring important
email. ;)

So for us, IRC seems like a good place to give alerts on things that
we never consider worthy of a critical flag. (A good example
is diskspace. We consider 98-99% to be critical on most of our systems
because they have 20-50 gigs on most of them now. Having 'only' 1 gig
free is not an emergency. So it's not worth it to page someone because
disk space is in error state. But it's worth it to indicate it in irc.)

I am still tweaking it a bit, and will be done in a day or two, 
and will be handing out urls then.

Just wondering what else people want added to this while I'm at it.

Right now I have this coded and working:

- reporting changes to any state (from I to W/E/C or back to I)

- allowing user to request output of any state to channel or back in private
   /msg (I do not allow a full INFO report back in channels)
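
A rough sketch of the state-change reporting, in Python rather than the
bot's own code. The data layout (a dict keyed by host and variable name) and
the function name are made up for illustration; only the one-letter levels
I/W/E/C come from the list above.

```python
# Hypothetical sketch: compare two polls of nocol state and list what changed.
# The (host, varname) -> level layout is an assumption, not nocol's format.

def state_changes(previous, current):
    """Return (key, old_level, new_level) for every event whose level changed."""
    changes = []
    for key, new_level in current.items():
        # Events not seen before are assumed to have been at Info.
        old_level = previous.get(key, "I")
        if new_level != old_level:
            changes.append((key, old_level, new_level))
    return changes


previous = {("hadrian.velocet", "NetOErr"): "W",
            ("water.planeteer", "NetColl"): "I"}
current = {("hadrian.velocet", "NetOErr"): "I",
           ("water.planeteer", "NetColl"): "I"}
changes = state_changes(previous, current)
```

Here only the hadrian.velocet entry shows up in `changes`, since it dropped
from W back to I while the other one stayed put.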

I am going to tweak this a bit to allow a simple limit - even I can see
now how much state fluctuates - I NEVER noticed this before. It's kinda
interesting - but it's a lot of traffic in some ways, with every warning
popping up and disappearing another 2-3 minutes later. I can make a limit
so that only Errors or full Critical status are reported.
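
One way such a limit could work, as a small Python sketch. The numeric
ranking and the name `worth_reporting` are assumptions for illustration. It
also assumes a recovery (say E back to I) should still be announced, which
is a design choice rather than anything settled here.

```python
# Hypothetical ranking of the one-letter nocol levels, least to most severe.
SEVERITY = {"I": 0, "W": 1, "E": 2, "C": 3}


def worth_reporting(old_level, new_level, min_level="E"):
    """True if either side of the change is at or above min_level."""
    threshold = SEVERITY[min_level]
    return max(SEVERITY[old_level], SEVERITY[new_level]) >= threshold
```

With the default of "E", a warning that pops up and clears again (I to W,
then W to I) is never announced, while anything entering or leaving Error or
Critical still is.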

I am trying to get the exception stuff working a bit better now, to allow
the bot to report on and insert exceptions much as the web interface does.
However, the web interface exceptions do NOT seem to be sticking for me
(on subsequent reload they're gone). I think there's a permission problem
writing to disk, but the exceptions file is owned by the same user that
owns the apache process. Strange.

I am also going to allow the user to request a report of all states (below a
certain level) from a specific machine, or all states (below a certain
level) of a trait ('varname') from all machines. Or one specific machine's
state for a certain trait.
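
The three kinds of request could share one filter, roughly like this Python
sketch. The event fields, the sample data and the `query` function are
invented for illustration, and "below a certain level" is read here as "at
least as severe as the given level", which is an assumption.

```python
# Hypothetical ranking of the one-letter nocol levels, least to most severe.
SEVERITY = {"I": 0, "W": 1, "E": 2, "C": 3}


def query(events, host=None, varname=None, min_level="W"):
    """Filter events by optional host and varname, keeping min_level and worse."""
    threshold = SEVERITY[min_level]
    return [
        e for e in events
        if (host is None or e["host"] == host)
        and (varname is None or e["varname"] == varname)
        and SEVERITY[e["level"]] >= threshold
    ]


# Made-up sample data, not real nocol output.
events = [
    {"host": "hadrian.velocet", "varname": "NetOErr", "level": "W"},
    {"host": "hadrian.velocet", "varname": "NetColl", "level": "I"},
    {"host": "water.planeteer", "varname": "NetColl", "level": "E"},
]

by_host = query(events, host="hadrian.velocet")
by_var = query(events, varname="NetColl", min_level="I")
one_cell = query(events, host="water.planeteer", varname="NetColl")
```

Leaving `host` or `varname` as None means "all machines" or "all traits", so
the same call covers a per-machine report, a per-trait report, and the
single machine/trait lookup.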

The one last thing I think we need is to allow a nag config to be set up so
that nagging about an existing state can be repeated periodically
(currently the bot only volunteers info on state changes). I can also
allow (a) target user(s) to be nagged specifically about various problems.
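
A minimal Python sketch of the periodic nag, assuming the bot keeps a record
of when it last mentioned each problem. The names `due_nags` and
`last_nagged`, and the 30-minute default interval, are hypothetical.

```python
def due_nags(active, last_nagged, now, interval=1800):
    """Return keys of non-Info states not mentioned within `interval` seconds.

    `active` maps an event key to its current level; `last_nagged` maps a key
    to the time (in seconds) it was last announced and is updated in place.
    """
    due = []
    for key, level in active.items():
        if level == "I":
            continue  # nothing to nag about
        if now - last_nagged.get(key, 0) >= interval:
            due.append(key)
            last_nagged[key] = now
    return due


active = {"a": "E", "b": "I", "c": "W"}
last_nagged = {"a": 9000, "c": 5000}
due = due_nags(active, last_nagged, now=10000)
```

In this run only "c" is due: "a" was mentioned 1000 seconds ago, and "b" is
at Info. Nagging specific users would just mean keeping one such record per
target.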

A variation of that is allowing different levels of reporting for different
targets: changes to/from "critical" only to the channel, and error
to all the admins via private /msg, and 'warning' to the jr admins only
via /msg. Hmm this could come in VERY handy. (/me grabs his whip with
the jr admin's name on it.)
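
The per-target reporting could be a simple lookup, sketched here in Python.
The nicks `admin1`, `admin2` and `jradmin` are placeholders, and picking the
route by the worse of the old and new level is an assumption about what
"changes to/from" should mean.

```python
# Hypothetical ranking of the one-letter nocol levels, least to most severe.
SEVERITY = {"I": 0, "W": 1, "E": 2, "C": 3}

# Placeholder routing table: level -> where changes involving it are sent.
ROUTES = {
    "C": ["#baz"],              # critical goes to the channel
    "E": ["admin1", "admin2"],  # errors go to all the admins via /msg
    "W": ["jradmin"],           # warnings go to the jr admins via /msg
}


def targets_for(old_level, new_level):
    """Pick targets from the more severe side of a state change."""
    worst = max(old_level, new_level, key=SEVERITY.get)
    return ROUTES.get(worst, [])
```

So a change from E to C (or C back to E) lands in the channel, while a
warning clearing back to I only reaches the jr admins.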

Mozbot is from the Mozilla project, and I've neutered a bunch of its
functions (the tinderbox compilation engine reporting, e.g.) since they're
specific to the Mozilla group's needs. However, it keeps a few
interesting things such as reporting slashdot and freshmeat headlines
as they appear, etc. (These can be removed easily as well.)

The next 'version' of this may see this bot cleaned up MASSIVELY, because
the original code was a travesty. :) As well, I will be interfacing with the
new SQL database to get a report of state once I get that working.

Here's a sample of my current output:

<m:#baz>  hadrian.velocet  NetOErr=7 chg:I from W (since 12/21 16:04)
<m:#baz>  water.planeteer  NetColl=0 chg:I from W (since 12/21 16:14)

(oh the output string can be configured as well)
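
As an illustration of a configurable output string, here is a Python sketch
that rebuilds the first sample line from a template. The placeholder names
in the template are invented; the bot's real config syntax is not shown
here.

```python
# Hypothetical template; the field names are assumptions for illustration.
TEMPLATE = "{host}  {varname}={value} chg:{new} from {old} (since {since})"


def format_change(template, **fields):
    """Fill a report template with the fields of one state change."""
    return template.format(**fields)


line = format_change(
    TEMPLATE,
    host="hadrian.velocet",
    varname="NetOErr",
    value=7,
    new="I",
    old="W",
    since="12/21 16:04",
)
```

Changing the template, for example dropping the "(since ...)" part, changes
every report without touching the code.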

/kc
-- 
Ken Chase, Director Operations                  Velocet Communications Inc.
math@velocet.ca                                              Toronto CANADA
--
"Sometimes two [harmless] words, when put together, strike fear in the
  hearts of men -- Microsoft Wallet."                           - Dave Gilbert