nocol-users mozbot talks to nocol
Well, mozbot listens to nocol anyway. ;) Good timing on releasing the SQL status database, JUST as I finish coding, d'oh!

At any rate, my company hangs out in irc a lot, so that people don't have to yell from office to office, and when people are at remote sites or telecommuting, we can maintain conversations without people interrupting each other with phone calls. We even have some savvy customers trained to bother us in a client channel too. ;) Of course we have our own irc servers.

We sometimes don't notice that some nocol state is changing before we get a huge number of pages. Even so, our paging system (the telco) can be unreliable. I want to ensure we do NOT miss any alert statuses in nocol. And in fact, I want a subset of nocol states to be flagged in some way without resorting to the cumbersome and intrusive pager system. When something is in error or warning state, someone who is around on irc can go fix it; it's not worth paging someone who's out visiting relatives on the weekend, though it should eventually be fixed. Email is also unreliable in terms of WHEN people read it, as well as their powers of ignoring important email. ;)

So for us, IRC seems like a good place to give alerts on things that we consider never worthy of a critical flag. (A good example is diskspace. We consider 98-99% to be critical on most of our systems because they have 20-50 gigs on most of them now. Having 'only' 1 gig free is not an emergency. So it's not worth it to page someone because disk space is in error state. But it's worth it to indicate it in irc.)

I am still tweaking it a bit, and will be done in a day or two, and will be handing out urls then. Just wondering what else people want added to this while I'm at it.
Right now I have this coded and working:

- reporting changes to any state (from I to W/E/C or back to I)
- allowing a user to request output of any state to the channel or back in a private /msg (I do not allow a full INFO report back in channels)

I am going to tweak this a bit to allow a simple limit - even I can see now how much state fluctuates - I NEVER noticed this before. It's kinda interesting - but it's a lot of traffic in some ways, with every warning popping up and disappearing another 2-3 minutes later. I can make a limit so that only Error or full Critical statuses are reported.

I am trying to get the exception stuff working a bit better now, to allow the bot to report on and insert exceptions much as the web interface does. However, the web interface exceptions do NOT seem to be sticking for me (on subsequent reload they're gone). I think there's a permission problem writing to disk, but the exceptions file is owned by the same user that owns the apache process. Strange.

I am also going to allow the user to request a report of all states (below a certain level) from a specific machine, or all states (below a certain level) of a trait ('varname') from all machines. Or one specific machine's state for a certain trait.

The one last thing I think we need is a nag config, so that an existing state can be periodically nagged about again (currently the bot only volunteers info on state changes). I can also allow one or more target users to be nagged specifically about various problems. A variation of that is allowing different levels of reporting for different targets: changes to/from 'critical' only to the channel, 'error' to all the admins via private /msg, and 'warning' to the jr admins only via /msg. Hmm, this could come in VERY handy. (/me grabs his whip with the jr admin's name on it.)
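A rough sketch of how the severity limit and the per-target reporting described above might fit together, written in Python rather than mozbot's own Perl. Everything named here (`Change`, `ROUTES`, `targets_for`, the nicknames) is hypothetical and not taken from mozbot or nocol. It also assumes each target's level is a minimum threshold, which is only one reading of the scheme above.

```python
from dataclasses import dataclass

# Severity order used in the post: Info < Warning < Error < Critical.
LEVELS = {"I": 0, "W": 1, "E": 2, "C": 3}


@dataclass
class Change:
    """One state transition, e.g. NetOErr on a host going from W back to I."""
    host: str
    var: str
    value: int
    old: str  # previous level, one of I/W/E/C
    new: str  # current level, one of I/W/E/C


# Hypothetical routing table: target -> lowest level worth reporting there.
ROUTES = {
    "#baz": "C",     # the channel only hears about critical
    "admin": "E",    # admins get error and up via /msg
    "jradmin": "W",  # jr admins get warnings too
}


def targets_for(change, routes=ROUTES):
    """Return the targets that should be told about this change.

    A change counts for a target if either side of the transition reaches
    that target's threshold, so both "to critical" and "from critical"
    are reported to the same places.
    """
    peak = max(LEVELS[change.old], LEVELS[change.new])
    return sorted(t for t, minimum in routes.items() if peak >= LEVELS[minimum])


if __name__ == "__main__":
    flap = Change("hadrian.velocet", "NetOErr", 7, old="W", new="I")
    print(targets_for(flap))
```

With this table, a warning that clears a few minutes later only reaches the jr admin, while anything touching critical reaches the channel and everyone else. Raising a threshold in `ROUTES` is the "simple limit" on chatter.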
Mozbot is from the mozilla project, and I've neutered a bunch of its functions (the tinderbox compilation engine reporting, for example) since they're specific to the Mozilla group's needs. However, it keeps a few interesting things, such as reporting slashdot and freshmeat headlines as they appear. (These can be removed easily as well.) The next 'version' of this may see this bot cleaned up MASSIVELY, because the original code was a travesty. :) As well, I will be interfacing with the new SQL database to get a report of state once I get that working.

Here's a sample of my current output:

<m:#baz> hadrian.velocet NetOErr=7 chg:I from W (since 12/21 16:04)
<m:#baz> water.planeteer NetColl=0 chg:I from W (since 12/21 16:14)

(oh, the output string can be configured as well)

/kc
--
Ken Chase, Director Operations
Velocet Communications Inc.
math@velocet.ca
Toronto CANADA
--
"Sometimes two [harmless] words, when put together, strike fear in the hearts of men -- Microsoft Wallet." - Dave Gilbert