     Re: [nocol-users] Root Cause Analysis

On Tue, Jan 18, 2000 at 01:38:16PM -0500, Jonathan A. Zdziarski wrote:
> I think what everyone's goal is in this new NOCOL is not only to have
> dependencies, but ultimately do some root cause analysis.  I'm still working
> on the basic architecture of the rulesets, but in the meantime if some of
> you want to contribute your own dependency architecture and any other
> information you'd like, it will help me create a decent rules structure for
> something like this.

Nothing terrifies me more in a meeting than when a customer asks about Root Cause Analysis. It is the Holy Grail of network monitoring: tell me not what happened to the network, but where to fix it!

I think what is needed to accomplish this is intelligent NOC operators. The software can help some, but keeping dependency information up to date for any decent-sized network is nigh impossible. A good journaling / helpdesk system can help here: given Problem A, past causes have been C, D, or E.
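A rough sketch of that journaling idea, in Python. The class and method names are made up for illustration and are not part of NOCOL or any helpdesk product; it just counts which causes operators recorded for each problem and lists them most frequent first.

```python
from collections import Counter, defaultdict


class IncidentJournal:
    """Hypothetical journal: remembers which causes were found for each problem."""

    def __init__(self):
        # problem name -> Counter of causes recorded by operators
        self._causes = defaultdict(Counter)

    def record(self, problem, cause):
        """Note that `problem` turned out to be caused by `cause`."""
        self._causes[problem][cause] += 1

    def likely_causes(self, problem):
        """Return (cause, count) pairs for `problem`, most frequent first."""
        history = self._causes.get(problem)
        return history.most_common() if history else []


journal = IncidentJournal()
journal.record("Problem A", "C")
journal.record("Problem A", "D")
journal.record("Problem A", "C")
journal.record("Problem A", "E")
suggestions = journal.likely_causes("Problem A")
```

The operator still makes the call; the journal only puts past causes in front of them when Problem A shows up again.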

Some thresholding rules can help the operator determine what happened when the network goes crazy: 100 alarms in the last five minutes versus 3 in the previous five-minute chunk. Well, what were the first few of those 100 alarms?
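Such a rule could look something like the Python sketch below. Everything here is an assumption for illustration: the (timestamp, message) alarm format, the function name, and the ratio and min_count cutoffs are not taken from NOCOL. It buckets alarms into five-minute windows, flags a window whose count jumps well above the previous one, and hands back the first few alarms of that window.

```python
WINDOW = 300  # seconds; five-minute chunks, as in the example above


def find_bursts(alarms, ratio=10, min_count=20, first_n=5):
    """Flag five-minute windows where the alarm count jumps.

    alarms: iterable of (timestamp_seconds, message) tuples, in any order.
    A window is flagged when it holds at least min_count alarms and at least
    ratio times as many as the preceding window (an empty preceding window
    is treated as one alarm). ratio and min_count are arbitrary defaults.
    Returns a list of (window_start, count, first_alarms).
    """
    buckets = {}
    for ts, msg in sorted(alarms):
        start = ts // WINDOW * WINDOW  # start of the window this alarm falls in
        buckets.setdefault(start, []).append((ts, msg))

    bursts = []
    for start in sorted(buckets):
        current = buckets[start]
        previous = buckets.get(start - WINDOW, [])
        if len(current) >= min_count and len(current) >= ratio * max(len(previous), 1):
            # Alarms were sorted by time, so the head of the list is the
            # earliest activity in the burst.
            bursts.append((start, len(current), current[:first_n]))
    return bursts


# 3 alarms in the first window, then 100 in the next one.
quiet = [(10, "router1 ping loss"), (100, "router1 ping loss"), (200, "host7 disk")]
storm = [(300 + 2 * i, "alarm %d" % i) for i in range(100)]
bursts = find_bursts(quiet + storm)
```

The first few entries of a flagged window are what the operator looks at first, since the earliest alarms in a storm are usually closer to whatever broke than the ninety that follow.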

Event filtering, sorting, and well-trained operators are the solution to RCA, imho.

Opinions?

-- 
Barry Robison - brobison@deimos.org

Why one contradicts.  One often contradicts an opinion when it is really only
the way in which it has been presented that is unsympathetic.