     RE: [nocol-users] Root Cause Analysis

IMHO, well-trained NOC techs are a 'fair' substitute, but compared to an
actual working RCA system, they are going to be more fallible.  On the other
hand, a fabulous RCA system being run by morons with keyboards isn't going
to help either.  We're looking for a tool to assist our NOC techs in this
magnitude, knowing that NOC techs are going to make mistakes, and with our
SLAs, time is of the essence.

Thank you,

Jonathan A. Zdziarski
Director - MIS
NetRail, inc.
230 Peachtree St.
Suite 1700
Atlanta, GA 30303
404-522-5400 x240


> -----Original Message-----
> From: Barry Robison [mailto:brobison-nocol@satellite.deimos.org]
> Sent: Tuesday, January 18, 2000 3:29 PM
> To: Jonathan A. Zdziarski
> Cc: nocol-users@navya.com
> Subject: Re: [nocol-users] Root Cause Analysis
>
>
> On Tue, Jan 18, 2000 at 01:38:16PM -0500, Jonathan A. Zdziarski wrote:
> > I think what everyone's goal is in this new NOCOL is not only to have
> > dependencies, but ultimately do some root cause analysis.  I'm still
> > working on the basic architecture of the rulesets, but in the meantime
> > if some of you want to contribute your own dependency architecture and
> > any other information you'd like, it will help me create a decent rules
> > structure for something like this.
>
> Nothing terrifies me more in a meeting than when a customer asks about
> Root Cause Analysis. The Holy Grail of network monitoring: tell
> me not what happened to the network but where to fix it!
>
> I think what needs to be done to accomplish this is: intelligent
> NOC operators. The software can help some, but keeping dependency
> information for any decent-sized network up to date is nigh
> impossible. A good journaling / helpdesk system can help here.
> Given Problem A, in the past, causes have been C, D, or E.
>
> Some thresholding rules can help the operator determine what
> happened when the network goes crazy: 100 alarms in the last
> five minutes, 3 in the previous five-minute chunk. Well, what were
> the first few errors of those 100?
>
> Event filtering, sorting, and well-trained operators are the
> solution to RCA, imho.
>
> Opinions?
>
> --
> Barry Robison - brobison@deimos.org
>
> Why one contradicts.  One often contradicts an opinion when it is really
> only the way in which it has been presented that is unsympathetic.
>
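
To make the thresholding idea in the quoted message concrete (3 alarms in
one five-minute chunk, 100 in the next, then "what were the first few
errors of those 100?"), here is a minimal Python sketch. It is not NOCOL
code. The function names, the (timestamp, message) tuple format, and the
ratio/min_count defaults are all assumptions made up for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 300  # five-minute chunks, as in the quoted example


def bucket_counts(alarms, window=WINDOW_SECONDS):
    """Count alarms per fixed-size window.

    `alarms` is a list of (timestamp_in_seconds, message) tuples
    (a hypothetical format, not NOCOL's event structure).
    """
    return Counter(int(ts // window) for ts, _ in alarms)


def find_spikes(alarms, ratio=10.0, min_count=20, window=WINDOW_SECONDS):
    """Return the indices of windows that look like an alarm storm.

    A window is flagged when it holds at least `min_count` alarms and at
    least `ratio` times as many as the window just before it. Both
    thresholds are arbitrary defaults chosen for this sketch.
    """
    counts = bucket_counts(alarms, window)
    spikes = []
    for idx in sorted(counts):
        prev = counts.get(idx - 1, 0)
        # max(prev, 1) avoids treating an empty previous window as zero.
        if counts[idx] >= min_count and counts[idx] >= ratio * max(prev, 1):
            spikes.append(idx)
    return spikes


def first_alarms(alarms, window_idx, n=5, window=WINDOW_SECONDS):
    """Return the earliest `n` alarms that fall inside one window."""
    in_window = [a for a in alarms if int(a[0] // window) == window_idx]
    return sorted(in_window, key=lambda a: a[0])[:n]
```

An operator-facing tool could run find_spikes() over the recent event log
and show first_alarms() for each flagged window, leaving the judgment about
the actual cause to the operator.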