
 

     [snips-users] "ERROR: snips_rrd() empty devicename '' or variable ''" fixed (long)

Hello,

Another SNIPS user, Todd Edmands, and I have found the cause of the
"ERROR: snips_rrd() empty devicename '' or variable ''" message.
We implemented a fix 6 days ago and the problem has not returned.
Before the fix, my system rarely went more than about 5 hours
without the error.  Todd's system saw it less frequently,
about once a week or less, so he is still testing.

The problem occurs when a monitored variable is expired because it
has not been updated for longer than the EXPIRE_AGE time, and is then
updated again later.  (For example, a host comes back, or a temporary
variable such as a MailQDest for a particular destination reappears.)

The expiration code alters the "event" for a monitored variable,
overwriting its record in hostmon-output with a null (empty) event.
But the index for that variable is kept.  If the variable later
returns, the same record is used again, but the nulled fields are
not rewritten.  Hence the empty devicename (and other information).
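
To make the failure mode concrete, here is a minimal sketch of it.
This is not the hostmon code: the in-memory @records array, the %index
hash, and the three helper subs are all hypothetical stand-ins for the
datafile and its per-variable index, and the item key is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hostmon-output datafile: one slot per
# monitored variable, plus an index from item key to slot number.
my @records;
my %index;

# Create a full record the first time an item is seen.
sub add_item {
    my ($item, $device, $var, $value) = @_;
    push @records, { device_name => $device, var_name => $var, value => $value };
    $index{$item} = $#records;
}

# Old-style expiration: overwrite the slot with an empty event,
# but leave the index entry in place.
sub expire_by_nulling {
    my ($item) = @_;
    $records[ $index{$item} ] = { device_name => '', var_name => '', value => 0 };
}

# A later update reuses the indexed slot and rewrites only the value.
sub update_item {
    my ($item, $value) = @_;
    $records[ $index{$item} ]{value} = $value;
    return $records[ $index{$item} ];
}

add_item('mailhost:MailQDest', 'mailhost', 'MailQDest', 3);
expire_by_nulling('mailhost:MailQDest');
my $rec = update_item('mailhost:MailQDest', 7);

# The value is current again, but the identifying fields stay empty.
printf "devicename='%s' variable='%s' value=%d\n",
    $rec->{device_name}, $rec->{var_name}, $rec->{value};
```

In this model the returning variable has a fresh value but an empty
devicename and variable name, which matches the symptom in the error
message.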

In the code for handling an old data event (which isn't old enough
to have expired yet), age is handled by setting the n_OLDDATA flag
in the event record.  This flag is cleared later if the event is
updated before expiration.

It turns out that there is an unused state flag named n_NODISPLAY.
This flag is never set anywhere in the C or Perl code, though it is
checked at one point in the snipstv code and used to ignore an event.
Since this is reminiscent of the behavior we want, we adopted this
state flag for expiration.  By using this flag, there is no need to
alter the event with the null values.

Changes to hostmon:
-------------------
The new expiration code is copied from the old data code that follows
it and modified to use the n_NODISPLAY flag.  The hash, %oldage,
is reused to indicate an expired event by setting its value to 2.

The code which unsets the n_OLDDATA flag has an additional test
added to also unset the n_NODISPLAY flag.
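
The set-and-clear logic can be sketched on its own, outside hostmon.
This is only an illustration under stated assumptions: the flag bit
values and the two age limits below are made up (the real ones come
from SNIPS), events are plain hashes rather than packed records, and
the value 1 for "old data" in %oldage is my guess for the example.
The subs age_event and refresh_event are hypothetical names.

```perl
use strict;
use warnings;

# Assumed values, for illustration only.
my $n_OLDDATA   = 0x02;
my $n_NODISPLAY = 0x04;
my ($OLD_AGE, $EXPIRE_AGE) = (300, 3600);   # seconds

my %oldage;   # not defined = fresh, 1 = old data (assumed), 2 = expired

# Mark an event according to how stale it is.  The record's other
# fields (device name and so on) are left untouched.
sub age_event {
    my ($item, $event, $age) = @_;
    if ($age >= $EXPIRE_AGE) {
        $event->{state} |= $n_NODISPLAY;
        $oldage{$item} = 2;
    } elsif ($age >= $OLD_AGE) {
        $event->{state} |= $n_OLDDATA;
        $oldage{$item} = 1 unless defined $oldage{$item};
    }
    return $event;
}

# Fresh data arrived: clear whichever flags were set for this item.
sub refresh_event {
    my ($item, $event) = @_;
    if (defined $oldage{$item}) {
        $event->{state} &= ~$n_OLDDATA;
        $event->{state} &= ~$n_NODISPLAY if $oldage{$item} == 2;
        delete $oldage{$item};
    }
    return $event;
}

my %event = (device_name => 'mailhost', state => 0);
age_event('mailhost:MailQDest', \%event, 4000);      # past EXPIRE_AGE
my $expired_state = $event{state};
refresh_event('mailhost:MailQDest', \%event);        # variable returns
print "expired state=$expired_state, now state=$event{state}, ",
      "name=$event{device_name}\n";
```

Because expiration only toggles a bit in the state field, the device
name survives the expire/return cycle.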

Changes to genweb.cgi:
----------------------
The test for expired row data originally checked for empty device
name and device address.

We added a test for the n_NODISPLAY flag, leaving the original test in
case there might be other causes of empty device names and addresses.
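
As a standalone illustration of the combined test, here is a small
sketch.  The skip_row sub, the sample rows, and the bit value chosen
for $n_NODISPLAY are assumptions made for the example; genweb.cgi
itself does the test inline, as shown in the diff.

```perl
use strict;
use warnings;

my $n_NODISPLAY = 0x04;   # assumed bit value, for illustration only

# True when a row should be left out of the generated page.
sub skip_row {
    my (%ev) = @_;
    return 1 if $ev{device_name} eq '' && $ev{device_addr} eq '';  # original test
    return 1 if $ev{state} & $n_NODISPLAY;                         # added test
    return 0;
}

# Sample rows: one normal, one expired via the flag, one blanked.
my @rows = (
    { device_name => 'gw1',      device_addr => '192.0.2.1',  state => 0 },
    { device_name => 'mailhost', device_addr => '192.0.2.25', state => $n_NODISPLAY },
    { device_name => '',         device_addr => '',           state => 0 },
);

my @shown = grep { !skip_row(%$_) } @rows;
print join(', ', map { $_->{device_name} } @shown), "\n";
```

Only the first row passes both tests in this example.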

Changes to snipstv:
-------------------
None were necessary.  The existing test for the value of the
n_NODISPLAY flag in snipstv does exactly what we wanted--skips
displaying an expired event.

Context diffs
-------------
=============Cut here============================================
*** hostmon	2001-09-24 09:32:22.000000000 -0600
--- hostmon.new	2003-12-11 12:54:42.000000000 -0700
***************
*** 393,399 ****
      $timestamp{$item} = 0 if (! defined($timestamp{$item}) );
      my $age = $curtime - $timestamp{$item};
      # print STDERR "Age for $item is $age secs\n";
!     if ($age >= $EXPIRE_AGE) { rewrite_event($datafd, $nullevent); next; }
      if ($age >= $OLD_AGE) {
        if (! defined ($oldage{$item})) {
  	my %event = unpack_event($event);
--- 393,414 ----
      $timestamp{$item} = 0 if (! defined($timestamp{$item}) );
      my $age = $curtime - $timestamp{$item};
      # print STDERR "Age for $item is $age secs\n";
! 
!     # Previous code used alter_event to blank fields in the record for
!     # this event.  This code uses the NODISPLAY flag in the state field
!     # of the event instead.
!     if ($age >= $EXPIRE_AGE) {
!       if (! defined ($oldage{$item}) || $oldage{$item} < 2) {
! 	my %event = unpack_event($event);
! 	$event{state} = $event{state} | $n_NODISPLAY;
! 	$event = pack_event(%event);
! 	$oldage{$item} = 2;
!       }
!       my ($status,$value,$thres,$maxseverity) = split(/\t/, $curstat{$item});
!       update_event($event, 0, $value, $thres, $maxseverity);# escalate severity
!       rewrite_event($datafd, $event);
!       next;
!     }	# age > $EXPIRE_AGE
      if ($age >= $OLD_AGE) {
        if (! defined ($oldage{$item})) {
  	my %event = unpack_event($event);
***************
*** 409,414 ****
--- 424,432 ----
      if (defined $oldage{$item}) {
        my %event = unpack_event($event);
        $event{state} = $event{state} & (~$n_OLDDATA);
+       if ($oldage{$item} == 2) {
+         $event{state} = $event{state} & (~$n_NODISPLAY);
+       }
        $event = pack_event(%event);
        undef $oldage{$item};
      }
=============Cut here============================================
*** genweb.cgi	2002-01-29 22:42:45.000000000 -0700
--- genweb.cgi.new	2003-12-12 15:16:08.000000000 -0700
***************
*** 542,548 ****
    while ( ($event = read_event($datafd)) ) {
  
      my %ev = unpack_event($event);
!     next if ($ev{device_name} eq "" && $ev{device_addr} eq "");
  
      $ev{file}=$file;	# store the filename also
      
--- 542,549 ----
    while ( ($event = read_event($datafd)) ) {
  
      my %ev = unpack_event($event);
!     next if ($ev{device_name} eq "" && $ev{device_addr} eq "")
!          || ($ev{state} & $n_NODISPLAY);
  
      $ev{file}=$file;	# store the filename also
      
=============Cut here============================================

-- 
Anthony Vealé
National Snow and Ice Data Center
E-Mail: veale at nsidc org
Phone: (303)735-5069
