Re: [snips-users] hostmon falling over / snipstv/eventselect segmentation fault:
On Mon, Jul 24, 2006 at 01:32:35PM -0700, Russell Van Tassell wrote:
> I've not tried going 64-bit on NOCOL, however, though I've been building
> a new 64-bit Solaris machine... I might be inclined to check it out,
> though.
>
> (Apologies for the delayed reply here... life has kind of "taken over"
> recently)

Well, the problem seems to be (amongst others) that anything that tries
to use pack_event() from the SNIPS perl module will fall over with a
segmentation fault... but not always, so I think it may be something to
do with monitors that use subdevices.

Despite a colleague hacking the code to compile and work on 64bit linux,
we've been unable to fix this problem, so we ended up migrating
everything from hostmon to snmpgeneric, which does not use pack_event().

As our monitoring is made up of several machines, there are other
problems. The SNIPS database files and RRD files generated on 32bit
machines are not readable by the 64bit machine. I haven't yet got round
to fixing this, but it would involve exporting the database files as
text on the 32bit boxes with display_snips_datafile / rrdtool dump and
then re-importing them on the 64bit box.

Unfortunately I can't get my re-import script for SNIPS data to work on
either the 32bit machines or the 64bit machines. Maybe it's something
I'm doing wrong. The SNIPS Perl API seems a bit vaguely documented in
places, and doesn't give any good examples of how to use pack_event
correctly (i.e., what you need to present it with, how, and in which
order).

If I can't get this to work then I will have to rewrite some stuff (in
perl) that uses the SNIPS data files to use the text files instead
(snipstv and snipsweb). This is easy enough to do I suppose, it just
takes a bit of time.

On the day of the migration I discovered that snipslogd on the central
host did not seem to work remotely. It works fine receiving events from
monitors locally, but does not receive events from the remote boxes.
I can see the UDP packets arrive, but snipslogd doesn't do anything with
them except log errors about:

    "readevent: socket read failed (incomplete)--"

The workaround for this was to configure the remote boxes to use syslog
as a transport, by running snipslogd on each remote box, configuring
everything to log to that local snipslogd, and then configuring
snipslogd to pipe everything to logger:

    * info |/usr/bin/logger^-t^snipslogd^-p^local3.info

I then configured syslog-ng to ship those events back to the central
host, and tweaked my event processing script to strip off the syslog
date and time bits before trying to parse each line as a SNIPS event.

It turns out that this is also slightly more robust, since I set it up
to use TCP and not UDP, and it allows me to keep a local log on each
server as well as log to multiple destinations. (Something that you
couldn't really do with snipslogd.)

Longer term I will probably need to replace SNIPS with something else,
as we will shortly have more requirements like IPv6 monitoring and more
precise latency/jitter/packet loss monitoring. It may be that I can come
up with a replacement "multiping" wrapper script that makes this work.

Unfortunately every alternative out there seems to involve a huge bloaty
system that back-ends on mysql or some other database, uses php/java, or
is yet another open monitoring framework which may or may not be
developed to a usable point, or is just too complicated for me to
understand.

Network engineers are not usually also software developers, so I can
manage to hack together something in perl that works, but some of these
new monitoring frameworks require advanced knowledge of java, python,
php or some other wacky script language. The nice thing about snips was
that it's lightweight and it just worked (except for a few foibles, most
of which have been corrected.)
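
For anyone wanting to reproduce the syslog workaround, the "strip off the syslog date and time bits" step could look roughly like the sketch below. It is in Python rather than perl, and it is not the script described above. It assumes syslog-ng writes lines in the traditional "Mon DD HH:MM:SS host tag: message" layout and that the tag is the "snipslogd" set by logger -t; the function name strip_syslog_header is made up for illustration, and the payload in the demo is a placeholder, not a real SNIPS event line.

```python
import re

# Assumed layout of a line as written by syslog-ng in the traditional style:
#   "Mon DD HH:MM:SS host snipslogd[pid]: payload"
SYSLOG_LINE = re.compile(
    r"^(?:<\d+>)?"                  # optional raw priority, e.g. <158>
    r"[A-Z][a-z]{2}\s+\d{1,2}\s+"   # month and day ("Jul 24" or "Jul  4")
    r"\d{2}:\d{2}:\d{2}\s+"         # time of day
    r"(?P<host>\S+)\s+"             # host that originally logged the event
    r"snipslogd(?:\[\d+\])?:\s?"    # tag set with "logger -t snipslogd"
    r"(?P<payload>.*)$"             # whatever snipslogd piped to logger
)


def strip_syslog_header(line):
    """Return (host, payload) for a snipslogd syslog line, else None.

    Hypothetical helper: the payload is handed back untouched so the
    existing event parser can deal with it as before.
    """
    match = SYSLOG_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None  # not a line from the snipslogd -> logger pipe
    return match.group("host"), match.group("payload")


if __name__ == "__main__":
    # Placeholder payload, NOT the real SNIPS event format.
    sample = "Jul 24 13:32:35 remote1 snipslogd: placeholder event text\n"
    print(strip_syslog_header(sample))
```

Returning None for lines that do not match lets the caller skip anything else that ends up in the same syslog-ng destination, rather than feeding it to the event parser.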

Maybe I'm getting too old for this lark, I should let the bright new
kids have a go, say I'm old-fashioned with my rickety perl scripts, and
not listen to me, and then repeat all the same mistakes I made in the
past. :)

Regards,

Rob

-- 
Robert Lister - London Internet Exchange - http://www.linx.net/
robl at linx net - tel: +44 (0)20 7645 3510 - RL786-RIPE