## $Id: INSTALL,v 4.5 2000/03/21 05:23:58 vikas Exp $

        INSTALLATION INSTRUCTIONS FOR 'NOCOL' v4.3
        ==========================================

NOTE: You will have to edit and customize some of the PERL monitors
      manually (in perlnocol). See the perlnocol/README for more
      information.

NOTE: Sample config files are also provided for each monitor. These
      are copied into the nocol/etc/samples directory during
      installation. Copy them to nocol/etc/ and edit them for each
      monitor.

1.  Plan on a location for installing the entire software. It is
    recommended that all required directories pertaining to NOCOL be
    under one directory (say /usr/local/nocol), with perhaps symbolic
    links to the DATA directory.

2.  Run 'Configure' in the top level NOCOL source directory:

        sh Configure    (or ./Configure)

3.  Run 'make' (you might want to save the output using
    'make >& make.out').

4.  If 'multiping' fails to compile on your system, you will have to
    edit the pingmon/Makefile, set IPPING to your system's 'ping'
    location, and comment out the 'PROGCDEFS = -DMULTIPING' line.
    Make sure that the output of your system's 'ping' command matches
    the format shown below; otherwise you will need to make minor
    modifications in the 'pingmon/poll_sites.c' file (the area to
    modify is well commented, so it should be easy).

        solar> /usr/etc/ping -s abc.foo.com 1000 5 | tail -2
        5 packets transmitted, 5 packets received, 0% packet loss
        round-trip (ms)  min/avg/max = 4/4/5

    No changes are needed if you are using the provided multiping (or
    rpcping) programs for IP (or RPC).

    If you get an error 'undefined symbol _strerror()' when linking
    with -lresolv, edit lib/Makefile.mid and add strerror.o to OBJS.

5.  The default NOCOL logging port is defined in 'noclog.h' as 5354.
    Also, if using hostmon, the default data port is defined in the
    hostmon modules as port 5355. You can change these ports in these
    files if you want to use some other port number (preferably >1023
    so that the programs do not have to run as 'root').
    Then add the following lines to your '/etc/services' file (mainly
    for inetd; the programs use the default ports if there is no
    entry in the /etc/services file):

        noclog   5354/tcp     # noclogd with TCP
        noclog   5354/udp     # noclogd with UDP
        hostmon  5355/tcp     # hostmon uses TCP

6.  Make sure you can write to the destination directory, and then:

        make install
        su
        make root       # to install etherload, multiping, trapmon

7.  Look at the config files in the $ROOTDIR/etc/samples directory,
    and edit/save them to the $ROOTDIR/etc directory.

    In the config file for noclogd, list the hosts that are allowed
    to log to 'noclogd' (the hosts running the monitors, if these are
    distributed over various systems). If saving to log files, make
    sure that the proper LOG directory exists for the log files to be
    created (check in noclogd-conf and log-maint).

    Edit all other config files for your site.

    PRIOR TO v4.0, IPPINGMON used the config file name 'ipnodes'.
    Rename 'ipnodes' to 'ippingmon-confg'. Furthermore, the variable
    name for ippingmon was 'Reachability' in the previous versions;
    it is now "ICMP-ping". Please change any locally customized
    scripts that assume the old variable name.

8.  Edit the following scripts (these are run from your CRONTAB).

    - '$ROOTDIR/bin/keepalive_monitors' checks to see if the various
      monitors are running. Edit and set the values of PROGRAMS1,
      HOST1, etc. You can also distribute the monitors on multiple
      systems and share the /nocol disk via NFS. List all the
      monitors that you want to run per system in this file. It is
      run from the crontab every 30-60 minutes.

    - '$ROOTDIR/bin/notifier' sends email listing sites that have
      been critical for between N and N+1 hours. You can use this
      program to send email to senior personnel on your staff if
      sites are down for more than a stipulated time (as managers
      often tend to request). It is run from the crontab every hour.

    - '$ROOTDIR/bin/log-maint' cycles old logs and also runs the
      logstats program to generate statistics (it sends a HUP to the
      noclogd daemon).
    Create the mail aliases that you had selected for OPSMAIL and
    CRITMAIL.

9.  Test 'noclogd' by starting it up in debug mode (-d). See if it
    complains about anything. You will have to edit noclogd-conf to
    set the location of the log files that are created. Check logging
    using the perl script 'perlnocol/testlog'. Stop 'noclogd' after
    testing.

    Then install the 'bin/crontab.nocol' file in the nocol user's
    crontab (usually su nocol ; crontab crontab.nocol). This will run
    'keepalive_monitors', which starts up noclogd and the other
    monitors. If you want to run keepalive_monitors directly instead
    of from cron for now, run it as the nocol user.

    Use 'netconsole -l 4' to see if any data is being collected under
    the DATA directory. Look in the $ROOTDIR/etc/*.error files for
    any errors.

    REMEMBER that the monitors log events to noclogd only when the
    state of the event CHANGES. So nothing might be logged to noclogd
    if all the sites remain at the same state (up/down) and threshold
    level.

10. You can add user 'nocol' to your password file to allow anyone to
    log in as user 'nocol' and see the state of the network. A
    typical entry is:

        nocol::65534:65534:Network Monitoring:/tmp:/nocol/bin/netconsole

    All signals are trapped by the 'netconsole' display program and
    cause it to terminate.

11. To install the Web interface (webnocol/):

    - Check the various 'SET_THIS' lines in both genweb.pl and
      webnocol.cgi, which have been copied over into your
      $ROOTDIR/bin/

    - Edit the &doTroubleshoot() function and check the
      troubleshooting commands in webnocol.cgi

    - Run $ROOTDIR/bin/genweb.pl from your crontab every minute:

        * * * * * /nocol/bin/genweb.pl >/dev/null 2>&1

    - Create a link called index.html to 'Critical.html' in your web
      tree.

    - Copy over the entire 'gif' directory structure to the same
      directory where you are generating the html pages ($webdir in
      genweb.pl)

    - Install webnocol.cgi in your 'cgi-bin' directory.
    - Create a null updates file and a null cookie file owned by your
      web daemon:

        cp /dev/null $ROOTDIR/etc/updates
        cp /dev/null $ROOTDIR/etc/webcookies
        chown httpd $ROOTDIR/etc/updates $ROOTDIR/etc/webcookies

    - Create a $ROOTDIR/etc/webusers file using the sample as an
      example. You can generate an encrypted password using the
      utility script docrypt.pl

12. When you make changes to the various config files, you have to
    send a HUP to the processes, which restarts the 'daemons'. Note
    that there is currently no way to pick up only the changes in a
    config file; monitoring has to be restarted for config file
    changes to take effect.

PERLNOCOL
---------

There is a PERL interface for developing additional NOCOL monitors.
To use this, you need to have PERL installed on your system.

1.  If using 'hostmon', you need to run the standalone
    'hostmon-client' programs on the machines you want monitored, and
    run the 'hostmon' process on the 'nocol' server.

    Check the '@permithosts' line in the 'hostmon-client' program to
    ensure that it allows the nocol host to connect to the
    hostmon-client processes. Then copy over the entire
    'perlnocol/hostmon-osclients' directory to all the Unix hosts
    that you want monitored. These client routines do not use
    nocollib.pl and do not use any configuration file.

    Start up hostmon-client at boot time by making an entry in your
    /etc/rc.local or equivalent file. As an example, you can do the
    following on all the Unix hosts you want monitored:

        cd $ROOTDIR/bin
        rsh host1 mkdir /usr/local/nocol
        rcp -r hostmon-osclients host1:/usr/local/nocol
        rlogin host1
        # Now edit your /etc/rc.local or whatever system startup script
        # and add the line:
        #    (cd /usr/local/nocol/hostmon-osclients; ./hostmon-client)
        # Run this command manually for now since you are not rebooting
        # your machine.

    The 'hostmon' process on the nocol host will be restarted by the
    'keepalive_monitors' process. Edit the hostmon-confg file.

2.
    To use 'snmpmon', edit and set the thresholds in the
    snmpmon-confg file. List the devices that need to be monitored in
    the 'snmpmon-client-confg' file and run 'snmpmon-client'. SNMP
    data is generated in the '/tmp/snmpmon_data' directory. You can
    probably have a number of snmpmon-clients running on different
    systems and rcp the datafile over to the host running the server
    'snmpmon' program periodically from cron. If you do this, then
    you will have to compile and edit the locations of snmpwalk and
    mib-v2.txt in the perl script. (A new monitor, snmpgeneric, can
    also be used instead of snmpmon.)

3.  If the monitor that you want to run uses 'rcisco', then enter
    your router's password in it and install it in nocol/bin with
    mode 710. Alternatively, you can use the 'tcpf.c' program to run
    a remote telnet command.

    Edit the SNMP community string in any perl script if so indicated
    in the perlnocol/README (if it uses snmpwalk).

4.  Create the config files under $ROOTDIR/etc/. Samples are in the
    $ROOTDIR/etc/samples subdirectory.

5.  For troubleshooting, set the $debug and $libdebug values to '1'
    or higher. You can also send a SIGUSR1 signal to running modules
    to change the debug level (it increases to the maximum and then
    resets on each SIGUSR1).

6.  Check the size of the event_t structure (see the TROUBLESHOOTING
    item below).

7.  There is an X-window Tcl/Tk interface developed by Lydia Leong
    (ndaemon and tkNocol). You need 'tixwish' in order to run
    tkNocol. You should run ndaemon on the nocol host (it listens on
    TCP port 5005). You can then run 'tkNocol' from any host, and it
    connects to ndaemon. THERE IS NO ACCESS CONTROL in ndaemon, so
    you must ensure that only permitted hosts (running tkNocol) can
    access this host through the firewall. This can also run on
    Windows machines if you have tixwish installed.

8.  There is a Windows 95/NT interface for viewing data, developed by
    Jason Wright (jason@thought.net), available at
    http://www.thought.net/jason/

TROUBLESHOOTING
---------------

1.
    Some warnings are to be expected, but there should be no major
    errors.

2.  If the errors are about include files or variable types, look for
    the file that is being included under the /usr/include
    sub-directories. The various systems love to move include files
    back and forth between the include and the include/sys
    directories (especially 'time.h').

3.  For the nameserver monitor, old versions of the resolver library
    might complain. Some include files define the '_res' variable
    differently, so try changing all occurrences of '_res.nsaddr' to
    '_res.nsaddr_list[0]' in the src/nsmon/nsmon.c module (look in
    your /usr/include/resolv.h). Make sure that the 'libresolv'
    library exists while linking.

    Newer nameserver/resolver libraries are called '-lbind' instead
    of '-lresolv', so if you have installed the latest version of
    bind, change all references in the Makefile (or Configure) to
    '-lbind' instead of '-lresolv'.

4.  For trapmon, the CMU SNMP library is used. Make sure that it was
    properly built under 'src/cmu-snmp/snmp'. If not, try following
    the instructions in cmu-snmp/README to build and install the
    library in the local directory.

5.  Most monitors have a '-d' option for debugging, or create error
    files in $ROOTDIR/etc.

6.  If you get a 'h_addr_list[0]' not defined error, simply edit
    nocol.h and add the following line in it:

        #define h_addr 1

    This is because of the difference in the hostent structure of
    netdb.h in very old systems.

7.  Check if the regular expressions in the &dotest() routine in the
    PERL modules need any changes for your site.

8.  If you get strerror() undefined errors, try adding strerror.o to
    NEEDOBJS in the lib/Makefile.mid.

    If you get pfopen() errors in etherload on DEC OSF1, then add
    pfopen.o to the etherload/Makefile.mid OBJS definition.

9.  In PERLNOCOL, watch out for the padding in the '$event_t'
    template.
    C compilers tend to align the fields of structures on even byte
    boundaries, so you might have to add some additional 'null'
    padding using 'x' depending on your system architecture. Set
    $libdebug = 1 to see the size of the $event structure. The size
    of the data files produced by the C monitors should be a multiple
    of the perl $event structure. The C utility program
    'show_nocol_struct_sizes' can be used to see the event struct
    sizes in the C modules.

10. Check the syntax of the ping() function in the nocollib program
    to make sure that the command line arguments are okay for your
    system.

Best of luck. Comments to 'nocol-info@jvnc.net' and bugs to
'vikas@navya.com'.

The README file has more information. For an overview, look at the
file doc/nocol-overview.8.

        Vikas Aggarwal (vikas@navya.com)
        January 1997
-----------------
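
Example for TROUBLESHOOTING item 9: the rule that a data file's size
should be an exact multiple of the event structure size can be checked
with a few lines of shell. This is only a sketch. The function name
'check_multiple' is hypothetical and not part of NOCOL; the event size
has to be read by hand from the output of 'show_nocol_struct_sizes' or
from the $libdebug output, whose exact format is not assumed here.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NOCOL): report whether a file
# size is an exact multiple of the event record size.
# usage: check_multiple FILE_SIZE_BYTES EVENT_SIZE_BYTES
check_multiple() {
    fsize=$1
    esize=$2
    if [ "$esize" -le 0 ]; then
        echo "event size must be positive" >&2
        return 2
    fi
    rem=$((fsize % esize))
    if [ "$rem" -eq 0 ]; then
        # The file holds a whole number of event records.
        echo "OK: $((fsize / esize)) events"
    else
        # Leftover bytes suggest the perl template and the C struct
        # disagree on padding.
        echo "MISMATCH: $rem stray bytes; check the 'x' padding in \$event_t"
        return 1
    fi
}
```

A possible use, where the data file path and the size 200 are
placeholders to be replaced with the values for your system:

    check_multiple "$(wc -c < /path/to/datafile)" 200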