

Das_Watchdog V0.3.1
Released 5.1.2006

Kjetil S. Matheussen, k.s.matheussen@notam02.no


ABOUT
-----
Das_Watchdog is a general watchdog for the Linux operating system. It should run
in the background at all times to make sure that a realtime process can't hang
the machine.

Das_Watchdog is inspired by the rt_watchdog program
made by Florian Schmidt: http://tapas.affenbande.org/?page_id=38

But das_watchdog has some improvements over rt_watchdog:

1. It works with 2.4 kernels as well as 2.6.
2. Instead of permanently setting all realtime processes to run non-realtime, das_watchdog
   only does so temporarily.
3. When the watchdog kicks in, an X window should pop up that tells you what's happening.
   (Just close it after reading the message.)



INSTALLING
----------
make
cp das_watchdog /usr/local/sbin/
echo '/usr/local/sbin/das_watchdog >/dev/null &' >>/etc/rc.sysinit
reboot



USAGE
-----

Whenever a program locks up the machine, the watchdog temporarily sets all
realtime processes to non-realtime for 8 seconds. An xmessage window pops
up on the screen whenever that happens.
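
A minimal sketch of such a temporary demotion for a single process, using the
standard scheduling calls sched_getscheduler(), sched_getparam() and
sched_setscheduler(). This is not das_watchdog's actual source: the struct
saved_sched type and both function names are hypothetical. Following the
changelog, the restore step is skipped if the policy was changed in the
meantime; the check that the pid still belongs to the same process is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <sys/types.h>

/* Hypothetical record of one process's scheduling state before demotion. */
struct saved_sched {
    pid_t pid;
    int policy;   /* policy before demotion, e.g. SCHED_FIFO or SCHED_RR */
    int priority; /* sched_priority before demotion */
};

/* Save pid's current policy and priority, then switch it to SCHED_OTHER.
   Returns 0 on success, -1 on failure (errno is set by the failing call). */
int demote_to_other(pid_t pid, struct saved_sched *s)
{
    struct sched_param param;
    int policy = sched_getscheduler(pid);

    if (policy == -1 || sched_getparam(pid, &param) == -1)
        return -1;
    s->pid = pid;
    s->policy = policy;
    s->priority = param.sched_priority;

    param.sched_priority = 0; /* SCHED_OTHER only accepts priority 0 */
    return sched_setscheduler(pid, SCHED_OTHER, &param);
}

/* Put back the saved policy and priority, but only if the process is still
   SCHED_OTHER, i.e. nobody switched it to another policy while demoted.
   Returns 1 if restored, 0 if skipped, -1 on failure. */
int restore_sched(const struct saved_sched *s)
{
    struct sched_param param;
    int now = sched_getscheduler(s->pid);

    if (now == -1)
        return -1;
    if (now != SCHED_OTHER)
        return 0;
    param.sched_priority = s->priority;
    if (sched_setscheduler(s->pid, s->policy, &param) == -1)
        return -1;
    return 1;
}
```

A caller would run demote_to_other() on every realtime process, sleep for the
demotion period (8 seconds above), then run restore_sched() on each saved entry.
Changing another user's process, or setting a realtime policy, needs root (or
CAP_SYS_NICE).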

To test it, run the included program "test_rt", which immediately freezes your
machine. However, after about 5-6 seconds a window should pop up telling you
that the watchdog set the process to non-realtime. (If you have two processors,
you must run test_rt twice, and so on: one instance per processor.)
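
This README does not say how a lockup is detected. One plausible scheme, which
fits the changelog's lowest-priority SCHED_FIFO test thread and its --checktime
and --waittime options but is otherwise an assumption, is a heartbeat: the
low-priority thread keeps bumping a counter, and a checker running at a higher
realtime priority than any hog notices when the counter stops moving. The
decision logic alone could look like this (all names are hypothetical):

```c
#include <stdbool.h>

/* Hypothetical checker state for a heartbeat counter that a low-priority
   realtime thread increments while it still gets CPU time. */
struct heartbeat_check {
    unsigned long last_count; /* counter value seen at the previous check */
    int stalled_secs;         /* how long the counter has not moved */
};

/* Meant to be called every check_secs seconds from a high-priority thread.
   Returns true once the counter has been stuck for at least limit_secs,
   i.e. when realtime processes should be demoted. */
bool heartbeat_stalled(struct heartbeat_check *h, unsigned long count,
                       int check_secs, int limit_secs)
{
    if (count != h->last_count) {
        /* The heartbeat thread ran since the last check: all is well. */
        h->last_count = count;
        h->stalled_secs = 0;
        return false;
    }
    h->stalled_secs += check_secs;
    return h->stalled_secs >= limit_secs;
}
```

Under this scheme a single busy-looping process only starves one processor, so
the heartbeat thread could still run on another one, which is consistent with
needing one copy of test_rt per processor.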

Unless you are using the High Res Timer, use the "--force" option to set the
priority of all timer processes to SCHED_FIFO/99. If you are using the High Res Timer,
_don't_ set the priority of the timer process to SCHED_FIFO/99. Doing that causes
xruns, at least for me, and probably also when not using the High Res Timer.
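
For reference, checking whether a process already runs SCHED_FIFO/99, and the
change that "--force" is described as making, can be sketched with the same
scheduling calls. The two thread names are the ones mentioned in the changelog
below; the function names are hypothetical, and how das_watchdog actually finds
its timer processes is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* True if s begins with prefix. */
static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns 1 if a kernel thread name matches one of the timer threads
   named in this README: "softirq-timer/N" or "ksoftirqd/N". */
int is_timer_thread_name(const char *comm)
{
    return starts_with(comm, "softirq-timer/") || starts_with(comm, "ksoftirqd/");
}

/* Returns 1 if pid runs SCHED_FIFO at priority 99, 0 if not, -1 on error. */
int is_fifo99(pid_t pid)
{
    struct sched_param p;
    int policy = sched_getscheduler(pid);

    if (policy == -1 || sched_getparam(pid, &p) == -1)
        return -1;
    return policy == SCHED_FIFO && p.sched_priority == 99;
}

/* Sets pid to SCHED_FIFO/99 (needs root). Returns 0 on success, -1 on error. */
int set_fifo99(pid_t pid)
{
    struct sched_param p = { .sched_priority = 99 };
    return sched_setscheduler(pid, SCHED_FIFO, &p);
}
```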

To summarize: Use the High Res Timer. It's currently the only way (as far as I know)
to avoid xruns and have proper timing in the 2.6 kernel.
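
Whether high-resolution timers are actually active can be checked from user
space. A minimal sketch, assuming a current mainline kernel, where clock_getres()
on CLOCK_MONOTONIC reports 1 ns when high-resolution timers are active and one
jiffy (1/HZ) otherwise. The function name is hypothetical, and kernels from the
time this README was written may behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Returns 1 if CLOCK_MONOTONIC reports 1 ns resolution (high-resolution
   timers active), 0 if it is coarser (typically one jiffy), -1 on error. */
int high_res_timers_active(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return res.tv_sec == 0 && res.tv_nsec == 1;
}
```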

If the xmessage window does not show up, it may be because the user logged
into the machine has their home directory on a mounted disk that root has no
access to. When that happens, root is unable to read the user's .Xauthority
file. Unfortunately, I have no (good) solution for that situation. But if that's
not the case, please report the problem to me.
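
Before spawning xmessage as root, the unreadable-.Xauthority case can at least
be detected up front. A minimal sketch: the helper name is hypothetical, the
fixed ":0.0" display is the value the changelog says das_watchdog switched to,
and finding the right home directory is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: point XAUTHORITY at <home>/.Xauthority and DISPLAY at
   ":0.0" before running xmessage. Returns 0 on success, or -1 if the path is
   too long, the file is not readable by this process, or setenv() fails. */
int use_user_xauthority(const char *home)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/.Xauthority", home);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1; /* path did not fit */
    if (access(path, R_OK) != 0)
        return -1; /* unreadable: xmessage would fail to connect */
    if (setenv("XAUTHORITY", path, 1) != 0 || setenv("DISPLAY", ":0.0", 1) != 0)
        return -1;
    return 0;
}
```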

NOTE! The kernel changes all the time, and the "--force" option changes
the priorities of your processes in a way they are not meant to be changed. So you
should be absolutely sure before resorting to the "--force" option, and the only way
to be sure is to run the "test_rt" program. Run das_watchdog without --force (even
though it complains during startup) and then run "test_rt". If das_watchdog stops
"test_rt" from taking over the machine, DON'T use the "--force" option. Then everything is okay!




REQUIREMENTS
------------
xmessage (should be a part of X11)
libgtop2 (should be included with GNOME. No, das_watchdog is not a GNOME program.)



CHANGES
-------
0.2.5->0.3.1
*Changed scheme for finding correct XAUTHORITY environment variable.
  (Now works with Fedora Core 6)
  Hopefully, these changes increase the chance of seeing the
  xmessage and avoid seeing multiple ones. (There's no correct way to do
  this, so please send me the output of "uname -a" in case you don't see
  any window.)
*Added syslogging.
*Added the --version argument.


0.2.4->0.2.5
*Let the test thread run with SCHED_FIFO priority as well, using the lowest priority.

0.2.3->0.2.4
*Test if the xmessage program found during the make process is a valid executable.
 If not, search the $PATH instead. This should fix it for Gentoo when the pro-audio
 overlay is updated to at least this version.
*Various modifications for the High Res Timer, which should be used instead of setting the
 timer interrupt process to SCHED_FIFO/99.

0.2.2->0.2.3
*Fixed command-line arguments for increasetime, checktime and waittime.
*Nicified source a bit

0.2.1->0.2.2
*Locked down memory. Don't know if it's necessary.

0.2.0->0.2.1
*Cleaned up source a bit.
*Properly find number of timer processes.
*Added shortcuts for optargs and beautified the source a bit.


0.1.2->0.2.0
*Don't do anything if no process priorities are changed, when watchdogging.
*Added the --force option, that sets the priority of all timer processes to FIFO/99.
*Added the das_watchdog /etc/init.d script provided by Stefan Kersten. (das_watchdog.rc)
*Added the --verbose option.
*Check that it's the same process when setting back old priority.
*Don't set back to old priority if the priority has been changed in the meantime.
*Added options for setting increasetime, checktime and waittime.
 (--increasetime, --checktime and --waittime)
*Don't change the priority of any timer process when watchdogging.
*Smaller code cleanups.


0.1.1->0.1.2
* Added check for the ksoftirqd/0 process as well as the softirq-timer/0 process.
* Added check for SCHED_OTHER of the timing process as well as priority.
* Removed debug-printing.

0.1.0->0.1.1
* Added extensive checks, both when compiling and when running, of the priority of the "softirq-timer/0"
  process:
  - ***If "softirq-timer/0" is not set to a very high priority (99), the watchdog most probably will not work.***
  - The default priority for softirq-timer/0 seems to be 1. However, for real time work, it must be
    set higher to get reliable timing. Set it to 99.
  - If softirq-timer/0 is set to less than 99, das_watchdog will refuse to compile unless you force it to by
    editing the makefile. When running das_watchdog, it will only give a warning if the priority is set too low.
* Changed the DISPLAY environment variable to ":0.0" instead of "localhost:0.0". Seems to work for
  everyone now.
* Switched from libgtop to libgtop2.

0.0.2->0.1.0
* Properly setting the DISPLAY and XAUTHORITY environment variables in various ways to make sure
  the message is really shown. (It really works now!)

0.0.1->0.0.2
*Use xmessage instead of wish. (much nicer)
*Run system("xhost +") and setenv("DISPLAY",":0.0",1) before running xmessage.



ACKNOWLEDGEMENT
---------------
The program is conceptually based on Florian Schmidt's program rt_watchdog. Florian Schmidt
also wrote the test_rt program.

