rt_watchdog

rt_watchdog is a small program that tries to protect your system from runaway SCHED_FIFO tasks. It does this by starting two threads, one with an RT priority of 1 and the other with 99. The low-priority thread periodically writes into a ring buffer, and the high-priority thread checks at longer intervals whether any data has arrived. A SCHED_FIFO thread only gives up the CPU to higher-priority threads, so a runaway RT task keeps the priority-1 writer from running while the priority-99 checker can still preempt it. When no data is available, the high-priority thread assumes the system has locked up, runs the unfifo_stuff.sh script and writes a note to the syslog. The script simply switches every task in the system (except kernel threads) to SCHED_OTHER, which should let the user shut the machine down cleanly.
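The heartbeat logic can be illustrated with a minimal sketch in Python. This is not rt_watchdog's source. The names HeartbeatWatchdog, write_once, check_once and on_lockup are hypothetical, and the ring buffer is a bounded collections.deque. SCHED_FIFO is applied only when the process has the privilege for it (root or CAP_SYS_NICE). CPython's global interpreter lock is not priority-aware, so the sketch shows the logic only. A real watchdog would be written in something like C with pthreads.

```python
import os
import threading
import time
from collections import deque


def try_set_fifo(priority):
    """Switch the *calling thread* to SCHED_FIFO (pid 0 = caller on Linux).

    Needs root or CAP_SYS_NICE; returns False instead of raising so the
    sketch still runs unprivileged (the threads then stay SCHED_OTHER).
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, OSError):  # non-Linux, or EPERM
        return False


class HeartbeatWatchdog:
    """Low-priority writer fills a bounded ring; high-priority checker drains it."""

    def __init__(self, on_lockup, write_interval=0.1, check_interval=1.0, buf_size=64):
        self.ring = deque(maxlen=buf_size)  # bounded: oldest entries are dropped
        self.on_lockup = on_lockup          # hypothetical hook, e.g. run unfifo_stuff.sh
        self.write_interval = write_interval
        self.check_interval = check_interval
        self.stop = threading.Event()

    def write_once(self):
        self.ring.append(time.monotonic())  # deque.append is thread-safe

    def check_once(self):
        """Drain the ring; call on_lockup() and return False if it was empty."""
        drained = 0
        while True:
            try:
                self.ring.popleft()  # deque.popleft is thread-safe
            except IndexError:
                break
            drained += 1
        alive = drained > 0
        if not alive:
            self.on_lockup()
        return alive

    def _writer(self):
        try_set_fifo(1)
        while not self.stop.is_set():
            self.write_once()
            self.stop.wait(self.write_interval)

    def _checker(self):
        try_set_fifo(99)
        # Event.wait() returns False on timeout and True once stop is set.
        while not self.stop.wait(self.check_interval):
            self.check_once()

    def start(self):
        for target in (self._writer, self._checker):
            threading.Thread(target=target, daemon=True).start()


if __name__ == "__main__":
    wd = HeartbeatWatchdog(on_lockup=lambda: print("no heartbeat: would run unfifo_stuff.sh"))
    wd.start()
    time.sleep(2.5)
    wd.stop.set()
```

If you call check_once() on a fresh instance, it reports a lockup. A single write_once() before the call keeps it quiet. The ring is drained on every check, so each interval needs fresh evidence that the writer actually ran.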

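The unfifo step can also be sketched in Python. This is an assumption about how such a script might work, not the contents of unfifo_stuff.sh. It walks /proc/<pid>/task/<tid> and skips kernel threads by testing the PF_KTHREAD bit (0x00200000) in the flags field of each stat file. Every remaining SCHED_FIFO or SCHED_RR thread is switched to SCHED_OTHER. The function names are hypothetical. dry_run defaults to True, so a plain call only lists what would change.

```python
import os

PF_KTHREAD = 0x00200000  # include/linux/sched.h: "I am a kernel thread"
RT_POLICIES = (os.SCHED_FIFO, os.SCHED_RR)
RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0)


def stat_flags(stat_line):
    """Return field 9 ('flags') of a /proc/.../stat line."""
    # comm (field 2) sits in parentheses and may contain spaces or ')',
    # so only split what follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1 :].split()
    return int(rest[6])  # rest[0] is field 3 (state), so field 9 is rest[6]


def is_kernel_thread(stat_line):
    return bool(stat_flags(stat_line) & PF_KTHREAD)


def unfifo_all(dry_run=True):
    """Demote every RT user-space thread to SCHED_OTHER; return the TIDs affected."""
    affected = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            tids = os.listdir(f"/proc/{pid}/task")
        except OSError:
            continue  # process exited while scanning
        for tid in map(int, tids):
            try:
                with open(f"/proc/{pid}/task/{tid}/stat") as f:
                    if is_kernel_thread(f.read()):
                        continue
                # The kernel may OR SCHED_RESET_ON_FORK into the returned policy.
                if (os.sched_getscheduler(tid) & ~RESET_ON_FORK) not in RT_POLICIES:
                    continue
                if not dry_run:
                    os.sched_setscheduler(tid, os.SCHED_OTHER, os.sched_param(0))
                affected.append(tid)
            except OSError:
                continue  # thread vanished, or permission denied
    return affected


if __name__ == "__main__":
    print("would demote:", unfifo_all(dry_run=True))
```

The sketch touches only FIFO and RR threads, which narrows the "every task" wording above. SCHED_OTHER tasks need no change, and SCHED_BATCH or SCHED_IDLE tasks are left alone. Demoting other users' threads requires root.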
Download:

Grab it here

Issues:

Some versions of the realtime-preemption (PREEMPT_RT) kernels didn't get the system timer handling right. When a lower-priority thread was hogging the CPU, the high-priority thread never returned from its sleep, which is bad! This seems to be fixed in newer versions.

Update: the reason for this is that sleep() timeouts are handled by a kernel thread called softirq-timer/0. This thread needs a high SCHED_FIFO priority to wake sleeping threads reliably. Make it priority 99.
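A sketch of raising softirq-timer/0 to SCHED_FIFO priority 99 from Python. It assumes the thread name appears as the comm field of /proc/<pid>/stat. The helper names are hypothetical. From a shell, util-linux's `chrt -f -p 99 <pid>` does the same job once the PID is known. Either way this needs root.

```python
import os


def comm_of(stat_line):
    """Extract the comm (thread name) from a /proc/<pid>/stat line."""
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]


def find_pids_by_comm(name):
    """Return the PIDs whose comm equals name, e.g. 'softirq-timer/0'."""
    pids = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                if comm_of(f.read()) == name:
                    pids.append(int(pid))
        except OSError:
            continue  # process exited while scanning
    return pids


def boost_thread(name="softirq-timer/0", priority=99):
    """Set every matching thread to SCHED_FIFO at the given priority (root only)."""
    pids = find_pids_by_comm(name)
    for pid in pids:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    return pids


if __name__ == "__main__":
    print("boosted:", boost_thread())
```

If the kernel has no thread by that name, boost_thread() changes nothing and returns an empty list.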

Update: it seems this is only true for kernels with high-resolution timers disabled. With high-resolution timers enabled, things should work without setting softirq-timer/0's priority manually. Doing so might actually hurt.
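You can check from user space whether high-resolution timers are active by looking at the reported clock resolution. This sketch relies on Linux reporting 1 ns for CLOCK_MONOTONIC in high-resolution mode and the tick length (1/HZ) otherwise. The function name and the 1 µs threshold are arbitrary choices for the sketch. Enabling high-resolution timers in the first place is the kernel build option CONFIG_HIGH_RES_TIMERS=y.

```python
import time


def high_res_timers_active(res=None):
    """Guess whether hrtimers run in high-resolution mode.

    res defaults to clock_getres(CLOCK_MONOTONIC): about 1e-9 s with
    high-resolution timers, one jiffy (e.g. 0.004 s at HZ=250) without.
    """
    if res is None:
        res = time.clock_getres(time.CLOCK_MONOTONIC)
    return res <= 1e-6


if __name__ == "__main__":
    print("high-res timers active:", high_res_timers_active())
```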

Screenshot:

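Here you can see rt_watchdog in the output of htop (a very cool top replacement). Note that there's also a "watchdog/0" process. That is the kernel's own soft-lockup detector, which works a bit differently. It watches for a CPU stuck in kernel mode without scheduling other tasks and reports it with a stack trace in the kernel log, instead of trying to recover.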

[Screenshot: rt_watchdog showing in htop]
