Crash Recovery for Unattended Computers
We increasingly rely on computers set up as servers to deliver web pages, software downloads and other files at any hour of the day or night. We also rely on computers to perform routine tasks such as data processing and mathematical calculations. And we expect the computers set up for these specialized services to do their jobs unattended, with minimal intervention.
But an unattended computer can mean unresolved problems. Even the most uncomplicated and well-tested software is not completely immune to crashing. And a crash on an unattended computer, remote server, or public kiosk can be more than a minor inconvenience.
Sophisticated Circuits, Inc. is a leading provider of innovative hardware and software products that monitor an unattended computer and respond to failures automatically and independently.
- Kick-off! is for computers with USB ports. This includes Apple iMacs, G4 models and some USB PowerBooks, as well as Windows-compatible desktop computers with USB ports running Windows 98, ME or 2000.
- Rebound! works specifically with desktop ADB PowerMacs, including the Blue & White G3 model.
- PowerKey Pro Model 200 and Model 600, with the Server Restart Option, work with any desktop ADB Macintosh and some ADB PowerBooks.
- PowerKey Pro Model 650, Admin version, works with any desktop USB Macintosh and some USB PowerBooks.
Each of these highly integrated crash detection and recovery devices acts like a full-time sentry, watching the computer to make sure it keeps running and taking over to restart the system when necessary.
Types of Crashes
When the computer is running normally, the system software and applications run in parallel. The system software takes care of the low-level bookkeeping and manages the resources the applications need. The applications communicate with the system (as indicated by the yellow arrows in the diagram), asking for resources and sharing processing time.
There are therefore two types of crashes that can afflict a computer: system crashes and application crashes.
The system crash occurs at a very low level, bringing the entire system to a halt. Since the applications rely on the system for the resources they need, they stop running too. These crashes often produce the dreaded "blue screen" or a "system error" bomb dialog box, or the system, including the mouse and keyboard, may simply "freeze".
Once the system has crashed, it does not respond to software commands and the Restart command cannot be executed. Recovery requires external intervention, either by switching the computer off and on to reset it or, on a limited number of Macs, by pressing the command-control-power-on-key "reset" keystroke.
The application crash occurs within a single process. It can cause an "application error" or "unexpectedly quit" dialog box, or the application may appear to stop responding. Often this crash does not affect the rest of the computer: the system and other applications, especially those that normally run in the background, can keep running.
After an application crash, parts of the system may continue to respond to software commands. It is sometimes possible to recover by executing the Restart command, although in some cases the system may be damaged badly enough that an external restart is required.
Detecting System Crashes
Kick-off!, Rebound! and PowerKey Pro use a patented combination of hardware and software to detect and recover from system crashes. (For automatic crash detection, USB PowerKey Pro must have the Rebound! upgrade and ADB PowerKey Pro must have the Server Restart Option [SRO].)
To detect system crashes, the software periodically sets (or "tickles") an internal system timer in the hardware, as indicated by the purple arrow in the diagram. As long as the system is running normally, the software keeps resetting this timer at regular intervals.
The hardware's system timer runs independently, always counting down from the value set by the software. If the system crashes, the software can no longer update the timer, so the timer keeps counting down. When it reaches zero, the hardware concludes that the computer has crashed and restarts it.
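The countdown-and-tickle scheme above is a classic watchdog timer. The sketch below models it in one-second steps to show when a restart would fire. The class and function names, and the 120-second timeout and 30-second tickle interval in the usage, are illustrative assumptions, not the products' actual firmware or default settings.

```python
from typing import Optional


class SystemTimer:
    """Hypothetical model of the hardware countdown timer."""

    def __init__(self, timeout_s: int) -> None:
        self.timeout_s = timeout_s
        self.remaining = timeout_s

    def tickle(self) -> None:
        # The monitoring software resets the countdown to its full value.
        self.remaining = self.timeout_s

    def tick(self) -> bool:
        # One second of hardware time passes; True means the timer hit zero.
        self.remaining -= 1
        return self.remaining <= 0


def first_restart_time(timeout_s: int, tickle_every_s: int,
                       crash_at_s: int, duration_s: int) -> Optional[int]:
    """Second at which the hardware would restart the host, or None."""
    timer = SystemTimer(timeout_s)
    for t in range(1, duration_s + 1):
        if timer.tick():
            return t  # countdown reached zero: hardware restarts the computer
        host_running = t < crash_at_s
        if host_running and t % tickle_every_s == 0:
            timer.tickle()  # software is still alive and resets the timer
    return None


print(first_restart_time(120, 30, crash_at_s=100, duration_s=600))     # 210
print(first_restart_time(120, 30, crash_at_s=10_000, duration_s=600))  # None
```

With a crash at 100 seconds, the last tickle happens at 90 seconds, so the restart comes one full timeout later, at 210 seconds. A healthy host never lets the timer reach zero.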
Recovering From System Crashes
Software settings determine how long the hardware will wait before deciding the computer has crashed, and what steps to take when it does. The most common response is simply to restart the computer. Each product has different restarting capabilities:
- Kick-off! turns off the power to the computer, then turns it back on after a brief pause.
- Rebound! sends the "command-control-power-on-key" sequence to the Mac's ADB port. This technique works reliably on some models of ADB Macintosh, specifically Power Macs ranging from the 6100 to the blue and white G3.
- ADB PowerKey Pro can use the keyboard restart command like Rebound!, but it can also restart the computer by switching the computer's outlet off, then on after a brief pause.
- USB PowerKey Pro turns off the power to the computer's outlet, then turns it back on after a brief pause.
Kick-off!, Rebound! and the Admin version of USB PowerKey Pro can try multiple times to restart the system. ADB models of PowerKey Pro can be programmed to perform additional actions, such as switching other outlets or launching applications or AppleScripts, in response to a crash.
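The multiple-attempt behaviour can be pictured as a simple retry loop. This is only a sketch under a stated assumption: the article does not say how the products judge a restart successful, so here an attempt is treated (hypothetically) as successful once the monitoring software resumes tickling the timer. The function name, callbacks and `max_attempts` parameter are all illustrative.

```python
from typing import Callable, Iterator, List


def restart_with_retries(power_cycle: Callable[[], None],
                         software_resumed: Callable[[], bool],
                         max_attempts: int) -> int:
    """Return the 1-based attempt that succeeded, or 0 if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        power_cycle()  # e.g. outlet off, brief pause, outlet on
        # Hypothetical success test: the monitoring software is tickling the
        # hardware timer again after the restart.
        if software_resumed():
            return attempt
    return 0


# Usage with scripted boot outcomes: the first boot hangs, the second succeeds.
boots: Iterator[bool] = iter([False, True])
cycles: List[str] = []
result = restart_with_retries(lambda: cycles.append("power cycle"),
                              lambda: next(boots), max_attempts=3)
print(result, len(cycles))  # 2 2
```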
Detecting Application Crashes
When an application crashes, other tasks can continue without interruption, so no system crash will be detected. To detect application problems, our software includes special Application Timers, or AppTimers, which can receive status information from many popular server applications in much the same way as our system crash detection works.
Applications with this support tickle their own AppTimer within the Kick-off!, Rebound! or PowerKey software at regular intervals while running normally, as indicated by the pink arrow in the diagram. If the application crashes, it stops updating its AppTimer, and when that timer expires, the software is triggered to react.
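Unlike the single hardware timer, AppTimers live in software and there is one per monitored application. A minimal sketch, assuming each application's timer is stored as a deadline: `AppTimers`, `pop_expired` and the application names are hypothetical, and an injectable clock keeps the example deterministic.

```python
import time
from typing import Callable, Dict, List


class AppTimers:
    """Hypothetical registry of per-application timers kept by the software."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadlines: Dict[str, float] = {}

    def tickle(self, app: str, timeout_s: float) -> None:
        # A healthy application pushes its own deadline into the future.
        self._deadlines[app] = self._clock() + timeout_s

    def pop_expired(self) -> List[str]:
        # Applications whose deadline passed without a new tickle; they are
        # removed so the software reacts to each expiry only once.
        now = self._clock()
        gone = sorted(a for a, deadline in self._deadlines.items() if deadline <= now)
        for app in gone:
            del self._deadlines[app]
        return gone


# Usage with a fake clock.
now = [0.0]
timers = AppTimers(clock=lambda: now[0])
timers.tickle("WebServer", 300)
timers.tickle("FileServer", 300)
now[0] = 250.0
timers.tickle("FileServer", 300)  # FileServer is still running normally
now[0] = 301.0
print(timers.pop_expired())       # ['WebServer']
```

Because each application has its own deadline, one crashed server is detected while the others keep running unaffected.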
Recovering From Application Crashes
To take advantage of the custom application crash recovery system, the application must be specifically written to support our software. Some applications, such as WebSTAR, use a plug-in, while in others, such as AppleShare IP 6.x, support is built in and completely transparent. See the list of third-party support to learn how these and other programs support Sophisticated Circuits' products.
Kick-off!, Rebound! or PowerKey Pro must be configured to respond when a monitored application crashes. Kick-off! and Rebound! use simple check boxes in the Application Crashes panel of their Control Panels. With ADB PowerKey Pro, you must create an event that combines the Trigger "When Timer Expires" with the Action "Restart".
Direct application support works together with System Crash Detection to provide a double defense against crashes. If the Restart command issued after an application crash fails, the hardware can take over and restart the computer.
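The double defense puts an upper bound on how long an unattended machine stays down after an application crash. The arithmetic below is a sketch that ignores the time the restart itself takes; the function name and the 120-second system timeout in the usage are illustrative assumptions, while the 300-second AppTimer matches the AppleScript example in this article.

```python
def worst_case_restart_s(app_timeout_s: int, system_timeout_s: int,
                         restart_command_works: bool) -> int:
    """Upper bound, in seconds, from an application crash to a restart.

    Ignores the time the restart itself takes.
    """
    # The application may crash just after resetting its AppTimer, so the
    # AppTimer can take its full value to expire and trigger "Restart".
    bound = app_timeout_s
    if restart_command_works:
        return bound
    # If the Restart command hangs the system, the software stops tickling the
    # hardware timer. It was last tickled no later than this moment, so the
    # hardware restarts the computer at most system_timeout_s later.
    return bound + system_timeout_s


print(worst_case_restart_s(300, 120, restart_command_works=True))   # 300
print(worst_case_restart_s(300, 120, restart_command_works=False))  # 420
```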
Scripting the AppTimer on Macintosh Computers
Using AppleScript, custom Macintosh software, such as databases and interactive presentations, can communicate directly with the Kick-off!, Rebound! or PowerKey software. You can set the AppTimer to a specific number of seconds, as in the AppleScript example below:
tell application "PowerKey Extension"
    set appTimer to 300
end tell
This sample script sets the AppTimer in PowerKey to 300 seconds (5 minutes), and the timer immediately begins counting down from that value. You can then program the application to send this script every 60 seconds; while the application is running normally, the AppTimer is reset to the full 300 seconds every minute. If the application fails to repeat the AppleScript within the allotted time, the AppTimer counts down to zero, triggering the Kick-off!, Rebound! or PowerKey software to react.
The application should ping the Extension no more often than once every 30 to 60 seconds, to keep system overhead down. Also, the AppTimer should be set to a value significantly higher than the ping interval, so that it won't reach zero while the system is busy with other tasks.
The AppTimer can also be set with an Apple event. The event class and event ID are defined in the user manual for each product.
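The two guidelines can be checked with a little arithmetic. After the last successful ping, the AppTimer expires a full timer period later, so with the 300-second timer and 60-second ping from the example above, the application can miss four consecutive pings before the software reacts. In the sketch below, the function names and the `min_ratio` of 3 used to stand in for "significantly higher" are hypothetical choices, not documented product values.

```python
import math
from typing import List


def missed_pings_tolerated(ping_interval_s: int, app_timer_s: int) -> int:
    """Consecutive pings an app can miss before its AppTimer reaches zero."""
    # Pings scheduled strictly before the timer expires can be missed safely.
    return math.ceil(app_timer_s / ping_interval_s) - 1


def check_ping_settings(ping_interval_s: int, app_timer_s: int,
                        min_ratio: float = 3.0) -> List[str]:
    """Return warnings for settings that break the guidelines above."""
    warnings = []
    if ping_interval_s < 30:
        warnings.append("pinging more often than every 30 s adds overhead")
    if app_timer_s < min_ratio * ping_interval_s:
        warnings.append("AppTimer is too close to the ping interval")
    return warnings


print(missed_pings_tolerated(60, 300))  # 4
print(check_ping_settings(60, 300))     # []
print(check_ping_settings(60, 90))      # ['AppTimer is too close to the ping interval']
```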
Adding Direct Support for AppTimers
Macintosh developers can add direct support for the custom application monitoring in Kick-off!, Rebound! and PowerKey to their products by using our Software Developer's Kit. Windows developers will find the Kick-off! SDK information in the Kick-off! for Windows software.