Java was designed to make it impossible for user-developed code to crash an application. Any error should result in a clean exception being thrown, which can then be caught and handled appropriately. The reality, as any long-time Java developer or system administrator knows, is that the Java Virtual Machine (JVM) itself can and does crash. This is because the JVM is itself a program written in native code, and like any very large, complicated program, it has some bugs in it.

Like any program, when Java crashes, it simply dies and goes away along with your application. This can be disastrous if the application is a mission critical system. The application will be down, and the outage may not be noticed until a customer visits your web site, or the shipping department calls because they haven't received any shipping orders for a few hours. Once the fact that the application is down is detected, it can potentially take an hour or two longer for the system administrator to come in and restart it.

Solution

The solution to this problem is to have a monitoring application with an intimate knowledge of how Java works. It must be trusted to monitor your Java application 24 hours a day, 365 days a year, and automatically take care of problems when they arise. The Java Service Wrapper contains several monitoring features, one of which is the ability to detect when a JVM has crashed. It constantly monitors the JVM and is able to recognize a crash as soon as it happens. When a crash is detected, the Java Service Wrapper automatically restarts the JVM to get the system back up and running with a minimum of delay. Finally, you can configure an email notification to be sent out to the system administrator, who can then make sure everything is back up and working normally.

The entire crash detection, recovery, and notification process is handled by the Java Service Wrapper automatically, without the need for a human operator to take action. When a crash is detected, the Java Service Wrapper will record something like the following in the log file:

Log Example: Crash and Restart
jvm 1    | ...
jvm 1    | #
jvm 1    | # A fatal error has been detected by the Java Runtime Environment:
jvm 1    | #
jvm 1    | #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000001800074ff, pid=856, tid=704
jvm 1    | #
jvm 1    | # JRE version: 6.0_24-b07
jvm 1    | # Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode windows-amd64 compressed oops)
jvm 1    | # Problematic frame:
jvm 1    | # C  [wrapper.dll+0x74ff]
jvm 1    | #
jvm 1    | # An error report file with more information is saved as:
jvm 1    | # C:\myapp\bin\hs_err_pid856.log
jvm 1    | #
jvm 1    | # If you would like to submit a bug report, please visit:
jvm 1    | #   http://java.sun.com/webapps/bugreport/crash.jsp
jvm 1    | # The crash happened outside the Java Virtual Machine in native code.
jvm 1    | # See problematic frame for where to report the bug.
jvm 1    | #
wrapper  | JVM exited unexpectedly.
wrapper  | Launching a JVM...
jvm 2    | WrapperManager: Initializing...
jvm 2    | ...

While the above message may seem a little cryptic, it is the JVM's attempt to provide you with information about the crash. Because this output is dumped to the console at a very low level by the JVM, it is impossible for any Java based logging tools to record this information. Notice that the Wrapper is able to capture 100% of the console output of the JVM and make sure that it is all preserved in the log file. Without the Wrapper, not only would your application remain down, but the explanation of what happened would also be lost.

Ideally, any crash should be fixed as soon as possible to avoid future crashes. However, because many crashes are caused by obscure interactions between perfectly valid Java code and bugs in the JVM itself, it can often take weeks or months for developers to identify and work around the problem.

The Wrapper helps you detect critical problems including:

Technical Overview

Crash detection in the Wrapper is always enabled, because checking for crashes does not place any additional load on the JVM or the system. There is no way to disable detection itself. It is, however, possible to control what the Wrapper does when a crash takes place.

When a crash is detected, the Wrapper will first wait a moment (5 seconds by default) to allow system memory and resources to settle back to normal, and then after any configured notifications, launch a new JVM instance to get your application back up and running without delay. As described below, it is possible to disable this auto restart functionality if needed.
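The detect, wait, and relaunch cycle described above can be sketched in a few lines of Java. This is only an illustration of the idea, not how the Wrapper itself is implemented. The class and method names are made up for this example, a non-zero exit code stands in for a crash, and the launch limit exists only to keep the sketch from looping forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.IntSupplier;

public class CrashRestartSketch {

    /** Starts the command, waits for it to end, and returns its exit code. */
    static int runOnce(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the child", e);
        }
    }

    /**
     * Calls launchAndWait until it reports a clean exit (code 0) or the launch
     * limit is reached, pausing between launches. Returns the number of launches.
     * Assumption of this sketch: any non-zero exit code counts as a crash.
     */
    static int supervise(IntSupplier launchAndWait, long restartDelayMillis, int maxLaunches) {
        int launches = 0;
        while (launches < maxLaunches) {
            launches++;
            int exitCode = launchAndWait.getAsInt();
            if (exitCode == 0) {
                break; // clean shutdown, nothing to recover from
            }
            System.out.println("Child exited unexpectedly (exit code " + exitCode + ").");
            if (launches < maxLaunches) {
                try {
                    // Give the system time to reclaim the dead process's resources.
                    Thread.sleep(restartDelayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                System.out.println("Launching a replacement child...");
            }
        }
        return launches;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: CrashRestartSketch <command> [args...]");
            return;
        }
        // 5000 ms mirrors the 5 second default delay mentioned in the text.
        int launches = supervise(() -> runOnce(List.of(args)), 5_000, 3);
        System.out.println("Stopped after " + launches + " launch(es).");
    }
}
```

The launch step is passed in as a function so the restart logic can be exercised without starting real processes.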

Trying it out

Hopefully you will have a hard time reproducing a JVM crash in your own Java application. To make it easy to test this out, the Wrapper ships with a TestWrapper Example Application which can be used to try out this and many other features.

Please launch the TestWrapper Application from a console by going into the downloaded and expanded Java Service Wrapper distribution. On Windows, run bin\TestWrapper.bat and on UNIX platforms run bin\testwrapper console.

When the TestWrapper Example Application starts up, you should see a simple GUI screen with several buttons down the left side. Clicking on the Native Access Violation button will crash the JVM with a native access violation, which the Wrapper then detects.

Log Example: Crash and Restart
jvm 1    | TestWrapper: Showing dialog...
jvm 1    | WrapperManager: WARNING: Attempting to cause an access violation...
jvm 1    | #
jvm 1    | # A fatal error has been detected by the Java Runtime Environment:
jvm 1    | #
jvm 1    | #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000018000746f, pid=1632, tid=2284
jvm 1    | #
jvm 1    | # JRE version: 6.0_24-b07
jvm 1    | # Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode windows-amd64 compressed oops)
jvm 1    | # Problematic frame:
jvm 1    | # C  [wrapper.dll+0x746f]
jvm 1    | #
jvm 1    | # An error report file with more information is saved as:
jvm 1    | # C:\myapp\bin\hs_err_pid856.log
jvm 1    | #
jvm 1    | # If you would like to submit a bug report, please visit:
jvm 1    | #   http://java.sun.com/webapps/bugreport/crash.jsp
jvm 1    | # The crash happened outside the Java Virtual Machine in native code.
jvm 1    | # See problematic frame for where to report the bug.
jvm 1    | #
wrapper  | JVM exited unexpectedly.
wrapper  | Launching a JVM...
jvm 2    | WrapperManager: Initializing...

If you wish to test this out in your own Java application, it is possible to call the WrapperManager.accessViolationNative() method to cause an intentional crash.
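A crash trigger is best kept behind an explicit switch so it can never fire by accident. The following is a minimal sketch of that idea, with some assumptions: the class name and the myapp.allowCrashTest system property are hypothetical, and the method is looked up by reflection only so that the example compiles and runs without the Wrapper's jar on the classpath. In an application already built against the Wrapper, a direct call to WrapperManager.accessViolationNative() would be simpler.

```java
import java.lang.reflect.Method;

public class CrashTestTrigger {

    // Hypothetical opt-in switch; pick any property name that suits your application.
    static final String ALLOW_PROPERTY = "myapp.allowCrashTest";

    /**
     * Attempts an intentional native crash through the Wrapper's WrapperManager class.
     * If the crash succeeds, the JVM dies and this method never returns.
     * Returns false when the switch is off or the Wrapper classes are not available.
     */
    static boolean crashIfAllowed() {
        if (!Boolean.getBoolean(ALLOW_PROPERTY)) {
            return false; // crash testing was not explicitly enabled
        }
        try {
            Class<?> manager = Class.forName("org.tanukisoftware.wrapper.WrapperManager");
            Method crash = manager.getMethod("accessViolationNative");
            crash.invoke(null); // static method, so no target instance
            return true; // only reached if the call returned without crashing
        } catch (ReflectiveOperationException e) {
            System.err.println("Crash test skipped: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Crash attempted: " + crashIfAllowed());
    }
}
```

With the switch left unset the method is a no-op, so the call can stay in a test build without putting a production system at risk.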

Troubleshooting

On the surface, Java's crash reports may not appear to be very useful. But when you know what to look for, they can provide the information needed to resolve the problem.

In the above example, the JVM is telling us that a detailed error report was saved to disk as C:\myapp\bin\hs_err_pid856.log. Analysis of these crash reports is beyond the scope of this document, but we recommend you take a look at the Troubleshooting Guide on the Oracle site. If you really want to get your hands dirty, then this Crash course on JVM crash analysis is also quite useful.
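Before opening the full error report, it can help to pull the key facts out of the banner that the Wrapper captured in its log. The sketch below extracts the error name, the process ID, the problematic frame, and the report file location from lines like those in the log example. The class name, method names, and regular expressions are assumptions written against the sample output shown on this page, so they may need adjusting for other JVM versions or platforms.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrashBannerSummary {

    // Matches e.g. "#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x..., pid=856, tid=704"
    private static final Pattern ERROR_LINE = Pattern.compile(
            "#\\s+(\\w+) \\((0x[0-9a-fA-F]+)\\) at pc=0x[0-9a-fA-F]+, pid=(\\d+)");

    // Captures the text that follows the leading '#' of a banner line.
    private static final Pattern COMMENT_TEXT = Pattern.compile("#\\s+(\\S.*)$");

    /** Returns "NAME (code) in pid N" taken from the fatal error line, if there is one. */
    static Optional<String> fatalError(List<String> lines) {
        for (String line : lines) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.find()) {
                return Optional.of(m.group(1) + " (" + m.group(2) + ") in pid " + m.group(3));
            }
        }
        return Optional.empty();
    }

    /** Returns the banner text on the line following the first line that contains the marker. */
    static Optional<String> lineAfter(List<String> lines, String marker) {
        for (int i = 0; i < lines.size() - 1; i++) {
            if (lines.get(i).contains(marker)) {
                Matcher m = COMMENT_TEXT.matcher(lines.get(i + 1));
                return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Lines copied from the log example on this page.
        List<String> log = List.of(
                "jvm 1    | #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000001800074ff, pid=856, tid=704",
                "jvm 1    | # Problematic frame:",
                "jvm 1    | # C  [wrapper.dll+0x74ff]",
                "jvm 1    | # An error report file with more information is saved as:",
                "jvm 1    | # C:\\myapp\\bin\\hs_err_pid856.log");
        System.out.println("Error:  " + fatalError(log).orElse("not found"));
        System.out.println("Frame:  " + lineAfter(log, "Problematic frame:").orElse("not found"));
        System.out.println("Report: " + lineAfter(log, "saved as:").orElse("not found"));
    }
}
```

In the problematic frame line, the leading C marks a native code frame, which matches the banner's note that the crash happened outside the Java Virtual Machine.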

Restart Delay

Compatibility: 2.2.9
Editions: Professional Edition, Standard Edition, Community Edition
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/Linux

When a crash is detected, the Wrapper will by default wait for 5 seconds to allow memory and system resources which were being used by the crashed JVM to be recovered by the system, before launching a replacement JVM. If a new JVM is launched too quickly then depending on the reasons for the crash, the new JVM can run into problems or start slowly.

The amount of time it takes for the system to settle depends on a number of factors including overall system load and the size of the application that was running in the crashed JVM. While it is possible to shorten this delay, the default value has proven to be a safe amount of time to wait. The delay before restarting a new JVM can be controlled using the wrapper.restart.delay property.

Configuration Example: Wait time before restarting the JVM (5 seconds)
wrapper.restart.delay=5

Disabling Restarts

Compatibility: 3.3.4
Editions: Professional Edition, Standard Edition, Community Edition
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/Linux

In most cases when a JVM crashes, you will want it to be restarted. There are certain cases, however, where manual cleanup or other actions may be needed first. In these cases, the Wrapper's restart functionality can be disabled using the wrapper.disable_restarts.automatic property.

Configuration Example: Disable Restart
wrapper.disable_restarts.automatic=TRUE

Responding to Events

Compatibility: 3.3.0
Editions: Professional Edition, Standard Edition (Not Supported), Community Edition (Not Supported)
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/Linux

When an application freezes or crashes, simply restarting it is not always enough. If temporary files were not properly cleaned up, or the database was left in an unstable state, the new application instance may quickly fail with errors. Ideally the application should be designed to be robust enough to handle these failures, but this is not always possible. In such cases we need a way to do some cleanup before launching a new JVM.

The Wrapper supports this through the use of Event Commands. It is possible to tell the Wrapper to run an external command or script at various times.

If you look at the debug log output, you will see occasional Enqueue Event 'NNN' entries. These show when the Wrapper's events are fired. When the JVM is restarted after a crash, we want to run our cleanup script before the new JVM instance is launched, so the jvm_restart event will be used.

Please create a simple batch file as follows and place it into the same directory as the Wrapper binary. This example simply simulates a 10 second cleanup task. A real script would be more interesting.

Sample script: "cleanupTest.bat"
@echo off
echo Cleaning up for 10 seconds...
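rem Use ping against localhost to cause a delay of about 10 seconds
rem (11 echo requests are sent roughly one second apart).
rem 1.1.1.1 is avoided because it is a reachable public address and replies at once.
ping 127.0.0.1 -n 11 >NUL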
echo All done cleaning up.

Next, add the following properties to your wrapper.conf file:

Configuration file: "wrapper.conf"
wrapper.event.jvm_restart.command.argv.1=cleanupTest.bat
wrapper.event.jvm_restart.command.block=TRUE
wrapper.event.jvm_restart.command.loglevel=INFO

Now when a crash is detected, the Wrapper will run the configured batch file and block until it has completed:

Log Output Example
jvm 1    | # The crash happened outside the Java Virtual Machine in native code.
jvm 1    | # See problematic frame for where to report the bug.
jvm 1    | #
wrapper  | JVM exited unexpectedly.
wrapper  | Event Command 'jvm_restart': Command line: cleanupTest.bat
wrapper  | Event Command 'jvm_restart': Command launched (pid: 965), blocking for up to 15 seconds...
Cleaning up for 10 seconds...
All done cleaning up.
wrapper  | Event Command 'jvm_restart': Command completed with exit code: 0
wrapper  | Event Command 'jvm_restart': Command completed with exit code: 0  Continuing.
wrapper  | Launching a JVM...
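The batch file above only simulates work. As one idea of what a real cleanup task might do, the sketch below removes leftover files matching a pattern from a directory, for example lock or temporary files that a crashed JVM never had the chance to delete. The class name, the directory, and the pattern are all hypothetical. Such a program could be started from an event command in place of the batch file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StaleFileCleanup {

    /**
     * Deletes the regular files directly inside dir whose names match the glob
     * (for example "*.lock") and returns how many were removed.
     * Subdirectories are left untouched.
     */
    static int deleteMatching(Path dir, String glob) {
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file) && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: StaleFileCleanup <directory> <glob>");
            System.exit(2);
        }
        int deleted = deleteMatching(Paths.get(args[0]), args[1]);
        System.out.println("Removed " + deleted + " stale file(s).");
    }
}
```

A failure surfaces as an uncaught exception and therefore a non-zero exit code, so a problem during cleanup would be visible in the exit code that the Wrapper logs for the command.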

Email Notification

Compatibility: 3.3.0
Editions: Professional Edition, Standard Edition (Not Supported), Community Edition (Not Supported)
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/Linux

In addition to launching external commands in response to a crash, it is also possible to configure the Wrapper to send out a notification email. The email can take the form of a simple notice, or optionally include the most recent log output as an attachment.

An email notification of the jvm_unexpected_exit event can be configured as follows:

Configuration file "wrapper.conf"
wrapper.event.default.email.smtp.host=mail.example.com
wrapper.event.default.email.sender=wrapper-app@example.com
wrapper.event.default.email.recipient=sysadmin@example.com
wrapper.event.jvm_unexpected_exit.email=TRUE
wrapper.event.jvm_unexpected_exit.email.subject=Oh no! The application crashed!
wrapper.event.jvm_unexpected_exit.email.maillog=ATTACHMENT

This will cause an email like the following to be sent whenever the JVM crashes.

jvm_unexpected_exit email
Subject: Oh no! The application crashed!
To: sysadmin@example.com
From: wrapper-app@example.com

Java Service Wrapper Event Notification

Host: myserver
App Name: testwrapper
         TestWrapper Example Application

Event: jvm_unexpected_exit

INFO   | jvm 1    | 2012/09/13 16:00:00 | ...
INFO   | jvm 1    | 2012/09/13 16:00:00 | # The crash happened outside the Java Virtual Machine in native code.
INFO   | jvm 1    | 2012/09/13 16:00:00 | # See problematic frame for where to report the bug.
INFO   | jvm 1    | 2012/09/13 16:00:00 | #
ERROR  | wrapper  | 2012/09/13 16:00:00 | JVM exited unexpectedly.

It is also good to get good news. The following configuration tells the Wrapper to send out a second email whenever the JVM has been launched and your application is back up and running. This pair of emails can be very useful in keeping you informed of the state of your server, so you can decide whether or not it's OK to relax and enjoy the rest of dinner.

Configuration file "wrapper.conf"
wrapper.event.jvm_started.email=TRUE
wrapper.event.jvm_started.email.subject=Good news!  The application is back up.
wrapper.event.jvm_started.email.maillog=ATTACHMENT

The email and event systems of the Wrapper are very powerful and configurable. Please see the Event Overview page for more details.

Service Recovery

Compatibility: 3.5.5
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX (Not Supported), Linux (Not Supported), IBM AIX (Not Supported), FreeBSD (Not Supported), HP-UX (Not Supported), Solaris (Not Supported), IBM z/Linux (Not Supported)

The Wrapper is fully capable of handling any Java crash. While the Wrapper process itself has proven to be very stable, this section covers how to recover from a crash of the Wrapper process itself.

The Windows Service Manager is responsible for starting, stopping, and monitoring all Windows services, including the Wrapper. It is possible to configure the Wrapper to tell the Windows Service Manager what to do in the event of a Wrapper crash. These recovery features can be configured using the properties available in the Windows Service Recovery section. Please see a step by step procedure for an example of the wrapper.ntservice.recovery.<x> property.

Reference: Crashes

The Java Service Wrapper provides a full set of configuration properties that allow you to make the Wrapper meet your exact needs. Please take a look at the documentation for the individual properties to see all of the possibilities beyond the examples shown above.