Thread deadlocks are a possibility in any multi-threaded Java application, and they have been a problem since the first multi-threaded applications were implemented. For example, an application can go through a full test cycle, be released, go through additional testing, and then finally be deployed for mission-critical use. As the application's usage grows, some critical component suddenly stops responding. The system administrator's only recourse is to kill and restart the application to get it working again.

What usually follows is that the users of the application complain to the system administrator about the inconvenience, loss of sales, or other damages caused by the downtime. The developers of the application are notified that there was an application freeze and told to resolve it immediately. Unfortunately, there is usually very little information, if any, as to what caused the application to freeze. Developers need to try to resolve the problem without any clues as to what caused it, or even what the problem is. This often leads to a "try it again and see what happens" kind of response, which is never good for anyone involved.

When a Deadlock is detected by the Wrapper, it will first log a detailed report of exactly which threads and which objects were involved. It can then immediately restart the JVM to make sure that your Java Application is back up and running with a minimum of downtime. This means that not only will your Java application remain operational, but all of the information required to report the problem, so it can be fixed, will also be available.

What is a Deadlock?

A deadlock takes place when two or more threads in a program get stuck waiting to access an object which will never become available.

Imagine two workers, Tom and Fred. They both write notes for a living. To do so, they need to pick up a pad of paper and a pen from a desk, write the note, and then put both the pad and pen back on the desk. Both Tom and Fred are very stubborn. Once they start, they will never put the pad or pen back on the desk until the note has been written.

What will happen if Tom picks up the paper, and Fred picks up the pen at the same time? Tom is going to wait forever for the pen, and Fred is going to wait forever for the paper. They are now in a deadlocked situation. There is no way for either to proceed because neither one of them will ever give in.

In real life, either Tom or Fred would eventually get tired and put what he is holding back on the desk. But programs don't work that way. Two threads that have gotten themselves into such a "deadlock" state will continue to wait until well into the night. At some point their manager is going to notice that no notes are being taken and there will be trouble.
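The situation with Tom and Fred can be reproduced in a few lines of Java. The following is only a minimal sketch: the class name `NoteDeadlockDemo`, the method names, and the use of plain `Object` instances for the paper and pen are all hypothetical. A latch is used to force the unlucky timing in which each worker already holds one item before reaching for the other, and the workers are daemon threads so that the JVM can still exit while they are stuck.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class NoteDeadlockDemo {

    /** Starts a worker that takes 'first', waits until both workers hold one item, then wants 'second'. */
    static Thread startWorker(String name, Object first, Object second, CountDownLatch bothHoldOne) {
        Thread worker = new Thread(() -> {
            synchronized (first) {
                bothHoldOne.countDown();
                try {
                    bothHoldOne.await(); // wait until the other worker holds its first item
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Not reached: the other worker never puts 'second' back.
                    System.out.println(name + " wrote a note");
                }
            }
        }, name);
        worker.setDaemon(true); // lets the JVM exit even though the worker is stuck
        worker.start();
        return worker;
    }

    /** Returns true if both workers end up BLOCKED on each other within the timeout. */
    static boolean reproduceDeadlock(long timeoutMillis) throws InterruptedException {
        Object paper = new Object(); // hypothetical stand-in for the pad of paper
        Object pen = new Object();   // hypothetical stand-in for the pen
        CountDownLatch bothHoldOne = new CountDownLatch(2);
        Thread tom = startWorker("Tom", paper, pen, bothHoldOne);  // paper first, then pen
        Thread fred = startWorker("Fred", pen, paper, bothHoldOne); // pen first, then paper

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (tom.getState() == Thread.State.BLOCKED && fred.getState() == Thread.State.BLOCKED) {
                return true; // each worker is waiting for the item the other one holds
            }
            Thread.sleep(10);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked: " + reproduceDeadlock(5000));
    }
}
```

Without the latch, the same code could run correctly for a long time and only lock up when both workers happen to start a note at the same moment.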

What makes this kind of deadlock problem so difficult to reproduce and fix is that it is simply a matter of timing. Tom and Fred could have worked together for years without any problems simply because they very rarely needed to write down a note at the same time. When they are asked to take several notes a minute, however, the problem is encountered quite quickly.

How do we solve this dilemma?

The solution is to tell Tom and Fred that when they take a note, they must always pick up the pad of paper before the pen. There will still be cases where one of them needs to wait a moment for the other to put the paper and pen back on the desk, but they will never get stuck.
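In Java terms, "paper before pen" means that every thread acquires the two locks in the same order. The sketch below illustrates the idea under the same assumptions as the story: the class `OrderedNoteTaking` and its method are hypothetical names, and the paper and pen are plain lock objects.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class OrderedNoteTaking {

    /** Runs two workers that each write notesPerWorker notes; returns the total number written. */
    static int runWorkers(int notesPerWorker) throws InterruptedException {
        Object paper = new Object();
        Object pen = new Object();
        AtomicInteger notesWritten = new AtomicInteger();

        // Both workers share the same routine, so both follow the same lock order.
        Runnable writeNotes = () -> {
            for (int i = 0; i < notesPerWorker; i++) {
                synchronized (paper) {     // always picked up first
                    synchronized (pen) {   // always picked up second
                        notesWritten.incrementAndGet();
                    }
                }
            }
        };

        Thread tom = new Thread(writeNotes, "Tom");
        Thread fred = new Thread(writeNotes, "Fred");
        tom.start();
        fred.start();
        tom.join();  // both joins return because neither worker can get stuck
        fred.join();
        return notesWritten.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Notes written: " + runWorkers(10_000));
    }
}
```

A worker may still have to wait briefly for the paper, but whoever holds the paper is always able to obtain the pen, so the wait always ends.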

Solution

In this example with our two workers, the problem is very clear and can be easily resolved. In a very large application, involving dozens of resources and tens of thousands of lines of code, it can be very difficult to identify the problem, let alone resolve it. Real deadlocks can sometimes involve several resources and threads which are used in combinations that were not anticipated by the system developers.

Deadlocks by their very nature are much more likely to occur with live data than in a testing environment because live data tends to have a wider variety and larger volumes. As with the workers, when there are few messages to record, the system works fine. But when things get busy, it becomes more and more likely that there will be a problem.

Asking both of the workers to follow the same list of tasks seems obvious. But in reality, large systems are designed by multiple developers, each creating their own list of tasks. Each set of operations is fine on its own, but the sets can cause problems when used together.
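One way to keep independently written code on the same "list of tasks" is to move the ordering decision out of the callers and into a shared helper. The following is a hedged sketch of that idea, not something provided by the Wrapper: `RankedLock`, its `rank` field, and `withBoth` are hypothetical names, and the approach assumes that every lock is given a unique rank.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class RankedLock {
    private final int rank; // assumed to be unique per lock so the order is unambiguous
    private final ReentrantLock lock = new ReentrantLock();

    RankedLock(int rank) {
        this.rank = rank;
    }

    /** Runs the action while holding both locks, always locking the lower rank first. */
    static void withBoth(RankedLock a, RankedLock b, Runnable action) {
        RankedLock first = (a.rank <= b.rank) ? a : b;
        RankedLock second = (first == a) ? b : a;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                action.run();
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    /** Two workers request the same locks in opposite order; returns the notes written. */
    static int runDemo(int notesPerWorker) throws InterruptedException {
        RankedLock paper = new RankedLock(1);
        RankedLock pen = new RankedLock(2);
        AtomicInteger notes = new AtomicInteger();

        Thread tom = new Thread(() -> {
            for (int i = 0; i < notesPerWorker; i++) {
                withBoth(paper, pen, notes::incrementAndGet);
            }
        }, "Tom");
        Thread fred = new Thread(() -> {
            for (int i = 0; i < notesPerWorker; i++) {
                withBoth(pen, paper, notes::incrementAndGet); // reversed request order
            }
        }, "Fred");

        tom.start();
        fred.start();
        tom.join();
        fred.join();
        return notes.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Notes written: " + runDemo(10_000));
    }
}
```

Even though the two workers name the locks in a different order, the helper always acquires them by rank, so the callers cannot reintroduce the paper-and-pen problem by accident. This only helps for code that goes through the helper; locks taken elsewhere remain outside its control.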

While the Java Service Wrapper is not able to prevent an application-level Deadlock from taking place, the Wrapper contains advanced Deadlock detection features which allow the problem to be detected and resolved before a human is likely to even notice that anything is wrong. At the same time, the Wrapper collects and logs a detailed description of exactly what happened. This makes it much easier for a developer to understand and quickly fix the source of the problem.

The Wrapper has the ability to monitor an entire running application with virtually no performance penalty. When it detects a deadlock in the application, it will produce a report in the Wrapper's log file like the following:

Log Example on a Deadlock:
WrapperManager Error: Found 2 deadlocked threads!
WrapperManager Error: =============================
WrapperManager Error: "Worker-1" tid=18
WrapperManager Error:   java.lang.Thread.State: BLOCKED
WrapperManager Error:     at com.example.Worker1.pickUpPaper(Worker1.java:64)
WrapperManager Error:       - waiting on <0x000000002fcac6db> (a com.example.Paper) owned by "Worker-2" tid=17
WrapperManager Error:     at com.example.Worker1.pickUpPen(Worker1.java:83)
WrapperManager Error:       - locked <0x0000000029c56c60> (a com.example.Pen)
WrapperManager Error:     at com.example.Worker1.logMessage(Worker1.java:22)
WrapperManager Error:     at com.example.Worker1.run(Worker1.java:42)
WrapperManager Error:
WrapperManager Error: "Worker-2" tid=17
WrapperManager Error:   java.lang.Thread.State: BLOCKED
WrapperManager Error:     at com.example.Worker2.pickUpPen(Worker2.java:83)
WrapperManager Error:       - waiting on <0x0000000029c56c60> (a com.example.Pen) owned by "Worker-1" tid=18
WrapperManager Error:     at com.example.Worker2.pickUpPaper(Worker2.java:64)
WrapperManager Error:       - locked <0x000000002fcac6db> (a com.example.Paper)
WrapperManager Error:     at com.example.Worker2.logMessage(Worker2.java:22)
WrapperManager Error:     at com.example.Worker2.run(Worker2.java:42)
WrapperManager Error:
WrapperManager Error: =============================

After the Deadlock has been detected, the Wrapper can be configured to take one of many actions. In most cases, sending a notification email and then restarting the application is the best course of action. The email, containing the above output, will make it fairly easy for a developer to fix the problem, and the restart will help reduce the impact of the problem on users by getting the application back up and running without any unnecessary delay.

From the report, it is very easy to see that the deadlock was caused by the "Worker-1" and "Worker-2" threads, and see exactly where in their respective call stacks they got stuck. The output even makes it clear which specific object instances are at fault.
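For readers curious about where this kind of information comes from, the standard `java.lang.management` API lets a Java program ask the JVM which threads are deadlocked and what they are waiting on. The sketch below uses that API to build a much simpler report. It is not the Wrapper's implementation, and the class name `DeadlockReportSketch`, its methods, and the report format are hypothetical. The fixture recreates the "Worker-1" and "Worker-2" situation with daemon threads so the demo can exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class DeadlockReportSketch {

    /** Returns one summary line per deadlocked thread, or an empty list when there is no deadlock. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null means no deadlock was found
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(String.format("\"%s\" tid=%d %s waiting on <%s> owned by \"%s\" tid=%d",
                    info.getThreadName(), info.getThreadId(), info.getThreadState(),
                    info.getLockName(), info.getLockOwnerName(), info.getLockOwnerId()));
        }
        return report;
    }

    /** Polls until a deadlock is reported or the timeout expires. */
    static List<String> awaitDeadlockReport(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> report = describeDeadlocks();
        while (report.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(50);
            report = describeDeadlocks();
        }
        return report;
    }

    /** Demo fixture: two daemon threads that lock the same two objects in opposite order. */
    static void startDeadlockedPair() {
        Object paper = new Object();
        Object pen = new Object();
        CountDownLatch bothHoldOne = new CountDownLatch(2);
        startWorker("Worker-1", pen, paper, bothHoldOne);
        startWorker("Worker-2", paper, pen, bothHoldOne);
    }

    private static void startWorker(String name, Object first, Object second, CountDownLatch bothHoldOne) {
        Thread worker = new Thread(() -> {
            synchronized (first) {
                bothHoldOne.countDown();
                try {
                    bothHoldOne.await(); // wait until the other worker holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Not reached: the other worker never releases 'second'.
                }
            }
        }, name);
        worker.setDaemon(true); // allows the JVM to exit while the workers are stuck
        worker.start();
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlockedPair();
        awaitDeadlockReport(5000).forEach(System.out::println);
    }
}
```

A check like this has to run inside the application and be called by something that is not itself deadlocked, and it does nothing to recover the application afterwards.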

On top of everything else, the JVM will have been restarted automatically, meaning that the impact on users was limited to the transactions involved and a few moments of downtime.

The Wrapper helps you detect critical problems including:

Technical Solution

Adding the ability to detect deadlocks with the Java Service Wrapper is as easy as adding a few configuration properties to your Wrapper configuration file.

The TestWrapper example application that ships with the Java Service Wrapper has this functionality enabled by default. Simply launch the application and click on the "Create Deadlock" button to see it in action.

Simple Deadlock Detection

Compatibility: 3.5.0
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/OS, IBM z/Linux

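The Java Service Wrapper makes it possible to control whether deadlock checks are enabled, how often they run, which action is taken when a deadlock is found, and how much detail is written to the log.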

A typical configuration will be something like this:

Example to detect and log a Deadlock:
wrapper.check.deadlock=TRUE
wrapper.check.deadlock.interval=60
wrapper.check.deadlock.action=RESTART
wrapper.check.deadlock.output=FULL

Email Notification

Compatibility: 3.5.0
Editions: Professional Edition, Standard Edition (Not Supported), Community Edition (Not Supported)
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/OS, IBM z/Linux

While the above example will log the cause of a deadlock and then recover the application, it is also useful to receive notification of the problem. The following configuration will log the deadlock, restart the JVM, and then send a notification email:

Example to send Email Notification on Deadlock:
# Check for deadlocks
wrapper.check.deadlock=TRUE
wrapper.check.deadlock.interval=60
wrapper.check.deadlock.action=RESTART
wrapper.check.deadlock.output=FULL

# Send notification email
wrapper.event.default.email.smtp.host=smtp.example.com
wrapper.event.default.email.subject=[%WRAPPER_HOSTNAME%:%WRAPPER_NAME%:%WRAPPER_EVENT_NAME%] Event Notification
wrapper.event.default.email.sender=myapp-noreply@example.com
wrapper.event.default.email.recipient=sysadmins@example.com
wrapper.event.jvm_deadlock.email=TRUE
wrapper.event.jvm_deadlock.email.body=A deadlock was detected in the Messaging Server.\n\nPlease check it.\n
wrapper.event.jvm_deadlock.email.maillog=ATTACHMENT

External Command Execution

Compatibility: 3.5.0
Editions: Professional Edition, Standard Edition (Not Supported), Community Edition (Not Supported)
Platforms: Windows, Mac OSX, Linux, IBM AIX, FreeBSD, HP-UX, Solaris, IBM z/OS, IBM z/Linux

When a deadlock takes place, some data could be left in an unclean state which needs to be cleaned up. While it is usually a good idea to use transactions and otherwise make Java applications resilient, such functionality would often require changes by the application development team. System administrators usually have the task of making things work while waiting for a better solution.

The Java Service Wrapper makes it possible to run an external command, application, or batch file in response to events.

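The following configuration will check for deadlocks, log the full report and restart the JVM when one is found, and run an external batch file in response to the deadlock event: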

Example to launch an external command on Deadlock:
# Check for deadlocks
wrapper.check.deadlock=TRUE
wrapper.check.deadlock.interval=60
wrapper.check.deadlock.action=RESTART
wrapper.check.deadlock.output=FULL

# Run external batch file in response to a deadlock.
wrapper.event.jvm_deadlock.command.argv.1=../bin/DeadlockCleanup.bat
wrapper.event.jvm_deadlock.command.block=TRUE

Reference: Deadlock

The Java Service Wrapper provides a full set of configuration properties that allows you to make the Wrapper meet your exact needs. Please take a look at the documentation for the individual properties to see all of the possibilities beyond the examples shown above.