Windows Service Recovery Properties Overview

The Java Service Wrapper does a very good job of monitoring and recovering from errors in the Java process. While the Wrapper itself is very stable, it is good practice to avoid any single point of failure. The Wrapper therefore makes it possible to use the Windows Service Control Manager's ability to automatically restart a service which has crashed. Note that the Wrapper exiting with a non-zero exit code is not considered a failure.

The wrapper.ntservice.recovery.<n> properties make it possible to configure these recovery features from within the Wrapper's configuration. An example configuration can be found below.

NOTE

These properties are used when installing the Wrapper as a Windows service. Changes to any of these properties will not take effect until the Windows service is reinstalled. You can, however, edit the configuration of an installed service from the Recovery Tab on the Properties dialog of the Services Control Panel.

wrapper.ntservice.recovery.reset

Compatibility: 3.5.5
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX (Not Supported), Linux (Not Supported), IBM AIX (Not Supported), FreeBSD (Not Supported), HP-UX (Not Supported), Solaris (Not Supported), IBM z/Linux (Not Supported)

The Windows Service Control Manager maintains a failure count for each service installed. Every time the service terminates unexpectedly, the counter will be incremented and the registered failure action for that count will be triggered.

This property configures the time in seconds, without any failures, after which the Service Control Manager will reset this counter back to "0" (zero). The default value is 3600 (= 1 hour). A value of "-1" means that the Service Control Manager will never reset the counter.

Example, setting the reset time to 86400 seconds (=1 day):
wrapper.ntservice.recovery.reset=86400

NOTE

To avoid confusion, please note that the Recovery Tab on the Properties dialog of the Services Control Panel displays the reset time in days rather than seconds. Displayed values are therefore rounded down to whole days. For instance, a value of 3600 seconds (1 hour) will be displayed as "0" (zero) days.
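The rounding described in this note can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name displayed_reset_days is hypothetical and not part of the Wrapper or of any Windows API, and negative values such as "-1" are deliberately left out of the sketch.

```python
SECONDS_PER_DAY = 86400


def displayed_reset_days(reset_seconds: int) -> int:
    """Return the whole number of days shown for a reset time (rounded down)."""
    if reset_seconds < 0:
        # Assumption: "-1" (never reset) is not covered by this sketch.
        raise ValueError("negative reset times are not handled here")
    return reset_seconds // SECONDS_PER_DAY


print(displayed_reset_days(3600))    # default value of 1 hour
print(displayed_reset_days(86400))   # exactly 1 day
print(displayed_reset_days(129600))  # 1.5 days
```

Choosing a reset time that is a multiple of 86400 keeps the configured value and the value shown in the dialog in agreement.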

wrapper.ntservice.recovery.command

Compatibility: 3.5.5
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX (Not Supported), Linux (Not Supported), IBM AIX (Not Supported), FreeBSD (Not Supported), HP-UX (Not Supported), Solaris (Not Supported), IBM z/Linux (Not Supported)

The Service Control Manager can execute an external command in response to a failure. This can be used, for example, to perform cleanup or to send an external notification.

While it is possible to specify a different wrapper.ntservice.recovery.<n>.failure property for each recovery event, the Service Control Manager API only makes it possible to specify a single command, which can then be referenced as the failure action for one or more recovery events.

This property does not have a default value and must be specified if one or more "failure actions" (wrapper.ntservice.recovery.<n>.failure property) is set to COMMAND.

Example:
wrapper.ntservice.recovery.command=C:\cleanup.bat

wrapper.ntservice.recovery.reboot_msg

Compatibility: 3.5.5
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX (Not Supported), Linux (Not Supported), IBM AIX (Not Supported), FreeBSD (Not Supported), HP-UX (Not Supported), Solaris (Not Supported), IBM z/Linux (Not Supported)

If REBOOT is specified for a wrapper.ntservice.recovery.<n>.failure property, the Service Control Manager will log a System Event describing "why the system was rebooted". This property is used to define the text of that message.

The default message is localized in the language being used by the Wrapper. For English, the default message is: "System will be rebooted now due to an error in the wrapped service."

Example:
wrapper.ntservice.recovery.reboot_msg=System will reboot now.

wrapper.ntservice.recovery.<n>.failure

Compatibility: 3.5.5
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX (Not Supported), Linux (Not Supported), IBM AIX (Not Supported), FreeBSD (Not Supported), HP-UX (Not Supported), Solaris (Not Supported), IBM z/Linux (Not Supported)

This property sets the failure action which should happen if the Wrapper service crashes. The "<n>" component of the property name indicates the number of the failure count.

The Service Control Manager counts the number of times each service has failed since the system booted. The count is reset back to "0" (zero) if the service has not failed for at least the number of seconds specified by the wrapper.ntservice.recovery.reset property. When the service fails, the failure count (N) is used to decide which failure action to take. These actions are defined with the wrapper.ntservice.recovery.<n>.failure properties, where <n> corresponds to the failure count N. If N is greater than the highest configured <n> index, then the failure action with the highest <n> index will be used.
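The selection rule in the paragraph above can be sketched as follows. This is a minimal Python illustration, not the Service Control Manager's actual implementation; the function name select_failure_action is hypothetical, and the sketch assumes the configured <n> indices are contiguous and start at 1.

```python
from typing import Dict


def select_failure_action(actions: Dict[int, str], failure_count: int) -> str:
    """Pick the failure action for a given failure count N.

    `actions` maps each configured <n> index to its action name.
    Assumption: indices are contiguous and start at 1.
    """
    if failure_count < 1:
        raise ValueError("the failure count starts at 1")
    if not actions:
        # No recovery properties configured behaves like NONE.
        return "NONE"
    highest_index = max(actions)
    # Counts beyond the last configured index reuse the last action.
    return actions[min(failure_count, highest_index)]


configured = {1: "RESTART", 2: "COMMAND", 3: "REBOOT"}
for count in range(1, 6):
    print(count, select_failure_action(configured, count))
```

With three configured actions, the fourth and fifth failures in this sketch map to the same action as the third.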

Possible actions are:

  • NONE :

    The Service Manager will do nothing. This is the same as if no recovery properties were specified. The service will stop and stay stopped. Any failure actions defined with higher N values will never be reached.

  • RESTART :

    The Service Manager will restart the service. If you want to perform a restart of the Java application on certain exit codes that the JVM returns, you may also use the wrapper.on_exit.<n> property.

  • COMMAND :

    The Service Manager will run the command specified with the wrapper.ntservice.recovery.command property. If you wish the service to be restarted after running the command, then this must be done within the command.

  • REBOOT :

    The Service Manager will reboot the machine. If the service's "start" mode is set to AUTO_START, then the service will be started again after the reboot. See the wrapper.ntservice.starttype property for more information.

Example:
wrapper.ntservice.recovery.1.failure=RESTART
wrapper.ntservice.recovery.2.failure=COMMAND
wrapper.ntservice.recovery.3.failure=REBOOT

Although the API makes it possible to specify failure actions after NONE or REBOOT, those later actions will never be used.
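A configuration can be checked for such unreachable actions before the service is installed. The following Python sketch shows one way to do that; the helper name unreachable_indices and the dictionary representation of the configuration are assumptions made for this illustration and are not part of the Wrapper.

```python
from typing import Dict, List

# Actions after which later failure actions are never used.
TERMINAL_ACTIONS = {"NONE", "REBOOT"}


def unreachable_indices(actions: Dict[int, str]) -> List[int]:
    """Return the <n> indices configured after the first NONE or REBOOT."""
    ordered = sorted(actions)
    for index in ordered:
        if actions[index] in TERMINAL_ACTIONS:
            # Everything configured after this index will never be used.
            return [later for later in ordered if later > index]
    return []


print(unreachable_indices({1: "RESTART", 2: "COMMAND", 3: "REBOOT"}))
print(unreachable_indices({1: "RESTART", 2: "NONE", 3: "REBOOT"}))
```

The first call reports nothing because REBOOT is the last action; the second reports index 3 because it follows NONE.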

NOTE

The Recovery Tab on the Properties dialog of the Services Control Panel ([Control Panel] - [Administrative Tools] - [Services]) will only display the first 3 configured failure actions. The API, however, makes it possible to define more. Be aware that using more than 3 failure actions could be confusing to a user looking at the dialog.

wrapper.ntservice.recovery.<n>.delay

Compatibility: 3.5.5
Editions: Professional Edition, Standard Edition, Community Edition (Not Supported)
Platforms: Windows, Mac OSX (Not Supported), Linux (Not Supported), IBM AIX (Not Supported), FreeBSD (Not Supported), HP-UX (Not Supported), Solaris (Not Supported), IBM z/Linux (Not Supported)

This property sets the time in seconds that the Service Control Manager will wait before performing the corresponding action specified by wrapper.ntservice.recovery.<n>.failure.

The value of this property cannot be smaller than "0" (zero). The default value is 3 times the value of the wrapper.ping.timeout property.

While in theory it would be possible to set a delay for the failure action "NONE", the Service Control Manager would ignore the delay in this case.

Example, setting the delay to 90 seconds (1.5min):
wrapper.ntservice.recovery.1.delay=90

NOTE

To avoid confusion when checking the service on the Recovery Tab of the Properties dialog, please note that the delay time is displayed in minutes. Displayed values are therefore rounded down to whole minutes. For instance, a value of 90 seconds (1.5 min) will be displayed as 1 minute.
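The default rule and the rounding in the dialog can be sketched together in Python. The function names below are hypothetical, and the ping timeout of 30 seconds is only a sample value chosen for the illustration; substitute the wrapper.ping.timeout value from your own configuration.

```python
from typing import Optional


def effective_delay_seconds(configured_delay: Optional[int], ping_timeout: int) -> int:
    """Return the delay in seconds, defaulting to 3 x wrapper.ping.timeout."""
    if configured_delay is None:
        # Property not set: fall back to three times the ping timeout.
        return 3 * ping_timeout
    if configured_delay < 0:
        raise ValueError("the delay cannot be smaller than 0")
    return configured_delay


def displayed_delay_minutes(delay_seconds: int) -> int:
    """Return the whole minutes shown on the Recovery Tab (rounded down)."""
    return delay_seconds // 60


# Sample value only: a ping timeout of 30 seconds and no explicit delay.
delay = effective_delay_seconds(None, ping_timeout=30)
print(delay, displayed_delay_minutes(delay))
```

In this sample the unset delay resolves to 90 seconds, which the dialog would show as 1 minute.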

Putting it all together

The following example will:

  • set the reset time to 86400 seconds (1 day)
  • 1st Failure: tell the Service Control Manager to wait 90 seconds (1.5 min) before performing failure action 1
  • 1st Failure: set failure action 1 to RESTART, which will restart the service
  • 2nd Failure: tell the Service Control Manager to wait 180 seconds (3 min) before performing failure action 2
  • 2nd Failure: set failure action 2 to COMMAND, which will run an external command
  • 2nd Failure: set the external command to "C:\cleanup.bat"
  • 3rd Failure: tell the Service Control Manager to wait 600 seconds (10 min) before performing failure action 3
  • 3rd Failure: set failure action 3 to REBOOT, which will reboot the machine
  • 3rd Failure: set the message that gets logged in the System Events with the reboot to "System will reboot now."

Example:
wrapper.ntservice.recovery.reset=86400

wrapper.ntservice.recovery.1.delay=90
wrapper.ntservice.recovery.1.failure=RESTART

wrapper.ntservice.recovery.2.delay=180
wrapper.ntservice.recovery.2.failure=COMMAND
wrapper.ntservice.recovery.command=C:\cleanup.bat

wrapper.ntservice.recovery.3.delay=600
wrapper.ntservice.recovery.3.failure=REBOOT
wrapper.ntservice.recovery.reboot_msg=System will reboot now.

Checking the result on the Recovery Tab of the Windows Service:

[Screenshot: Recovery Tab of the service's Properties dialog]