Add support for user-defined healthchecks

This PR adds support for user-defined health-check probes for Docker
containers. It adds a `HEALTHCHECK` instruction to the Dockerfile syntax plus
some corresponding "docker run" options. It can be used with a restart policy
to automatically restart a container if the check fails.

The `HEALTHCHECK` instruction has two forms:

* `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
* `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)

The `HEALTHCHECK` instruction tells Docker how to test a container to check that
it is still working. This can detect cases such as a web server that is stuck in
an infinite loop and unable to handle new connections, even though the server
process is still running.

When a container has a healthcheck specified, it has a _health status_ in
addition to its normal status. This status is initially `starting`. Whenever a
health check passes, it becomes `healthy` (whatever state it was previously in).
After a certain number of consecutive failures, it becomes `unhealthy`.

The options that can appear before `CMD` are:

* `--interval=DURATION` (default: `30s`)
* `--timeout=DURATION` (default: `30s`)
* `--retries=N` (default: `1`)

The health check will first run **interval** seconds after the container is
started, and then again **interval** seconds after each previous check completes.

If a single run of the check takes longer than **timeout** seconds then the check
is considered to have failed.

It takes **retries** consecutive failures of the health check for the container
to be considered `unhealthy`.

There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list
more than one then only the last `HEALTHCHECK` will take effect.

The command after the `CMD` keyword can be either a shell command (e.g.
`HEALTHCHECK CMD /bin/check-running`) or an _exec_ array (as with other
Dockerfile commands; see e.g. `ENTRYPOINT` for details).

The command's exit status indicates the health status of the container.
The possible values are:

- 0: success - the container is healthy and ready for use
- 1: unhealthy - the container is not working correctly
- 2: starting - the container is not ready for use yet, but is working correctly

If the probe returns 2 ("starting") when the container has already moved out of the
"starting" state then it is treated as "unhealthy" instead.

For example, to check every five minutes or so that a web-server is able to
serve the site's main page within three seconds:

    HEALTHCHECK --interval=5m --timeout=3s \
      CMD curl -f http://localhost/ || exit 1

To help debug failing probes, any output text (UTF-8 encoded) that the command writes
on stdout or stderr will be stored in the health status and can be queried with
`docker inspect`. Such output should be kept short (only the first 4096 bytes
are stored currently).

When the health status of a container changes, a `health_status` event is
generated with the new status. The health status is also displayed in the
`docker ps` output.

Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

parent cceb74311b
commit 51ddea93a2

@@ -1470,6 +1470,73 @@ The `STOPSIGNAL` instruction sets the system call signal that will be sent to th
This signal can be a valid unsigned number that matches a position in the kernel's syscall table, for instance 9,
or a signal name in the format SIGNAME, for instance SIGKILL.

## HEALTHCHECK

The `HEALTHCHECK` instruction has two forms:

* `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
* `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)

The `HEALTHCHECK` instruction tells Docker how to test a container to check that
it is still working. This can detect cases such as a web server that is stuck in
an infinite loop and unable to handle new connections, even though the server
process is still running.

When a container has a healthcheck specified, it has a _health status_ in
addition to its normal status. This status is initially `starting`. Whenever a
health check passes, it becomes `healthy` (whatever state it was previously in).
After a configurable number of consecutive failures (see `--retries` below), it
becomes `unhealthy`.

The options that can appear before `CMD` are:

* `--interval=DURATION` (default: `30s`)
* `--timeout=DURATION` (default: `30s`)
* `--retries=N` (default: `1`)

The first health check runs once **interval** has elapsed after the container is
started, and each later check runs **interval** after the previous check completes.

If a single run of the check takes longer than **timeout**, the check
is considered to have failed.

It takes **retries** consecutive failures of the health check for the container
to be considered `unhealthy`.

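The scheduling rules above can be sketched as a small shell loop. This is only an illustration of the described behaviour, not Docker's implementation: `probe_loop` and its `ROUNDS` argument are hypothetical, durations are given in plain seconds, the `timeout` utility from GNU coreutils is assumed to be available, and any non-zero exit or timeout is simply counted as a failure.

```bash
# probe_loop INTERVAL TIMEOUT RETRIES ROUNDS CMD...
# Hypothetical sketch: prints the health status after each probe run.
# INTERVAL and TIMEOUT are seconds; ROUNDS bounds the loop so the sketch
# terminates (a real monitor would keep going until the container stops).
probe_loop() {
    interval=$1 timeout_s=$2 retries=$3 rounds=$4
    shift 4
    status=starting                 # initial health status
    streak=0                        # consecutive failures so far
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Wait before the first run and after each completed run.
        sleep "$interval"
        if timeout "$timeout_s" "$@" >/dev/null 2>&1; then
            streak=0
            status=healthy          # a passing check always means healthy
        else
            streak=$((streak + 1))  # non-zero exit or timeout: one failure
            if [ "$streak" -ge "$retries" ]; then
                status=unhealthy
            fi
        fi
        echo "$status"
        i=$((i + 1))
    done
}
```

For instance, `probe_loop 0 1 2 3 false` reports `starting` after the first failure and `unhealthy` from the second failure on, because two consecutive failures are required.
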
There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list
more than one then only the last `HEALTHCHECK` will take effect.

The command after the `CMD` keyword can be either a shell command (e.g.
`HEALTHCHECK CMD /bin/check-running`) or an _exec_ array (as with other
Dockerfile commands; see e.g. `ENTRYPOINT` for details).

The command's exit status indicates the health status of the container.
The possible values are:

- 0: success - the container is healthy and ready for use
- 1: unhealthy - the container is not working correctly
- 2: starting - the container is not ready for use yet, but is working correctly

If the probe returns 2 ("starting") when the container has already moved out of the
"starting" state then it is treated as "unhealthy" instead.

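As a rough illustration of how a single probe result is read under these rules, the hypothetical helper below maps the current status and an exit code to an outcome. The name `classify_probe` is made up for this sketch, and treating exit codes other than 0, 1 and 2 as failures is an assumption, since the text above does not define them.

```bash
# classify_probe CURRENT_STATUS EXIT_CODE
# Hypothetical sketch: prints "healthy", "starting", or "failed" for one probe run.
classify_probe() {
    current=$1
    code=$2
    case "$code" in
        0)
            echo healthy            # success: healthy and ready for use
            ;;
        2)
            if [ "$current" = starting ]; then
                echo starting       # not ready yet, but working correctly
            else
                echo failed         # 2 after leaving "starting" counts as unhealthy
            fi
            ;;
        *)
            echo failed             # 1 is unhealthy; other codes: assumed failure
            ;;
    esac
}
```

Here `classify_probe starting 2` prints `starting`, while `classify_probe healthy 2` prints `failed`.
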
For example, to check every five minutes or so that a web-server is able to
serve the site's main page within three seconds:

    HEALTHCHECK --interval=5m --timeout=3s \
      CMD curl -f http://localhost/ || exit 1

To help debug failing probes, any output text (UTF-8 encoded) that the command writes
on stdout or stderr will be stored in the health status and can be queried with
`docker inspect`. Such output should be kept short (only the first 4096 bytes
are stored currently).

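Because only the first 4096 bytes are stored, a chatty probe may lose the error message printed at its end. One possible workaround, sketched here as an assumption rather than a recommended practice, is to wrap the probe so that only the last 4096 bytes of its combined output are emitted while its exit status is preserved. The wrapper name `run_tail` is hypothetical.

```bash
# run_tail CMD...
# Hypothetical wrapper: runs CMD, prints at most the last 4096 bytes of its
# combined stdout/stderr, and returns CMD's own exit status.
run_tail() {
    rc=0
    out=$("$@" 2>&1) || rc=$?       # capture output and remember the exit status
    printf '%s' "$out" | tail -c 4096
    return "$rc"
}
```

A probe script could then call something like `run_tail /bin/check-running`, reusing the example command from the text above.
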
When the health status of a container changes, a `health_status` event is
generated with the new status.

The `HEALTHCHECK` feature was added in Docker 1.12.


## Dockerfile examples

Below you can see some examples of Dockerfile syntax. If you're interested in

@@ -1250,6 +1250,7 @@ Dockerfile instruction and how the operator can override that setting.
#entrypoint-default-command-to-execute-at-runtime)
- [EXPOSE (Incoming Ports)](#expose-incoming-ports)
- [ENV (Environment Variables)](#env-environment-variables)
- [HEALTHCHECK](#healthcheck)
- [VOLUME (Shared Filesystems)](#volume-shared-filesystems)
- [USER](#user)
- [WORKDIR](#workdir)

@@ -1398,6 +1399,65 @@ above, or already defined by the developer with a Dockerfile `ENV`:

Similarly the operator can set the **hostname** with `-h`.

### HEALTHCHECK

```
  --health-cmd            Command to run to check health
  --health-interval       Time between running the check
  --health-retries        Consecutive failures needed to report unhealthy
  --health-timeout        Maximum time to allow one check to run
  --no-healthcheck        Disable any container-specified HEALTHCHECK
```

Example:

    $ docker run --name=test -d \
        --health-cmd='stat /etc/passwd || exit 1' \
        --health-interval=2s \
        busybox sleep 1d
    $ sleep 2; docker inspect --format='{{.State.Health.Status}}' test
    healthy
    $ docker exec test rm /etc/passwd
    $ sleep 2; docker inspect --format='{{json .State.Health}}' test
    {
      "Status": "unhealthy",
      "FailingStreak": 3,
      "Log": [
        {
          "Start": "2016-05-25T17:22:04.635478668Z",
          "End": "2016-05-25T17:22:04.7272552Z",
          "ExitCode": 0,
          "Output": " File: /etc/passwd\n Size: 334 \tBlocks: 8 IO Block: 4096 regular file\nDevice: 32h/50d\tInode: 12 Links: 1\nAccess: (0664/-rw-rw-r--) Uid: ( 0/ root) Gid: ( 0/ root)\nAccess: 2015-12-05 22:05:32.000000000\nModify: 2015..."
        },
        {
          "Start": "2016-05-25T17:22:06.732900633Z",
          "End": "2016-05-25T17:22:06.822168935Z",
          "ExitCode": 0,
          "Output": " File: /etc/passwd\n Size: 334 \tBlocks: 8 IO Block: 4096 regular file\nDevice: 32h/50d\tInode: 12 Links: 1\nAccess: (0664/-rw-rw-r--) Uid: ( 0/ root) Gid: ( 0/ root)\nAccess: 2015-12-05 22:05:32.000000000\nModify: 2015..."
        },
        {
          "Start": "2016-05-25T17:22:08.823956535Z",
          "End": "2016-05-25T17:22:08.897359124Z",
          "ExitCode": 1,
          "Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
        },
        {
          "Start": "2016-05-25T17:22:10.898802931Z",
          "End": "2016-05-25T17:22:10.969631866Z",
          "ExitCode": 1,
          "Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
        },
        {
          "Start": "2016-05-25T17:22:12.971033523Z",
          "End": "2016-05-25T17:22:13.082015516Z",
          "ExitCode": 1,
          "Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
        }
      ]
    }

The health status is also displayed in the `docker ps` output.

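Instead of a fixed `sleep` before querying the status, a script could poll `docker inspect` with the same format string used in the example above. The following is a minimal sketch under stated assumptions: `wait_healthy` and the `DOCKER` override variable are hypothetical names introduced here, and the return codes (0 healthy, 1 unhealthy, 2 gave up) are an arbitrary convention for this sketch.

```bash
# wait_healthy NAME [TRIES]
# Hypothetical helper: polls the container's health status about once per
# second. Returns 0 when it is "healthy", 1 when it is "unhealthy", and 2 if
# neither was seen within TRIES attempts (default 30).
# DOCKER may be set to substitute another command for the docker CLI.
wait_healthy() {
    name=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        status=$(${DOCKER:-docker} inspect \
            --format='{{.State.Health.Status}}' "$name" 2>/dev/null) || status=
        case "$status" in
            healthy)   return 0 ;;
            unhealthy) return 1 ;;
        esac
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1                 # still "starting" or not inspectable yet
        fi
    done
    return 2
}
```

With the container from the example, `wait_healthy test 10` would wait up to roughly ten seconds for the `test` container to report a final status.
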
### TMPFS (mount tmpfs filesystems)

```bash