Add support for user-defined healthchecks

This PR adds support for user-defined health-check probes for Docker containers. It adds a `HEALTHCHECK` instruction to the Dockerfile syntax plus some corresponding "docker run" options. It can be used with a restart policy to automatically restart a container if the check fails.

The `HEALTHCHECK` instruction has two forms:

* `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
* `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)

The `HEALTHCHECK` instruction tells Docker how to test a container to check that it is still working. This can detect cases such as a web server that is stuck in an infinite loop and unable to handle new connections, even though the server process is still running.

When a container has a healthcheck specified, it has a _health status_ in addition to its normal status. This status is initially `starting`. Whenever a health check passes, it becomes `healthy` (whatever state it was previously in). After a certain number of consecutive failures, it becomes `unhealthy`.

The options that can appear before `CMD` are:

* `--interval=DURATION` (default: `30s`)
* `--timeout=DURATION` (default: `30s`)
* `--retries=N` (default: `1`)

The health check will first run **interval** seconds after the container is started, and then again **interval** seconds after each previous check completes. If a single run of the check takes longer than **timeout** seconds then the check is considered to have failed. It takes **retries** consecutive failures of the health check for the container to be considered `unhealthy`.

There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list more than one then only the last `HEALTHCHECK` will take effect.

The command after the `CMD` keyword can be either a shell command (e.g. `HEALTHCHECK CMD /bin/check-running`) or an _exec_ array (as with other Dockerfile commands; see e.g. `ENTRYPOINT` for details).

The command's exit status indicates the health status of the container. The possible values are:

- 0: success - the container is healthy and ready for use
- 1: unhealthy - the container is not working correctly
- 2: starting - the container is not ready for use yet, but is working correctly

If the probe returns 2 ("starting") when the container has already moved out of the "starting" state then it is treated as "unhealthy" instead.

For example, to check every five minutes or so that a web-server is able to serve the site's main page within three seconds:

    HEALTHCHECK --interval=5m --timeout=3s \
      CMD curl -f http://localhost/ || exit 1

To help debug failing probes, any output text (UTF-8 encoded) that the command writes on stdout or stderr will be stored in the health status and can be queried with `docker inspect`. Such output should be kept short (only the first 4096 bytes are stored currently).

When the health status of a container changes, a `health_status` event is generated with the new status. The health status is also displayed in the `docker ps` output.

Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2016-04-18 10:48:13 +01:00 · 2016-04-18 10:48:13 +01:00 · 51ddea93a2
commit 51ddea93a2
parent cceb74311b
2 changed files with 127 additions and 0 deletions
--- a/docs/reference/builder.md
+++ b/docs/reference/builder.md
@@ -1470,6 +1470,73 @@ The `STOPSIGNAL` instruction sets the system call signal that will be sent to th
 This signal can be a valid unsigned number that matches a position in the kernel's syscall table, for instance 9,
 or a signal name in the format SIGNAME, for instance SIGKILL.
 ## HEALTHCHECK
 The `HEALTHCHECK` instruction has two forms:
 * `HEALTHCHECK [OPTIONS] CMD command` (check container health by running a command inside the container)
 * `HEALTHCHECK NONE` (disable any healthcheck inherited from the base image)
 The `HEALTHCHECK` instruction tells Docker how to test a container to check that
 it is still working. This can detect cases such as a web server that is stuck in
 an infinite loop and unable to handle new connections, even though the server
 process is still running.
 When a container has a healthcheck specified, it has a _health status_ in
 addition to its normal status. This status is initially `starting`. Whenever a
 health check passes, it becomes `healthy` (whatever state it was previously in).
 After a certain number of consecutive failures, it becomes `unhealthy`.
 The options that can appear before `CMD` are:
 * `--interval=DURATION` (default: `30s`)
 * `--timeout=DURATION` (default: `30s`)
 * `--retries=N` (default: `1`)
 The health check will first run **interval** after the container is
 started, and then again **interval** after each previous check completes.
 If a single run of the check takes longer than **timeout**, then the check
 is considered to have failed.
 It takes **retries** consecutive failures of the health check for the container
 to be considered `unhealthy`.
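Because each wait starts only when the previous check ends, checks do not run on a fixed clock. The shell sketch below computes when each check would start for a given list of probe durations. `probe_schedule` is a hypothetical helper, not part of Docker; it assumes whole-second values and that a probe running longer than the timeout is abandoned at the timeout, with the next wait starting from that point.

```bash
# Hypothetical helper (not part of Docker): print when each health check
# would start, given the interval, the timeout, and how long each probe
# would run if left alone. All values are whole seconds.
# Usage: probe_schedule INTERVAL TIMEOUT DURATION...
probe_schedule() {
  local interval=$1 timeout=$2 t d result
  shift 2
  t=$interval                 # first check: one interval after the container starts
  for d in "$@"; do
    if [ "$d" -gt "$timeout" ]; then
      result=timeout          # counted as a failed check
      d=$timeout              # assumption: the probe is abandoned at the timeout
    else
      result=completed        # pass or fail is then decided by the exit status
    fi
    echo "start=${t}s $result"
    t=$((t + d + interval))   # next check: one interval after this one ends
  done
}
```

For example, with the default interval and timeout and a second probe that would hang for 45 seconds, `probe_schedule 30 30 1 45 1` reports checks starting at 30s, 61s and 121s, with the second one counted as a timeout.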
 There can only be one `HEALTHCHECK` instruction in a Dockerfile. If you list
 more than one then only the last `HEALTHCHECK` will take effect.
 The command after the `CMD` keyword can be either a shell command (e.g. `HEALTHCHECK
 CMD /bin/check-running`) or an _exec_ array (as with other Dockerfile commands;
 see e.g. `ENTRYPOINT` for details).
 The command's exit status indicates the health status of the container.
 The possible values are:
 - 0: success - the container is healthy and ready for use
 - 1: unhealthy - the container is not working correctly
 - 2: starting - the container is not ready for use yet, but is working correctly
 If the probe returns 2 ("starting") when the container has already moved out of the
 "starting" state then it is treated as "unhealthy" instead.
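Taken together, the status rules (initially `starting`, `healthy` whenever a check passes, `unhealthy` after **retries** consecutive failures, and the special meaning of exit status 2) form a small state machine. The shell sketch below is a hypothetical model, not Docker's implementation. It assumes that any exit status other than 0 and 2 counts as a failure, and that a 2 received while still `starting` leaves both the status and the failure count unchanged.

```bash
# Hypothetical model (not Docker's code) of the health status rules.
# Feeds a sequence of probe exit statuses through the state machine and
# prints the container's health status after each probe.
# Usage: health_statuses RETRIES EXIT_STATUS...
health_statuses() {
  local retries=$1 status=starting streak=0 code
  shift
  for code in "$@"; do
    # 2 only means "starting" while the container is still starting;
    # after that it is treated like any other failure.
    if [ "$code" -eq 2 ] && [ "$status" != starting ]; then
      code=1
    fi
    case $code in
      0) status=healthy; streak=0 ;;     # any pass: healthy, failure count reset
      2) ;;                              # still starting: nothing changes
      *) streak=$((streak + 1))          # assumption: every other status is a failure
         if [ "$streak" -ge "$retries" ]; then status=unhealthy; fi ;;
    esac
    echo "$status"
  done
}
```

With `--retries=3`, `health_statuses 3 2 0 1 1 2` prints `starting`, `healthy`, `healthy`, `healthy`, `unhealthy`: the final 2 counts as the third consecutive failure because the container has already left the `starting` state.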
 For example, to check every five minutes or so that a web server is able to
 serve the site's main page within three seconds:
    HEALTHCHECK --interval=5m --timeout=3s \
      CMD curl -f http://localhost/ || exit 1
 To help debug failing probes, any output text (UTF-8 encoded) that the command writes
 on stdout or stderr will be stored in the health status and can be queried with
 `docker inspect`. Such output should be kept short (only the first 4096 bytes
 are stored currently).
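To get a rough idea of what would be kept for a noisy probe, the command can be run locally with a similar cap applied. `run_probe` below is a hypothetical helper, not part of Docker: it merges stdout and stderr, keeps the first 4096 bytes, and passes the probe's exit status through. It only imitates the storage rule described above (shell command substitution also drops trailing newlines), so treat its output as an approximation.

```bash
# Hypothetical helper (not part of Docker): run a probe command, keep only
# the first 4096 bytes of its combined stdout/stderr, and return the
# probe's own exit status.
# Usage: run_probe COMMAND [ARG...]
run_probe() {
  local out status
  out=$("$@" 2>&1)      # capture stdout and stderr together
  status=$?             # exit status of the probe itself
  printf '%s' "$out" | head -c 4096
  return "$status"
}
```

In practice it is easier to keep the probe terse in the first place, for example with `curl -fsS -o /dev/null`, which discards the response body and prints only an error message when the request fails.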
 When the health status of a container changes, a `health_status` event is
 generated with the new status.
 The `HEALTHCHECK` feature was added in Docker 1.12.
 ## Dockerfile examples
 Below you can see some examples of Dockerfile syntax. If you're interested in
--- a/docs/reference/run.md
+++ b/docs/reference/run.md
@@ -1250,6 +1250,7 @@ Dockerfile instruction and how the operator can override that setting.
    #entrypoint-default-command-to-execute-at-runtime)
 - [EXPOSE (Incoming Ports)](#expose-incoming-ports)
 - [ENV (Environment Variables)](#env-environment-variables)
 - [HEALTHCHECK](#healthcheck)
 - [VOLUME (Shared Filesystems)](#volume-shared-filesystems)
 - [USER](#user)
 - [WORKDIR](#workdir)
@@ -1398,6 +1399,65 @@ above, or already defined by the developer with a Dockerfile `ENV`:
 Similarly the operator can set the **hostname** with `-h`.
 ### HEALTHCHECK
 ```
  --health-cmd            Command to run to check health
  --health-interval       Time between running the check
  --health-retries        Consecutive failures needed to report unhealthy
  --health-timeout        Maximum time to allow one check to run
  --no-healthcheck        Disable any container-specified HEALTHCHECK
 ```
 Example:
    $ docker run --name=test -d \
        --health-cmd='stat /etc/passwd || exit 1' \
        --health-interval=2s \
        busybox sleep 1d
    $ sleep 2; docker inspect --format='{{.State.Health.Status}}' test
    healthy
    $ docker exec test rm /etc/passwd
    $ sleep 2; docker inspect --format='{{json .State.Health}}' test
    {
      "Status": "unhealthy",
      "FailingStreak": 3,
      "Log": [
        {
          "Start": "2016-05-25T17:22:04.635478668Z",
          "End": "2016-05-25T17:22:04.7272552Z",
          "ExitCode": 0,
          "Output": "  File: /etc/passwd\n  Size: 334       \tBlocks: 8          IO Block: 4096   regular file\nDevice: 32h/50d\tInode: 12          Links: 1\nAccess: (0664/-rw-rw-r--)  Uid: (    0/    root)   Gid: (    0/    root)\nAccess: 2015-12-05 22:05:32.000000000\nModify: 2015..."
        },
        {
          "Start": "2016-05-25T17:22:06.732900633Z",
          "End": "2016-05-25T17:22:06.822168935Z",
          "ExitCode": 0,
          "Output": "  File: /etc/passwd\n  Size: 334       \tBlocks: 8          IO Block: 4096   regular file\nDevice: 32h/50d\tInode: 12          Links: 1\nAccess: (0664/-rw-rw-r--)  Uid: (    0/    root)   Gid: (    0/    root)\nAccess: 2015-12-05 22:05:32.000000000\nModify: 2015..."
        },
        {
          "Start": "2016-05-25T17:22:08.823956535Z",
          "End": "2016-05-25T17:22:08.897359124Z",
          "ExitCode": 1,
          "Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
        },
        {
          "Start": "2016-05-25T17:22:10.898802931Z",
          "End": "2016-05-25T17:22:10.969631866Z",
          "ExitCode": 1,
          "Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
        },
        {
          "Start": "2016-05-25T17:22:12.971033523Z",
          "End": "2016-05-25T17:22:13.082015516Z",
          "ExitCode": 1,
          "Output": "stat: can't stat '/etc/passwd': No such file or directory\n"
        }
      ]
    }
 The health status is also displayed in the `docker ps` output.
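Scripts that start a container and then depend on it can poll the same `docker inspect --format='{{.State.Health.Status}}'` query used in the example above. `wait_for_healthy` is a hypothetical helper, sketched under the assumption that one poll per second is acceptable. It returns 0 once the status is `healthy`, 1 if the status is `unhealthy` or `docker inspect` fails, and 2 if TIMEOUT seconds pass while the status is anything else (normally `starting`).

```bash
# Hypothetical helper (not part of Docker): wait for a container's health
# status to settle.
#   returns 0 when the status is "healthy"
#   returns 1 when it is "unhealthy" or docker inspect fails
#   returns 2 if TIMEOUT seconds pass without either
# Usage: wait_for_healthy CONTAINER TIMEOUT
wait_for_healthy() {
  local container=$1 timeout=$2 waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format='{{.State.Health.Status}}' "$container") || return 1
    case $status in
      healthy) return 0 ;;
      unhealthy) return 1 ;;
    esac
    sleep 1                     # not settled yet: poll again in a second
    waited=$((waited + 1))
  done
  return 2
}
```

For example, `wait_for_healthy test 30 || echo "test did not become healthy"`. Treating `unhealthy` as final is a choice made for this sketch: a later passing check returns the container to `healthy`, so a caller that prefers to ride out transient failures could keep polling instead.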
 ### TMPFS (mount tmpfs filesystems)
 ```bash