Docker daemon start/stop issue
Symptom : A container cannot be stopped or removed, and stopping/starting the docker daemon doesn’t help : the daemon appears to hang.
Cause :
The container mounted a directory that was not released even after the container was stopped/removed.
Solution to fix : identify the directory/file that the container mounted and did not release, then unmount it.
– To see the mounts whose entry contains the word docker : mount | grep docker
– To unmount a file/directory : umount fileOrDir
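The two commands above can be combined into a small filter. This is only a sketch: docker_mount_points is a hypothetical helper name, and it assumes the usual Linux "device on /path type fstype (options)" layout of the mount output and mount points without spaces.

```shell
# Hypothetical helper: read `mount` output on stdin and print the mount point
# (third field) of every line that mentions docker.
docker_mount_points() {
  awk '/docker/ {print $3}'
}

# Usage (review the list before unmounting anything):
# mount | docker_mount_points
# mount | docker_mount_points | while read -r dir; do umount "$dir"; done
```

Listing first and unmounting second keeps the destructive step explicit.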
Solution to prevent :
Ensure that containers prone to volume mounting issues are in a stable state before stopping and removing them.
Symptom : The docker daemon cannot be stopped or started : the daemon appears to hang.
When stopping dockerd, we may see something like this :
dockerd[26985]: time="2020-01-07T10:19:54.506363297+02:00" level=info msg="Processing signal 'terminated'"
systemd[1]: docker.service stop-sigterm timed out. Killing.
systemd[1]: docker.service: main process exited, code=killed, status=9/KILL
systemd[1]: Stopped Docker Application Container Engine.
systemd[1]: Unit docker.service entered failed state.
When starting dockerd, we may see several kinds of messages (about volumes, containers...).
But they generally have the same cause :
error : no such container FOO_CONTAINER
Possible cause :
A stale state in the currently running docker-containerd-shim processes.
Solution to fix :
– stop dockerd (or at least try to)
– identify all docker-containerd-shim processes running on the host
– kill them
There may be a dozen or more containerd shim processes. Using awk helps to kill them in a batch.
1) output the pids separated by a blank :
ps aux | grep '[d]ocker' | awk '{pids=pids " " $2} END {print pids}'
2) kill them (copy/paste them or store them in a variable) :
kill -9 ….
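As a sketch of the "store them in a variable" variant, the filter below keeps only the shim processes named in the steps above. shim_pids is a hypothetical helper name, and the process name docker-containerd-shim is taken from the text; it may differ on other Docker versions.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print, on one line,
# the PIDs (second column) of the docker-containerd-shim processes.
# The [d] bracket keeps this awk process itself from matching the pattern.
shim_pids() {
  awk '/[d]ocker-containerd-shim/ {printf "%s%s", sep, $2; sep=" "} END {print ""}'
}

# Usage (as root, once dockerd is stopped):
# pids=$(ps aux | shim_pids)
# [ -n "$pids" ] && kill -9 $pids
```

Printing the variable before running kill gives a chance to check that only the expected processes are targeted.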
Building image issues
The image build fails at a step and we want to inspect the image state/content before the error
In the Dockerfile :
1) Comment out every instruction in the Dockerfile from the failing instruction onward
2) Comment out the existing ENTRYPOINT/CMD as well
3) Add, as the last uncommented instruction, an entrypoint that runs the shell : ENTRYPOINT ["sh"]
or, alternatively, an entrypoint that loops forever (the container then stays up and can be entered with docker exec) :
ENTRYPOINT ["tail", "-f", "/dev/null"]
In the host shell :
1) Build the image (supposing that the build context is the current directory) :
docker build [-f DOCKERFILE_LOCATION_IF_NEEDED] -t TAG:VERSION .
2) Run the container and execute the shell as command :
docker run -ti IMAGE
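The Dockerfile edits described above can also be scripted instead of done by hand. The following is a minimal sketch under stated assumptions: make_debug_dockerfile is a hypothetical helper, the failing instruction is identified by its line number, and any earlier ENTRYPOINT/CMD fits on a single line.

```shell
# Hypothetical helper: make_debug_dockerfile SRC FAIL_LINE DEST
# Writes to DEST a copy of SRC in which:
#  - every line from FAIL_LINE to the end is commented out,
#  - any earlier single-line ENTRYPOINT/CMD is commented out,
#  - a shell entrypoint is appended as the last instruction.
make_debug_dockerfile() {
  src=$1; fail_line=$2; dest=$3
  sed -e "${fail_line},\$ s/^/# /" \
      -e 's/^ENTRYPOINT/# &/' \
      -e 's/^CMD/# &/' "$src" > "$dest"
  printf 'ENTRYPOINT ["sh"]\n' >> "$dest"
}

# Usage (assuming the build fails at line 12 of ./Dockerfile):
# make_debug_dockerfile Dockerfile 12 Dockerfile.debug
# docker build -f Dockerfile.debug -t TAG:VERSION .
# docker run -ti TAG:VERSION
```

Working on a copy leaves the original Dockerfile untouched, so nothing has to be uncommented afterwards.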
The image build fails at a step and I want a sandbox to quickly try several variants and understand the issue
Follow the instructions in the previous point.
Once connected to the container, run the failing instruction by hand and try variants of it until the issue is understood.
Override the entrypoint to keep the container running in the background with a dummy process
To enter the container when it runs and remove it when you exit :
docker run --rm -ti --entrypoint "bash" registry.gitlab.com/gitlab-org/cloud-deploy/aws-base:latest
Or to keep the container running :
docker run -tid --name debug --entrypoint "bash" registry.gitlab.com/gitlab-org/cloud-deploy/aws-base:latest -c "tail -f /dev/null"
docker exec -ti debug bash
Running container issues
The container exits prematurely at startup and we don’t have enough information to understand the reason
1) In the host shell :
– docker way :
Run the container and pass the shell command as entrypoint to prevent the failure :
docker run --rm -ti --entrypoint sh IMAGE
– docker-compose way :
docker-compose run --service-ports --entrypoint sh SERVICE
Note that docker-compose run takes a service name rather than an image name, and that it is interactive by default.
2) In the container shell :
Re-execute the command defined in the ENTRYPOINT/CMD that exited with an error. Since the container now stays running, you can analyse its state and experiment freely.
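When the Dockerfile is not at hand, the command to re-execute can be read from the image metadata. A small sketch, where show_startup_command is a hypothetical helper name and IMAGE is the same placeholder as above:

```shell
# Hypothetical helper: print the ENTRYPOINT and CMD recorded in an image,
# each as a JSON array (null when the image does not define it).
show_startup_command() {
  docker inspect -f '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' "$1"
}

# Usage (from the host shell):
# show_startup_command IMAGE
```

At startup the container runs the entrypoint followed by the command arguments, so the two arrays concatenated give the line to type in the container shell.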
The container fails fast at startup
Case 1) The iptables rules are stale because of recent changes.
The error message looks like :
docker: Error response from daemon: driver failed programming external connectivity on endpoint prom (6ef6b57285842e5dc9aa30ee06c3e9cdf0ae444e9027762f9c9c4982c388b85f): (iptables failed: iptables --wait -t filter -A DOCKER ! -i docker0 -o docker0 -p tcp -d 172.17.0.6 --dport 9090 -j ACCEPT: iptables: No chain/target/match by that name. (exit status 1)).
Solution) restart the docker daemon so that it recreates its iptables chains :
systemctl stop docker
systemctl start docker
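Before restarting, it may be worth confirming that the DOCKER chain is really missing. The check below is a sketch: docker_chain_exists is a hypothetical helper name, and it assumes the chain lives in the filter table, as in the error message above.

```shell
# Hypothetical helper: succeed if the DOCKER chain exists in the filter table.
# iptables exits with a non-zero status when asked to list a missing chain.
docker_chain_exists() {
  iptables --wait -t filter -n -L DOCKER >/dev/null 2>&1
}

# Usage (as root): restart docker only when the chain is gone.
# docker_chain_exists || { systemctl stop docker; systemctl start docker; }
```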
The container fails at startup with a permission denied error when opening/writing a file
Multiple causes :
1) The owner/permissions of docker directories (/var/lib/docker…) were changed manually.
In that case, it may be complex to identify the exact issue.
Solution (in order) :
a) set the correct owner on the docker folders/files :
chown -R root:root /var/lib/docker
b) If that does not work, reinstalling docker-ce may be tried.
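To see what was changed before applying the chown, the owner and mode of the top-level entries can be listed first. This is a sketch assuming GNU find and stat; list_owners is a hypothetical helper name.

```shell
# Hypothetical helper: print "owner:group mode path" for a directory and the
# entries directly under it (defaults to /var/lib/docker).
list_owners() {
  dir=${1:-/var/lib/docker}
  find "$dir" -maxdepth 1 -exec stat -c '%U:%G %a %n' {} +
}

# Usage (as root):
# list_owners /var/lib/docker
```

Keeping a copy of this output makes it possible to compare the state before and after the fix.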
Docker : container and image data with Overlay 2 driver
Layout
Image and container data are stored in : /var/lib/docker/overlay2
Inside it, each directory is either a layer or a container (not sure…).
Container content is in the merged folder.
Identify unused data in containers/images
Sometimes, clearing unused containers/images with commands such as docker system prune is not enough to clear all unused layer/container data.
If we lack space, we can identify the layer/container directories that consume a lot of space and check that they are really used.
To identify big folders :
du -sh /var/lib/docker/overlay2/* | sort -k1h
To identify merged folders of existing containers :
docker inspect -f $'{{.Name}}\t{{.GraphDriver.Data.MergedDir}}' $(docker ps -aq)
To ensure that merged folders are used by existing containers :
Not simple, because some layers are in use yet do not appear in the docker inspect output.
So any cleanup should be done very cautiously.
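One way to narrow the search is to list the overlay2 directories that no existing container or image references. This is only a sketch, and its output is a list of candidates to investigate, not a list that is safe to delete. referenced_ids is a hypothetical helper name, the /tmp file names are arbitrary, and the LowerDir/UpperDir/MergedDir fields are assumed to be present as they usually are with the overlay2 driver.

```shell
# Hypothetical helper: read colon-separated overlay2 paths on stdin and print
# the unique directory names found directly under /var/lib/docker/overlay2.
referenced_ids() {
  tr ':' '\n' | sed -n 's|^/var/lib/docker/overlay2/\([^/]*\)/.*|\1|p' | sort -u
}

# Usage (as root); the l directory of link names always shows up as unreferenced:
# docker inspect -f '{{.GraphDriver.Data.LowerDir}}:{{.GraphDriver.Data.UpperDir}}:{{.GraphDriver.Data.MergedDir}}' \
#   $(docker ps -aq) $(docker images -q) | referenced_ids > /tmp/used
# ls /var/lib/docker/overlay2 | sort > /tmp/all
# comm -23 /tmp/all /tmp/used
```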
Corrupted images or layers during docker build
In some rare circumstances, images and layers may be corrupted.
Sometimes, adding the --pull and --rm flags to the docker build command fixes the issue.
Other times, it does not.
In that case, a possible trick is to specify another version of the base image in the Dockerfile.
Often a minor version increment is enough to force docker build to download completely new layers.