What the ErrImagePull Error Means
The ErrImagePull error (and its derivative ImagePullBackOff) occurs when the kubelet on a cluster node fails to download the specified container image from a registry. In the pod events (kubectl describe pod <pod-name>) or the kubelet logs, you will see a message similar to: Failed to pull image "registry.example.com/app:v1": rpc error: code = Unknown desc = failed to pull and unpack image.
This issue blocks pod startup: the pod stays in the Pending phase while its container status moves from ErrImagePull to ImagePullBackOff. The service remains unavailable, and the kubelet keeps retrying the pull indefinitely, increasing the interval between attempts up to a cap of five minutes.
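The retry behavior can be illustrated with a few lines of shell. This is only a sketch: the function name backoff_schedule is hypothetical, and the 10-second starting delay and the doubling factor are assumptions; the 300-second cap matches the limit documented for the kubelet.

```shell
#!/bin/sh
# Print the wait before each retry for an exponential backoff.
# Assumptions: initial delay 10 s, doubling per attempt, capped at 300 s.
backoff_schedule() {
  local attempts="$1" delay="${2:-10}" cap="${3:-300}" i=1
  while [ "$i" -le "$attempts" ]; do
    echo "attempt $i: wait ${delay}s"
    delay=$((delay * 2))
    # Never wait longer than the cap
    if [ "$delay" -gt "$cap" ]; then delay="$cap"; fi
    i=$((i + 1))
  done
}

backoff_schedule 7
```

Under these assumptions the delay reaches the cap on the sixth attempt and stays there, which is why a pod in ImagePullBackOff can take several minutes to recover after the root cause is fixed.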
Root Causes
- A typo in the image name or a non-existent tag (the registry returns 404 Not Found).
- Missing credentials for a private registry (403 Forbidden or 401 Unauthorized).
- Outbound connections blocked by a firewall, missing routing, or misconfigured HTTP/HTTPS proxies on the nodes.
- Local DNS resolver failure, preventing the registry domain from resolving to an IP address.
- Insufficient free space on the /var/lib/containerd or /var/lib/docker partition. Caching is interrupted, and the download is aborted.
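As a rough triage aid, the causes above can be mapped from the text of a failure message. The sketch below is a heuristic, not an exhaustive parser: the function name classify_pull_error and the category labels are hypothetical, and the matched substrings are assumptions based on messages that container runtimes commonly emit.

```shell
#!/bin/sh
# Map a pull failure message to one of the root causes listed above.
# The substrings are assumed, commonly seen fragments; adjust for your runtime.
classify_pull_error() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"no space left on device"*)                   echo "disk-full" ;;
    *"no such host"*)                              echo "dns" ;;
    *unauthorized*|*forbidden*|*"access denied"*)  echo "auth" ;;
    *"not found"*|*"manifest unknown"*)            echo "bad-name-or-tag" ;;
    *"i/o timeout"*|*"connection refused"*)        echo "network" ;;
    *)                                             echo "unknown" ;;
  esac
}

# Example input: event messages for failed pulls in a namespace
#   kubectl get events -n <namespace> --field-selector reason=Failed \
#     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
classify_pull_error 'dial tcp: lookup registry.example.com: no such host'
```

The order of the patterns matters: a message is assigned to the first category that matches, so the most specific fragments are checked first.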
Resolution Steps
Method 1: Validate the Image Name and Tag
A common cause is human error when editing manifests. Ensure that the repository path, image name, and tag exactly match what is published in the registry.
- Locate the problematic pod:
  kubectl get pods -n <namespace> | grep -E 'ErrImagePull|ImagePullBackOff'
- Check the current configuration:
  kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].image}'
- Fix the deployment:
  kubectl set image deployment/<name> <container-name>=registry.example.com/app:v1.2.3 -n <namespace>
💡 Tip: Always use specific tags or SHA digests. The latest tag complicates debugging and can lead to pulling unstable builds.
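The tip can be turned into a quick check on an image reference. A minimal sketch, assuming references follow the usual [registry[:port]/]path[:tag][@sha256:digest] layout; the function name image_ref_status and its output labels are hypothetical.

```shell
#!/bin/sh
# Report how an image reference is pinned: digest, tagged, latest, or untagged.
image_ref_status() {
  local ref="$1" name
  case "$ref" in
    *@sha256:*) echo "digest"; return 0 ;;
  esac
  # Drop the registry host and path so a ":port" is not mistaken for a tag
  name="${ref##*/}"
  case "$name" in
    *:latest) echo "latest" ;;
    *:*)      echo "tagged" ;;
    *)        echo "untagged" ;;
  esac
}

image_ref_status "registry.example.com/app:v1.2.3"
```

An untagged reference resolves to latest, so both the "untagged" and "latest" results are worth flagging in review.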
Method 2: Diagnose Network Restrictions
If the image is public but fails to pull, the issue is most likely with the node's network. Images are pulled by the container runtime on the node at the kubelet's request, so the pull uses the node's system network stack, not the pod's network.
- SSH into the node:
  ssh user@k8s-node-01
- Verify DNS resolution:
  nslookup registry.example.com
- Attempt to pull the image directly:
  sudo crictl pull registry.example.com/app:v1.2.3
- If the command fails, check proxy settings:
  systemctl show docker | grep -i proxy   (or: env | grep -i proxy)
For corporate environments, add the registry domains to the NO_PROXY environment variable set in the systemd unit files (or drop-ins) for containerd.service or docker.service.
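One common way to set these variables is a systemd drop-in file. The sketch below writes such a file; the function name write_proxy_dropin, the file name http-proxy.conf, and the proxy address http://proxy.internal:3128 are assumptions for illustration, so substitute the values used in your environment.

```shell
#!/bin/sh
# Write a systemd drop-in that sets proxy variables for a runtime service.
# Arguments: drop-in directory, proxy URL, comma-separated NO_PROXY list.
write_proxy_dropin() {
  local dir="$1" proxy="$2" no_proxy="$3"
  mkdir -p "$dir"
  cat > "$dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy"
Environment="HTTPS_PROXY=$proxy"
Environment="NO_PROXY=$no_proxy"
EOF
}

# Example (run as root on the node; the proxy address is hypothetical):
#   write_proxy_dropin /etc/systemd/system/containerd.service.d \
#     http://proxy.internal:3128 "localhost,127.0.0.1,registry.example.com"
#   systemctl daemon-reload && systemctl restart containerd
```

A drop-in keeps the packaged unit file untouched, so the setting is less likely to be lost when the runtime package is upgraded.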
Method 3: Configure imagePullSecrets for Private Registries
Without valid credentials, the kubelet cannot pull from private repositories on Docker Hub, GitLab Registry, or AWS ECR. The secret must be created in the same namespace where the application is deployed.
kubectl create secret docker-registry regcred \
  --docker-server=https://registry.example.com \
  --docker-username=deploy-bot \
  --docker-password="$CI_TOKEN" \
  --namespace=production
Attach the secret to the deployment:
spec:
  template:
    spec:
      containers:
        - name: app
          image: registry.example.com/app:v1.2.3
      imagePullSecrets:
        - name: regcred
Verify the secret with kubectl get secret regcred -n production -o yaml, and ensure the token has not expired.
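The YAML output shows the credentials only in base64 form, so it helps to decode the payload and confirm it refers to the right registry. A minimal sketch, assuming the secret is of type kubernetes.io/dockerconfigjson; the helper name secret_covers_registry is hypothetical, and it performs only a plain substring check on the decoded JSON.

```shell
#!/bin/sh
# Read a base64-encoded .dockerconfigjson payload on stdin and succeed
# if the decoded JSON mentions the given registry host.
secret_covers_registry() {
  local host="$1"
  base64 --decode | grep -qF "$host"
}

# Example (needs cluster access):
#   kubectl get secret regcred -n production \
#     -o jsonpath='{.data.\.dockerconfigjson}' \
#     | secret_covers_registry registry.example.com && echo "registry present"
```

This does not prove that the token is still valid; it only catches a secret created for the wrong server, which produces the same 401 or 403 response.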
Method 4: Clear Corrupted Runtime Cache
Sometimes, a partial download leaves behind "corrupted" layers that block subsequent attempts. The runtime fails to recognize them and continuously returns an error.
# Scale the deployment to zero so the kubelet stops retrying the pull
kubectl scale deployment <name> -n <namespace> --replicas=0
# On the affected node: remove the cached image
sudo crictl rmi registry.example.com/app:v1.2.3
# Restart the runtime
sudo systemctl restart containerd
Once the runtime is back up, scale the deployment back to the desired number of replicas. The pull will start from scratch.
Prevention
- Time synchronization: Configure chrony or systemd-timesyncd on all nodes. Significant clock skew can break TLS certificate validation and token-based authentication with registries.
- Disk space monitoring: Deploy node-exporter and set up alerts on filesystem usage and the DiskPressure node condition. A full disk is an easily overlooked cause of pull failures.
- Token rotation: For CI/CD, use short-lived credentials and automate imagePullSecrets updates via External Secrets Operator or HashiCorp Vault.
- Local mirrors: For large clusters, configure a registry mirror in the containerd configuration. This reduces load on public registries and the risk of network timeouts.
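Until proper alerting is in place, the disk space item can be approximated with a small check run on each node. This is a sketch under stated assumptions: the function names and the 85% threshold are illustrative, and the parsing relies on the POSIX output format of df -P.

```shell
#!/bin/sh
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path.
disk_usage_percent() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below the limit, 1 if at or above it, 2 if unreadable.
# The default limit of 85 is an assumed value, not a Kubernetes setting.
check_image_store() {
  local path="$1" limit="${2:-85}" used
  used=$(disk_usage_percent "$path")
  case "$used" in
    ''|*[!0-9]*) echo "UNKNOWN: cannot read usage for $path"; return 2 ;;
  esac
  if [ "$used" -ge "$limit" ]; then
    echo "WARN: $path is ${used}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}

# Example: check_image_store /var/lib/containerd 85
```

Running it from cron or a node health script gives an early warning before the runtime starts aborting pulls.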