Fixing ErrImagePull in Kubernetes: Proven Solutions

What the ErrImagePull Error Means

The ErrImagePull error, and the ImagePullBackOff status that follows it, appear when the container runtime on a cluster node, acting on the kubelet's request, fails to download the specified image from a registry. In the pod's events (kubectl describe pod <pod-name> -n <namespace>) you will see a message similar to: Failed to pull image "registry.example.com/app:v1": rpc error: code = Unknown desc = failed to pull and unpack image.

The pod cannot start. It stays in the Pending phase while its container waits with the reason ErrImagePull or ImagePullBackOff. The workload remains unavailable, and the kubelet keeps retrying the pull indefinitely, increasing the delay between attempts up to a compiled-in limit of 300 seconds (5 minutes).
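The sketch below prints the delay before each retry, which shows how quickly the retries slow down. It assumes a 10-second initial delay that doubles after every failure. The 300-second cap matches the limit stated above, but the starting value and the doubling factor are assumptions about kubelet internals and may differ between versions. The function name pull_backoff_schedule is hypothetical.

```shell
# Print the assumed kubelet image-pull back-off delays (in seconds) for N retries.
# Assumptions (not guaranteed for every kubelet version):
#   10 s initial delay, doubling after each failure, capped at 300 s.
pull_backoff_schedule() {
  local count="$1" delay=10 out="" i=0
  while [ "$i" -lt "$count" ]; do
    out="$out${out:+ }$delay"        # append, separating values with one space
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then    # the 5-minute ceiling
      delay=300
    fi
    i=$((i + 1))
  done
  echo "$out"
}
```

Running pull_backoff_schedule 7 prints 10 20 40 80 160 300 300. Under these assumptions, every retry from the sixth onward waits the full five minutes. Once the cause is fixed, deleting the pod is usually quicker than waiting for the next retry.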

Root Causes

A typo in the image name or a non-existent tag (the registry returns 404 Not Found).
Missing, invalid, or expired credentials for a private registry (401 Unauthorized or 403 Forbidden).
Outbound connections blocked by a firewall, missing routes, or misconfigured HTTP/HTTPS proxy settings on the nodes.
A DNS resolution failure on the node, so the registry hostname does not resolve to an IP address.
Insufficient free space on the partition holding /var/lib/containerd or /var/lib/docker. The runtime cannot store and unpack the image layers, and the pull is aborted.
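The event message usually hints at which of these causes applies. The hypothetical helper below maps common substrings from containerd and Go networking errors to the categories above. The patterns are assumptions based on typical messages, not an exhaustive list. Treat "unknown" as a prompt to read the full message yourself.

```shell
# Map an image-pull failure message to one of the root-cause categories above.
# The substrings are typical containerd / Go error fragments, not a complete list.
classify_pull_error() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # match case-insensitively
  case "$msg" in
    *"no space left on device"*)
      echo "disk" ;;
    *"no such host"*|*"server misbehaving"*)
      echo "dns" ;;
    *"x509:"*)
      echo "tls" ;;                      # certificate problem or clock skew
    *"unauthorized"*|*"403 forbidden"*|*"pull access denied"*)
      echo "credentials" ;;
    *"not found"*|*"manifest unknown"*)
      echo "image-name-or-tag" ;;
    *"i/o timeout"*|*"connection refused"*|*"no route to host"*|*"proxyconnect"*)
      echo "network" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, passing it a message copied from kubectl describe pod that ends in "lookup registry.example.com: no such host" prints dns.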

Resolution Steps

Method 1: Validate the Image Name and Tag

A common cause is a typo introduced while editing manifests. Make sure the registry host, repository path, image name, and tag exactly match an image that has actually been published.

Locate the affected pods: kubectl get pods -n <namespace> | grep -E 'ErrImagePull|ImagePullBackOff'
Check the image each container is configured to use: kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].image}'
Fix the image in the deployment: kubectl set image deployment/<name> <container-name>=registry.example.com/app:v1.2.3 -n <namespace>
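When several pods fail at once, the hypothetical helper below lists each affected pod together with the images from its spec, using kubectl's custom-columns output. For pods with several containers, the columns hold comma-separated values, so reasons and images are not guaranteed to line up one-to-one. Init containers are not covered.

```shell
# List pods in a namespace whose containers are waiting on a failed image pull,
# printing the pod name and the image(s) declared in its spec.
list_image_pull_failures() {
  local ns="$1"
  kubectl get pods -n "$ns" --no-headers \
    -o custom-columns='POD:.metadata.name,REASON:.status.containerStatuses[*].state.waiting.reason,IMAGE:.spec.containers[*].image' |
    awk '$2 ~ /ErrImagePull|ImagePullBackOff/ { print $1, $3 }'
}
```

Calling list_image_pull_failures production prints one line per stuck pod, for example web-1 registry.example.com/app:v1.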

💡 Tip: Always use specific tags or SHA digests. The latest tag complicates debugging and can lead to pulling unstable builds.
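The hypothetical helper below checks whether a reference follows this advice. It reports whether an image reference is pinned to a digest, uses an explicit tag, or falls back to latest. It relies on the rule that the tag is the part after the colon in the final path component, so a registry port such as registry.example.com:5000 is not mistaken for a tag.

```shell
# Classify an image reference as pinned by digest, pinned by tag, or unpinned.
# Prints "digest", "tag:<tag>", or "unpinned"; returns non-zero when unpinned.
pin_status() {
  local ref="$1" last
  case "$ref" in
    *@sha256:*) echo "digest"; return 0 ;;
  esac
  last="${ref##*/}"                   # final path component, e.g. "app:v1.2.3"
  case "$last" in
    *:latest) echo "unpinned"; return 1 ;;
    *:*)      echo "tag:${last#*:}"; return 0 ;;
    *)        echo "unpinned"; return 1 ;;   # no tag means an implicit "latest"
  esac
}
```

To audit a namespace, loop over the output of kubectl get pods -n <namespace> -o jsonpath='{.items[*].spec.containers[*].image}' and call pin_status on each reference.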

Method 2: Diagnose Network Restrictions

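If the image is public and the reference is correct but the pull still fails, the problem most likely lies in the node's network path to the registry. Images are pulled by the container runtime on the node, which uses the node's own network stack and the runtime's proxy settings, not the pod network. Tests run from inside a pod are therefore not representative.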

SSH into the node: ssh user@k8s-node-01
Verify DNS resolution: nslookup registry.example.com
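Attempt to pull the image directly through the runtime: sudo crictl pull registry.example.com/app:v1.2.3
If the pull fails, check which proxy settings the runtime actually uses: systemctl show --property=Environment containerd (or docker), and compare them with your shell's settings: env | grep -i proxy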
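The DNS and connectivity checks can be combined into one quick test. The hypothetical helper below resolves the registry host with getent and then requests the /v2/ endpoint defined by the Registry HTTP API. A 200 or 401 response means the registry is reachable; 401 only means it expects authentication. A 000 code means curl got no HTTP response at all (blocked connection, proxy, or TLS failure), and curl's own error message explains which. Note that curl runs with your shell's proxy environment, which may differ from the runtime's. A success here therefore does not rule out a containerd proxy problem.

```shell
# Quick reachability check for a registry host, run on the node.
# Prints a one-line verdict; returns non-zero on failure.
check_registry_access() {
  local host="$1" code
  # 1. Name resolution (getent follows /etc/nsswitch.conf, like most programs on the node)
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "DNS: $host does not resolve"
    return 1
  fi
  # 2. HTTPS request to the registry API root; only the status code is kept
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "https://$host/v2/") || code=000
  case "$code" in
    200|401) echo "OK: $host reachable (HTTP $code)" ;;
    000)     echo "NETWORK: no HTTPS response from $host"; return 1 ;;
    *)       echo "UNEXPECTED: HTTP $code from $host"; return 1 ;;
  esac
}
```

Run it on the node as check_registry_access registry.example.com before retrying crictl pull.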

In corporate environments, add internal registry domains to the NO_PROXY variable, next to HTTP_PROXY and HTTPS_PROXY, in a systemd drop-in for containerd.service or docker.service. Then run systemctl daemon-reload and restart the service.
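The hypothetical helper below writes such a drop-in. The file name http-proxy.conf, the proxy URL, and the NO_PROXY entries are placeholders. A typical NO_PROXY list also includes localhost, 127.0.0.1 and, if needed, the cluster's service and pod CIDRs, so that in-cluster traffic bypasses the proxy. The optional fourth argument exists only so the function can be tried against a scratch directory.

```shell
# Write a systemd drop-in that sets proxy variables for a runtime unit.
# Arguments: unit name (containerd or docker), proxy URL, NO_PROXY list,
#            optional root directory (defaults to /etc/systemd/system).
write_proxy_dropin() {
  local unit="$1" proxy="$2" no_proxy="$3" root="${4:-/etc/systemd/system}"
  local dir="$root/$unit.service.d"
  mkdir -p "$dir" || return 1
  cat > "$dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy"
Environment="HTTPS_PROXY=$proxy"
Environment="NO_PROXY=$no_proxy"
EOF
  echo "$dir/http-proxy.conf"       # print the path of the file written
}
```

Run it as root against the real unit, for example write_proxy_dropin containerd http://proxy.corp.example:3128 "localhost,127.0.0.1,registry.example.com". Then run sudo systemctl daemon-reload && sudo systemctl restart containerd, and confirm the result with systemctl show --property=Environment containerd.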

Method 3: Configure imagePullSecrets for Private Registries

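Without valid credentials, the kubelet cannot authenticate with a private registry such as a private Docker Hub repository, GitLab Container Registry, or Amazon ECR. These credentials are usually supplied as an imagePullSecret, which must be created in the same namespace as the pods that use it.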

kubectl create secret docker-registry regcred \
  --docker-server=https://registry.example.com \
  --docker-username=deploy-bot \
  --docker-password="$CI_TOKEN" \
  --namespace=production
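Running kubectl create again fails once regcred exists, which becomes a problem when the token is rotated. A common create-or-update idiom renders the secret with --dry-run=client and pipes it into kubectl apply, and the hypothetical helper below wraps that idiom. Passing the token as a command-line argument briefly exposes it in the process list, which may be unacceptable on shared machines.

```shell
# Create the pull secret, or update it in place if it already exists.
# Arguments: namespace, secret name, registry server, username, token.
refresh_pull_secret() {
  local ns="$1" name="$2" server="$3" user="$4" token="$5"
  kubectl create secret docker-registry "$name" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$token" \
    --namespace="$ns" \
    --dry-run=client -o yaml |
    kubectl apply -f -
}
```

For example: refresh_pull_secret production regcred https://registry.example.com deploy-bot "$CI_TOKEN". For Amazon ECR, the username is AWS and the token can come from aws ecr get-login-password --region <region>.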

Attach the secret to the deployment:

spec:
  template:
    spec:
      containers:
        - name: app
          image: registry.example.com/app:v1.2.3
      imagePullSecrets:
        - name: regcred

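Verify the secret: kubectl get secret regcred -n production -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d prints the stored registry address and credentials. Make sure the token has not expired or been revoked. Amazon ECR tokens, for example, are valid for only 12 hours.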
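Four mistakes are common: the secret sits in the wrong namespace, it has the wrong type, its credentials are stored under a different registry host, or the Deployment never references it. The hypothetical helper below checks all four using kubectl and base64. It only confirms that an entry for the host exists, not that the token inside it is still valid.

```shell
# Check that a pull secret exists in the namespace, has the dockerconfigjson type,
# contains an entry for the registry host, and is referenced by the Deployment.
# Arguments: namespace, deployment, secret name, registry host.
check_pull_secret() {
  local ns="$1" deploy="$2" secret="$3" host="$4" out type data refs
  # Secret type and the base64-encoded registry config, fetched in one call
  out=$(kubectl get secret "$secret" -n "$ns" \
        -o 'jsonpath={.type}{" "}{.data.\.dockerconfigjson}') || {
    echo "secret $secret not found in namespace $ns"; return 1; }
  type="${out%% *}"
  data="${out#* }"
  if [ "$type" != "kubernetes.io/dockerconfigjson" ]; then
    echo "secret $secret has type $type, expected kubernetes.io/dockerconfigjson"
    return 1
  fi
  # Decode the config and look for the registry host as a plain substring
  if ! printf '%s' "$data" | base64 -d 2>/dev/null | grep -qF "$host"; then
    echo "secret $secret has no credentials for $host"
    return 1
  fi
  refs=$(kubectl get deployment "$deploy" -n "$ns" \
         -o 'jsonpath={.spec.template.spec.imagePullSecrets[*].name}')
  case " $refs " in
    *" $secret "*) echo "OK: $deploy uses $secret for $host" ;;
    *) echo "deployment $deploy does not reference $secret"; return 1 ;;
  esac
}
```

For the example above: check_pull_secret production <deployment-name> regcred registry.example.com.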

Method 4: Clear Corrupted Runtime Cache

Occasionally, an interrupted download leaves incomplete or corrupted layers in the runtime's local store. Every subsequent pull attempt then fails on them instead of fetching the data again.

# Scale the deployment down to zero so the kubelet stops retrying the pull
kubectl scale deployment <name> -n <namespace> --replicas=0

# On the affected node: remove the cached image from the runtime's local store
sudo crictl rmi registry.example.com/app:v1.2.3

# On the affected node: restart the container runtime
sudo systemctl restart containerd

Once the runtime is back up, scale the deployment back to its previous replica count: kubectl scale deployment <name> -n <namespace> --replicas=<N>. The next pull downloads the image from scratch.
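The final step depends on knowing the previous replica count, so it helps to record that count before scaling down. The hypothetical helper below reads .spec.replicas, scales the Deployment to zero, and prints the saved value.

```shell
# Record a Deployment's replica count, scale it to zero, and print the saved count.
# Arguments: namespace, deployment name.
scale_down_and_remember() {
  local ns="$1" deploy="$2" replicas
  replicas=$(kubectl get deployment "$deploy" -n "$ns" -o 'jsonpath={.spec.replicas}') || return 1
  kubectl scale deployment "$deploy" -n "$ns" --replicas=0 >/dev/null || return 1
  echo "$replicas"
}
```

Usage: saved=$(scale_down_and_remember <namespace> <name>). After removing the image and restarting containerd on the node, restore the count with kubectl scale deployment <name> -n <namespace> --replicas="$saved".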

Prevention

Time synchronization: Configure chrony or systemd-timesyncd on all nodes. A skewed clock can make registry certificates appear expired or not yet valid, which breaks TLS handshakes, and it can also cause short-lived authentication tokens to be rejected.
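Disk space monitoring: Deploy node-exporter to alert on free space in the filesystem holding /var/lib/containerd. Also alert on the node's DiskPressure condition, which kube-state-metrics exposes as kube_node_status_condition. A full disk is one of the easiest causes of pull failures to overlook.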
Token rotation: For CI/CD, use short-lived credentials and automate imagePullSecrets updates via External Secrets Operator or HashiCorp Vault.
Local mirrors: For large clusters, configure a registry mirror in the containerd configuration. This reduces load on public registries and makes pulls less sensitive to timeouts on the cluster's outbound connection.
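On containerd 1.5 and later, mirrors are typically configured per registry. Each registry gets a hosts.toml file in a directory named after it, and the registry config_path setting in /etc/containerd/config.toml points at /etc/containerd/certs.d. The exact section holding config_path differs between containerd major versions, so check it against your version. The helper below is a hypothetical sketch that writes such a file, and the mirror URL is a placeholder.

```shell
# Write <base>/<registry>/hosts.toml so containerd tries a mirror before the upstream.
# Arguments: registry name (e.g. docker.io), upstream server URL, mirror URL,
#            optional base directory (defaults to /etc/containerd/certs.d).
write_mirror_hosts() {
  local registry="$1" upstream="$2" mirror="$3" base="${4:-/etc/containerd/certs.d}"
  local dir="$base/$registry"
  mkdir -p "$dir" || return 1
  cat > "$dir/hosts.toml" <<EOF
server = "$upstream"

[host."$mirror"]
  capabilities = ["pull", "resolve"]
EOF
  echo "$dir/hosts.toml"            # print the path of the file written
}
```

Run as root, for example: write_mirror_hosts docker.io https://registry-1.docker.io https://mirror.example.com. If you also had to change config_path in config.toml, restart containerd afterwards.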

F.A.Q.

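What is the difference between ImagePullBackOff and ErrImagePull?
ErrImagePull is the reason reported right after a pull attempt fails. ImagePullBackOff is the reason shown while the kubelet waits for the back-off delay to expire before the next attempt. Both point to the same underlying problem.

How do I manually test registry access from a node?
SSH into the node and run sudo crictl pull with the exact image reference from the pod spec, as in Method 2. crictl goes through the same container runtime as the kubelet, so the pull uses the runtime's network and proxy settings. For a private registry, pass credentials with --creds <username>:<token>.

Do I need to restart the cluster after fixing it?
No. The kubelet retries the pull automatically. To retry immediately instead of waiting for the back-off delay, delete the affected pod (kubectl delete pod <pod-name> -n <namespace>) and let its controller recreate it.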
