What the CrashLoopBackOff Error Means
CrashLoopBackOff is not an error code in the classical sense, but a pod state in Kubernetes. It means that a container within the pod is constantly terminating (crashing), and the kubelet (the agent on the node) triggers an exponential back-off mechanism between restart attempts.
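Concretely, the back-off grows roughly like this (a sketch under assumed kubelet defaults: a 10-second initial delay that doubles after each crash, capped at five minutes, and reset after the container runs cleanly for 10 minutes):

```shell
# Sketch of the kubelet's restart back-off schedule (assumed defaults:
# 10s initial delay, doubled after each crash, capped at 300s / 5 minutes).
delay=10
schedule=""
while [ "$delay" -lt 300 ]; do
  schedule="$schedule ${delay}s"
  delay=$((delay * 2))
done
schedule="$schedule 300s"   # all further restarts wait the full 5 minutes
echo "back-off delays:$schedule"
```

This is why the RESTARTS counter climbs quickly at first and then more slowly: the waiting period between attempts keeps growing until it hits the cap.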
You will see this state when running the command:
kubectl get pods
The STATUS column will show CrashLoopBackOff, and the RESTARTS column will show a continuously increasing number.
The full event text (from kubectl describe pod) typically looks like this:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: ...
Finished: ...
Common Causes
The CrashLoopBackOff state is almost always caused by the process inside the container exiting with a non-zero return code (usually 1 or 137). The main reasons are:
- Application error: An unexpected exception in the code, a configuration error, or a missing required file or environment variable. The container starts, and the application crashes.
- Incorrect start command (CMD/ENTRYPOINT): The Dockerfile or Pod specification includes a command that exits immediately (e.g., running a script without exec or starting a background process without wait).
- Insufficient resources (OOMKilled): The container exceeds its memory limit, and the kernel's OOM killer terminates it. In the logs and events you will see Reason: OOMKilled and Exit Code: 137.
- Permission issues: The application tries to write to a protected directory (e.g., /root) or lacks read permissions for a file.
- Misconfigured health probes (livenessProbe): A probe (e.g., an HTTP GET) constantly returns a non-successful response, and the kubelet kills the container even if the application is otherwise functional.
- Missing expected background process: The application starts, spawns a child process, and then exits, leaving the child process orphaned, which is subsequently terminated.
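The signal arithmetic behind codes like 137 can be verified in any POSIX shell:

```shell
# Exit codes above 128 encode a fatal signal: code = 128 + signal number.
# 137 = 128 + 9 (SIGKILL, which is what the OOM killer sends);
# 143 = 128 + 15 (SIGTERM, a graceful shutdown request).
sh -c 'kill -9 $$'        # the child process kills itself with SIGKILL
code=$?
echo "exit code: $code"   # prints 137
```

So when kubectl describe shows Exit Code: 137, the process did not return 137 itself; it was killed by signal 9.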
Troubleshooting Steps
Step 1: Analyze Pod Logs and Events
This is the first and most critical step. The commands below will provide precise indications of the root cause.
- View detailed information about the pod:
kubectl describe pod <your-pod-name>
Scroll down to the Events section. Look for recent events of type Failed or BackOff, and pay attention to the Reason and Message fields. Also, in the Containers -> <container-name> -> State section, you will see Last State (the previous state) and its Reason (e.g., Error, OOMKilled).
- Get the container logs:
- Logs from the last crashed instance (the most important):
kubectl logs <your-pod-name> --previous
- Logs from the current (possibly just restarted) container:
kubectl logs <your-pod-name>
- If the pod has multiple containers, specify the container name:
kubectl logs <pod-name> -c <container-name> --previous
What to look for in the logs: exceptions (Java, Python, Node.js), messages like permission denied, connection refused, file not found, OutOfMemoryError, Killed. The last 5-10 lines often contain the core issue.
Step 2: Check and Adjust Resources (Memory/CPU)
If kubectl describe pod or the logs mention OOMKilled or Exit Code: 137, the problem is likely a memory shortage.
- Temporarily increase the memory limit in your Pod's YAML manifest:
spec:
  containers:
    - name: my-app
      image: my-image
      resources:
        limits:
          memory: "512Mi"   # Increase, for example, to 1Gi
        requests:
          memory: "256Mi"
- Apply the changes:
kubectl apply -f your-manifest.yaml
- If the container now runs stably, the issue was the limits. Optimize your application's memory usage or set more realistic limits.
Step 3: Verify and Fix the Start Command (CMD/ENTRYPOINT)
The container must run a foreground process. If your entrypoint script (entrypoint.sh) starts a daemon in the background (&) and then exits, the container will die.
Incorrect (in entrypoint.sh):
#!/bin/sh
my-background-service & # The service goes to the background, the script exits, and the container stops
Correct:
#!/bin/sh
exec my-background-service # exec replaces the shell process and does not return
Or, if you need to start multiple processes, use a process manager (e.g., tini or supervisord).
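The difference can be demonstrated without a cluster. In this sketch, my_service is a hypothetical stand-in for a real long-running binary:

```shell
# Sketch of a shell entrypoint that must supervise a background process
# without a process manager (POSIX sh; my_service is a hypothetical stand-in).
my_service() { sleep 1; }   # pretend this is the long-running service
my_service &                # start it in the background
pid=$!
wait "$pid"                 # block here, so the entrypoint stays alive as PID 1
status=$?
echo "service exited with: $status"
```

Without the wait, the script falls off the end immediately, PID 1 exits, and Kubernetes treats the container as crashed regardless of what the background process is doing.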
Step 4: Temporarily Disable or Check Health Probes
A misconfigured livenessProbe is a common cause, especially in early development stages.
- Temporarily disable the probe in your pod manifest by commenting out its section:
# livenessProbe:
#   httpGet:
#     path: /health
#     port: 8080
#   initialDelaySeconds: 30
#   periodSeconds: 10
- Apply the manifest and check whether the CrashLoopBackOff disappears.
- If the problem is resolved, configure the probe correctly:
- Increase initialDelaySeconds to give the application more time to start.
- Ensure the path and port are correct and that the service actually responds with 2xx/3xx on that endpoint.
- For applications with long initialization times, use readinessProbe to control traffic readiness, and use livenessProbe only for liveness checks.
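A correctly tuned pair of probes might look like the hypothetical fragment below (the path, port, and timings are illustrative examples, not values from your manifest):

```yaml
# Hypothetical probe configuration for a slow-starting app (all values are examples):
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # longer than the observed cold-start time
  periodSeconds: 10
  failureThreshold: 3       # tolerate a few transient failures before a restart
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```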
Step 5: Check Environment Variables and Configuration Files
Ensure all required environment variables (env) and volumes (volumes/configMap/secret) are correctly specified and accessible inside the container.
- Verify that variables are passed:
kubectl exec <pod-name> -- printenv | grep -i <important-variable-name>
- If using a ConfigMap or Secret, check its existence and data:
kubectl get configmap <configmap-name> -o yaml
- Check permissions on mounted volumes if the container needs to write to them.
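The failure mode itself is easy to reproduce locally. This sketch uses a hypothetical APP_DB_URL variable to show how a missing required variable turns into a non-zero exit, which is exactly what the kubelet then restarts in a loop:

```shell
# Local simulation (no cluster needed): an app that requires APP_DB_URL
# (a hypothetical name) and exits non-zero when it is missing.
unset APP_DB_URL            # ensure a clean slate for the demo
require_env() {
  # ${VAR:?msg} aborts the (sub)shell with an error if VAR is unset or empty
  : "${APP_DB_URL:?APP_DB_URL is required}"
}
if ( require_env ) 2>/dev/null; then
  result="env ok"
else
  result="missing var -> container would crash"
fi
echo "$result"
```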
Prevention
- Always check logs (kubectl logs --previous) when first deploying a new pod.
- Start with adequate resource limits. Use kubectl top pod to monitor actual consumption.
- Configure health probes correctly: initialDelaySeconds should exceed the application's cold-start time.
- Use restartPolicy: OnFailure or Never for Jobs (Always is not allowed in a Job's pod template) to avoid infinite restart loops on task failures.
- Build images with a non-root USER where possible, and set directory permissions (chown) at build time, not in the entrypoint.
- Test the image locally (docker run --rm <image> <command>) before deploying to Kubernetes to isolate image issues from orchestration problems.
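For the Job advice above, a hypothetical manifest might look like this (the name, image, command, and backoffLimit value are illustrative):

```yaml
# Hypothetical Job spec illustrating the restart-policy advice (names are examples):
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3              # give up after 3 failed attempts
  template:
    spec:
      restartPolicy: OnFailure # Jobs only allow OnFailure or Never
      containers:
        - name: migrate
          image: my-image
          command: ["./migrate.sh"]
```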