What the CrashLoopBackOff Error Means
CrashLoopBackOff is not an error code in the classical sense, but a pod state in Kubernetes. It means that a container within the pod is constantly terminating (crashing), and the kubelet (the agent on the node) triggers an exponential back-off mechanism between restart attempts.
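Concretely, the back-off grows roughly like this (a sketch under assumed kubelet defaults: a 10-second initial delay that doubles after each crash, capped at five minutes, and reset after the container runs cleanly for 10 minutes):

```shell
# Sketch of the kubelet's restart back-off schedule (assumed defaults:
# 10s initial delay, doubled after each crash, capped at 300s / 5 minutes).
delay=10
schedule=""
while [ "$delay" -lt 300 ]; do
  schedule="$schedule ${delay}s"
  delay=$((delay * 2))
done
schedule="$schedule 300s"   # all further restarts wait the full 5 minutes
echo "back-off delays:$schedule"
```

This is why the RESTARTS counter climbs quickly at first and then more slowly: the waiting period between attempts keeps growing until it hits the cap.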
You will see this state when running the command:
kubectl get pods
The STATUS column will show CrashLoopBackOff, and the RESTARTS column will show a continuously increasing number.
The full event text (from kubectl describe pod) typically looks like this:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: ...
Finished: ...
Common Causes
The CrashLoopBackOff state is almost always caused by the process inside the container exiting with a non-zero return code (usually 1 or 137). The main reasons are:
- Application error: An unexpected exception in the code, a configuration error, or a missing required file or environment variable. The container starts, and the application crashes.
- Incorrect start command (CMD/ENTRYPOINT): The Dockerfile or Pod specification includes a command that exits immediately (e.g., running a script without exec or starting a background process without wait).
- Insufficient resources (OOMKilled): The container exceeds its memory limit, and the kernel's OOM killer terminates it. In the logs and events you will see Reason: OOMKilled and Exit Code: 137.
- Permission issues: The application tries to write to a protected directory (e.g., /root) or lacks read permissions for a file.
- Misconfigured health probes (livenessProbe): A probe (e.g., an HTTP GET) constantly returns a non-successful response, and the kubelet kills the container even if the application is otherwise functional.
- Missing expected background process: The application starts, spawns a child process, and then exits, leaving the child process orphaned, which is subsequently terminated.
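The signal arithmetic behind codes like 137 can be verified in any POSIX shell:

```shell
# Exit codes above 128 encode a fatal signal: code = 128 + signal number.
# 137 = 128 + 9 (SIGKILL, which is what the OOM killer sends);
# 143 = 128 + 15 (SIGTERM, a graceful shutdown request).
sh -c 'kill -9 $$'        # the child process kills itself with SIGKILL
code=$?
echo "exit code: $code"   # prints 137
```

So when kubectl describe shows Exit Code: 137, the process did not return 137 itself; it was killed by signal 9.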
Troubleshooting Steps
Step 1: Analyze Pod Logs and Events
This is the first and most critical step. The commands below will provide precise indications of the root cause.
- View detailed information about the pod:
kubectl describe pod <your-pod-name>
Scroll down to the Events section. Look for recent events of type Failed or BackOff, and pay attention to the Reason and Message fields. Also, in the Containers -> <container-name> -> State section, you will see Last State (the previous state) and its Reason (e.g., Error, OOMKilled).
- Get the container logs:
- Logs from the last crashed instance (the most important):
kubectl logs <your-pod-name> --previous
- Logs from the current (possibly just restarted) container:
kubectl logs <your-pod-name>
- If the pod has multiple containers, specify the container name:
kubectl logs <pod-name> -c <container-name> --previous
What to look for in the logs: exceptions (Java, Python, Node.js), messages like permission denied, connection refused, file not found, OutOfMemoryError, Killed. The last 5-10 lines often contain the core issue.
Step 2: Check and Adjust Resources (Memory/CPU)
If kubectl describe pod or the logs mention OOMKilled or Exit Code: 137, the problem is likely a memory shortage.
- Temporarily increase the memory limit in your Pod's YAML manifest:
spec:
  containers:
    - name: my-app
      image: my-image
      resources:
        limits:
          memory: "512Mi"   # Increase, for example, to 1Gi
        requests:
          memory: "256Mi"
- Apply the changes:
kubectl apply -f your-manifest.yaml
- If the container now runs stably, the issue was the limits. Optimize your application's memory usage or set more realistic limits.
Step 3: Verify and Fix the Start Command (CMD/ENTRYPOINT)
The container must run a foreground process. If your entrypoint script (entrypoint.sh) starts a daemon in the background (&) and then exits, the container will die.
Incorrect (in entrypoint.sh):
#!/bin/sh
my-background-service & # The service goes to the background, the script exits, and the container stops
Correct:
#!/bin/sh
exec my-background-service # exec replaces the shell process and does not return
Or, if you need to start multiple processes, use a process manager (e.g., tini or supervisord).
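The difference can be demonstrated without a cluster. In this sketch, my_service is a hypothetical stand-in for a real long-running binary:

```shell
# Sketch of a shell entrypoint that must supervise a background process
# without a process manager (POSIX sh; my_service is a hypothetical stand-in).
my_service() { sleep 1; }   # pretend this is the long-running service
my_service &                # start it in the background
pid=$!
wait "$pid"                 # block here, so the entrypoint stays alive as PID 1
status=$?
echo "service exited with: $status"
```

Without the wait, the script falls off the end immediately, PID 1 exits, and Kubernetes treats the container as crashed regardless of what the background process is doing.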
Step 4: Temporarily Disable or Check Health Probes
A misconfigured livenessProbe is a common cause, especially in early development stages.
- Temporarily disable the probe in your pod manifest by commenting out its section:
# livenessProbe:
#   httpGet:
#     path: /health
#     port: 8080
#   initialDelaySeconds: 30
#   periodSeconds: 10
- Apply the manifest and check whether the CrashLoopBackOff disappears.
- If the problem is resolved, configure the probe correctly:
- Increase initialDelaySeconds to give the application more time to start.
- Ensure the path and port are correct and that the service actually responds with 2xx/3xx on that endpoint.
- For applications with long initialization times, use readinessProbe to control traffic readiness, and use livenessProbe only for liveness checks.
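A correctly tuned pair of probes might look like the hypothetical fragment below (the path, port, and timings are illustrative examples, not values from your manifest):

```yaml
# Hypothetical probe configuration for a slow-starting app (all values are examples):
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # longer than the observed cold-start time
  periodSeconds: 10
  failureThreshold: 3       # tolerate a few transient failures before a restart
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```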
Step 5: Check Environment Variables and Configuration Files
Ensure all required environment variables (env) and volumes (volumes/configMap/secret) are correctly specified and accessible inside the container.
- Verify that variables are passed:
kubectl exec <pod-name> -- printenv | grep -i <important-variable-name>
- If using a ConfigMap or Secret, check its existence and data:
kubectl get configmap <configmap-name> -o yaml
- Check permissions on mounted volumes if the container needs to write to them.
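The failure mode itself is easy to reproduce locally. This sketch uses a hypothetical APP_DB_URL variable to show how a missing required variable turns into a non-zero exit, which is exactly what the kubelet then restarts in a loop:

```shell
# Local simulation (no cluster needed): an app that requires APP_DB_URL
# (a hypothetical name) and exits non-zero when it is missing.
unset APP_DB_URL            # ensure a clean slate for the demo
require_env() {
  # ${VAR:?msg} aborts the (sub)shell with an error if VAR is unset or empty
  : "${APP_DB_URL:?APP_DB_URL is required}"
}
if ( require_env ) 2>/dev/null; then
  result="env ok"
else
  result="missing var -> container would crash"
fi
echo "$result"
```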
Prevention
- Always check logs (kubectl logs --previous) when first deploying a new pod.
- Start with adequate resource limits. Use kubectl top pod to monitor actual consumption.
- Configure health probes correctly: initialDelaySeconds should exceed the application's cold-start time.
- Use restartPolicy: OnFailure or Never for Jobs (Always is not allowed in a Job's pod template) to avoid infinite restart loops on task failures.
- Build images with a non-root USER where possible, and set directory permissions (chown) at build time, not in the entrypoint.
- Test the image locally (docker run --rm <image> <command>) before deploying to Kubernetes to isolate image issues from orchestration problems.
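For the Job advice above, a hypothetical manifest might look like this (the name, image, command, and backoffLimit value are illustrative):

```yaml
# Hypothetical Job spec illustrating the restart-policy advice (names are examples):
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3              # give up after 3 failed attempts
  template:
    spec:
      restartPolicy: OnFailure # Jobs only allow OnFailure or Never
      containers:
        - name: migrate
          image: my-image
          command: ["./migrate.sh"]
```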