# Common issues and troubleshooting

Explore step-by-step guidance for troubleshooting common pod failures and infrastructure-related issues in a self-hosted PerfectScale deployment.

## Pod failure troubleshooting <a href="#pod-failure-troubleshooting" id="pod-failure-troubleshooting"></a>

After installing all Helm charts, follow these steps to diagnose and resolve pod issues.

### Check the pod status <a href="#id-1.-verify-pod-status" id="id-1.-verify-pod-status"></a>

Check the status of all pods in your namespace.

```
kubectl get pods -n <namespace_name>
```

Look for pods with status indicators such as *Error*, *CrashLoopBackOff*, *ImagePullBackOff*, or *Pending*.
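In a busy namespace, it can help to narrow the listing to the pods that need attention. The following is a minimal sketch, not a PerfectScale tool: `unhealthy_pods` is a hypothetical helper that reads the output of `kubectl get pods --no-headers` and keeps pods that are not `Completed` and are either not `Running` or not fully ready.

```shell
# Hypothetical helper: keep only pods that need attention.
# Reads `kubectl get pods --no-headers` output on stdin, where the columns are
# NAME, READY (for example 1/2), STATUS, RESTARTS, and AGE.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # ready[1] = ready containers, ready[2] = total
    if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2]))
      print $1, $2, $3
  }'
}

# Usage, assuming kubectl already points at the right cluster:
#   kubectl get pods -n <namespace_name> --no-headers | unhealthy_pods
```

Checking the READY column as well as STATUS catches pods that report `Running` while one of their containers is still failing its readiness probe.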

### Examine the pod logs <a href="#id-2.-examine-pod-logs" id="id-2.-examine-pod-logs"></a>

For pods that show errors, examine the logs for detailed error messages.

```
kubectl logs -n <namespace_name> <pod_name>
```

For multi-container pods, specify the container name.

```
kubectl logs -n <namespace_name> <pod_name> -c <container_name>
```
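When a container keeps restarting, the current instance may not have logged anything useful yet. The `--previous` flag of `kubectl logs` returns the logs of the last terminated instance instead. The sketch below wraps that call; `previous_logs` is a hypothetical helper name, and the 100-line tail is an arbitrary choice.

```shell
# Hypothetical helper: show the last 100 log lines from the previous
# (terminated) instance of a container.
#   $1 = namespace, $2 = pod name, $3 = container name (optional)
previous_logs() {
  if [ -n "${3:-}" ]; then
    kubectl logs -n "$1" "$2" -c "$3" --previous --tail=100
  else
    kubectl logs -n "$1" "$2" --previous --tail=100
  fi
}

# Usage:
#   previous_logs <namespace_name> <pod_name>
#   previous_logs <namespace_name> <pod_name> <container_name>
```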

### Analyze the pod events and details <a href="#id-3.-analyse-pod-events-and-details" id="id-3.-analyse-pod-events-and-details"></a>

Get comprehensive information about problematic pods.

```
kubectl describe pod <pod_name> -n <namespace_name>
```

Focus on:

* **State**: The current container state, including any error message.
* **Last state**: The previous container state, which shows whether restarts have occurred and why.
* **Ready**: Indicates whether the container passed its readiness probe.
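The same details can be pulled out without reading the full `describe` output. The following sketch assumes two hypothetical helpers, `pod_events` and `container_states`, that query the events for one pod and summarize each container's state using a JSONPath expression.

```shell
# Hypothetical helpers; $1 = namespace, $2 = pod name.

# List events that reference the pod, oldest first.
pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}

# Print one line per container: name, ready flag, restart count, and the
# reason the previous instance terminated (empty if it never terminated).
container_states() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Usage:
#   pod_events <namespace_name> <pod_name>
#   container_states <namespace_name> <pod_name>
```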

### Common pod failure scenarios and solutions <a href="#id-4.-common-pod-failure-scenarios-and-solutions" id="id-4.-common-pod-failure-scenarios-and-solutions"></a>

#### **ImagePullBackOff**

👉🏻 **Reason**: The container registry cannot be accessed, or the image does not exist.

💡 **How to solve**:

* Verify container registry credentials.
* Verify the image name and tag for accuracy.
* Ensure network connectivity with the registry.
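The first two checks can be sketched as follows. The helper names are hypothetical; they only read the pod spec and confirm that a referenced pull secret exists, so they do not prove that the credentials inside the secret are valid.

```shell
# Hypothetical helpers; $1 = namespace.

# Print the image reference of every container in the pod; $2 = pod name.
pod_images() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.containers[*].image}'
}

# Print the names of the image pull secrets the pod uses; $2 = pod name.
pod_pull_secrets() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.imagePullSecrets[*].name}'
}

# Succeed only if the named pull secret exists; $2 = secret name.
pull_secret_exists() {
  kubectl get secret "$2" -n "$1" >/dev/null 2>&1
}

# Usage:
#   pod_images <namespace_name> <pod_name>
#   pod_pull_secrets <namespace_name> <pod_name>
#   pull_secret_exists <namespace_name> <secret_name> || echo "secret is missing"
```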

#### **CrashLoopBackOff**

👉🏻 **Reason**: The container crashes repeatedly shortly after starting, so Kubernetes delays each new restart attempt.

💡 **How to solve**:

* Check application logs.
* Verify the environment variables and configuration.
* Ensure sufficient resource allocation.
* See the [detailed CrashLoopBackOff troubleshooting guide](https://www.perfectscale.io/blog/crashloopbackoff).
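The exit code of the previous container instance often narrows down the cause. As a rough sketch, the hypothetical `explain_exit_code` helper below applies the common convention that a code above 128 means the process was killed by a signal (for example, 137 is 128 + 9, SIGKILL, which is what an out-of-memory kill produces). An application can also choose such a code itself, so treat the result as a hint.

```shell
# Hypothetical helper: describe a container exit code.
# By convention, a code above 128 means "killed by signal (code - 128)".
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exit 0: the process ended without an error"
  elif [ "$code" -gt 128 ]; then
    echo "exit $code: likely killed by signal $((code - 128))"
  else
    echo "exit $code: the application reported an error"
  fi
}

# To read the exit code of the previous instance of the first container
# (assumes the container has restarted at least once):
#   kubectl get pod <pod_name> -n <namespace_name> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```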

#### **Resource constraints**

👉🏻 **Reason**: Insufficient CPU and/or memory.

💡 **How to solve**:

* Check the availability of node resources.

```
kubectl describe node <node_name>
```

* Verify the pod's resource requests and limits.

```
kubectl get pod <pod_name> -n <namespace_name> -o yaml | grep -A 5 resources
```
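Both outputs above are long, so it can be easier to extract only the relevant parts. This is a sketch with two hypothetical helpers: `allocated_resources` filters `kubectl describe node` output down to its "Allocated resources" section, and `container_resources` prints the CPU and memory requests and limits per container (unset values print as empty fields).

```shell
# Hypothetical helper: print only the "Allocated resources" section of
# `kubectl describe node` output read from stdin.
allocated_resources() {
  awk '/^Allocated resources:/ { show = 1 }
       /^Events:/              { show = 0 }
       show'
}

# Hypothetical helper: print name, CPU request, memory request, CPU limit,
# and memory limit for each container. $1 = namespace, $2 = pod name.
container_resources() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\t"}{.resources.requests.memory}{"\t"}{.resources.limits.cpu}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}

# Usage:
#   kubectl describe node <node_name> | allocated_resources
#   container_resources <namespace_name> <pod_name>
```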

## Network and DNS troubleshooting <a href="#network-and-dns-troubleshooting" id="network-and-dns-troubleshooting"></a>

### General network diagnostics <a href="#general-network-diagnostics" id="general-network-diagnostics"></a>

1. List the services and confirm that each one exists with the expected type and ports.

```
kubectl get svc -n <namespace_name>
```

2. Review the network policies that might block traffic.

```
kubectl get networkpolicies -n <namespace_name>
```

3. Check ingress resources.

```
kubectl get ingress -n <namespace_name>
```
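The commands above list resources but do not test connectivity. As a sketch of two follow-up checks, the hypothetical helpers below inspect a service's endpoints and resolve its name from inside the cluster. The `dns-check` pod name and the `busybox:1.36` image are assumptions; substitute an image that your environment is able to pull.

```shell
# Hypothetical helpers; $1 = namespace, $2 = service name.

# A service that lists no endpoints usually has no ready pods
# matching its selector.
service_endpoints() {
  kubectl get endpoints "$2" -n "$1"
}

# Resolve the service name from a short-lived pod inside the cluster.
# The pod is removed automatically when the command finishes.
dns_check() {
  kubectl run dns-check --rm -i --restart=Never \
    --image=busybox:1.36 -n "$1" -- nslookup "$2"
}

# Usage:
#   service_endpoints <namespace_name> <service_name>
#   dns_check <namespace_name> <service_name>
```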

### AWS-specific network configuration <a href="#aws-specific-network-configuration" id="aws-specific-network-configuration"></a>

1. Ensure that a valid domain name is configured in the Route53 hosted zone.
2. Verify that the necessary DNS records exist: NS, CNAME, and A records.
3. Configure AWS Certificate Manager (ACM) to issue certificates for the domain.
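These checks can also be run from a terminal. The sketch below assumes that the AWS CLI and `dig` are installed and that AWS credentials are already configured; the helper names are hypothetical, and the zone ID and certificate ARN are values you look up in your own account.

```shell
# Hypothetical helpers that wrap the AWS CLI and dig.

# List hosted zones starting at the given domain name; $1 = domain name.
# Confirm that the returned zone name matches your domain.
find_hosted_zone() {
  aws route53 list-hosted-zones-by-name --dns-name "$1"
}

# List the DNS records in a hosted zone; $1 = hosted zone ID.
list_zone_records() {
  aws route53 list-resource-record-sets --hosted-zone-id "$1"
}

# Print the certificate status, for example ISSUED or PENDING_VALIDATION;
# $1 = certificate ARN.
certificate_status() {
  aws acm describe-certificate --certificate-arn "$1" \
    --query 'Certificate.Status' --output text
}

# Show which name servers public DNS returns for the domain; $1 = domain name.
public_name_servers() {
  dig +short NS "$1"
}
```

If the name servers returned by `public_name_servers` differ from the NS records in the hosted zone, the domain is probably delegated to a different zone.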

### External AWS configuration resources

* [AWS Certificate Manager documentation](https://docs.aws.amazon.com/acm/latest/userguide/acm-overview.html)
* [Route53 documentation](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html)

## Helm chart troubleshooting <a href="#helm-chart-troubleshooting" id="helm-chart-troubleshooting"></a>

PerfectScale uses Helm charts for service deployment. Follow these steps to troubleshoot Helm installations.

### Check the Helm release status <a href="#id-1.-check-helm-release-status" id="id-1.-check-helm-release-status"></a>

List all Helm releases in your namespace.

```
helm list -n <namespace_name>
```

Review the `STATUS` column for each release. A status of `deployed` indicates successful deployment.
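By default, `helm list` shows only releases that are deployed or failed, so a release stuck in a pending state can be missing from the output. The sketch below uses the `--all`, `--failed`, and `--pending` filters of `helm list`; the helper names are hypothetical.

```shell
# Hypothetical helpers; $1 = namespace.

# Show every release regardless of its status.
all_releases() {
  helm list -n "$1" --all
}

# Show only releases that failed or are in a pending state.
problem_releases() {
  helm list -n "$1" --failed --pending
}

# Usage:
#   all_releases <namespace_name>
#   problem_releases <namespace_name>
```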

### Manage problematic releases <a href="#id-2.-manage-problematic-releases" id="id-2.-manage-problematic-releases"></a>

For releases in a `failed` or pending (`pending-install`, `pending-upgrade`, or `pending-rollback`) state:

1. View the release history.

```
helm history <release-name> -n <namespace_name>
```

2. Roll back to a previous stable version.

```
helm rollback <release-name> <revision-number> -n <namespace_name>
```

3. View detailed release information.

```
helm get all <release-name> -n <namespace_name>
```
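A rollback is worth verifying right away. As a minimal sketch, the hypothetical `rollback_and_verify` helper below rolls back with `--wait`, so Helm blocks until the resources are ready or the timeout expires, and prints the release status only if the rollback succeeded. The five-minute timeout is an arbitrary choice.

```shell
# Hypothetical helper: roll back a release, then print its status.
#   $1 = namespace, $2 = release name, $3 = revision number
rollback_and_verify() {
  helm rollback "$2" "$3" -n "$1" --wait --timeout 5m &&
    helm status "$2" -n "$1"
}

# Usage:
#   rollback_and_verify <namespace_name> <release-name> <revision-number>
```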

## Support resources <a href="#support-resources" id="support-resources"></a>

If the issue persists after following these steps, contact PerfectScale support through [Slack](https://join.slack.com/t/perfectscalecommunity/shared_invite/zt-1tu9teu9e-Z9tGt4LpNI8tUC3j8obcmQ) or [email](mailto:support@perfectscale.io).

To help us resolve the issue faster, please include the following information when reaching out:

* Namespace name
* Relevant pod logs and events
* Cluster information
* Steps attempted
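Most of this information can be gathered in one pass. The following is a sketch only, not an official PerfectScale support tool: the hypothetical `collect_diagnostics` helper writes the output of a few read-only `kubectl` and `helm` commands to a directory of your choice. The file names are arbitrary.

```shell
# Hypothetical helper: gather basic diagnostics into a directory.
#   $1 = namespace, $2 = output directory (created if it does not exist)
collect_diagnostics() {
  ns="$1"
  out="$2"
  mkdir -p "$out" || return 1
  # Each command writes its output and any error message to its own file.
  kubectl version > "$out/kubectl-version.txt" 2>&1
  kubectl cluster-info > "$out/cluster-info.txt" 2>&1
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$out/events.txt" 2>&1
  helm list -n "$ns" --all > "$out/helm-releases.txt" 2>&1
  echo "Diagnostics written to $out"
}

# Usage:
#   collect_diagnostics <namespace_name> ./perfectscale-diagnostics
```

Review the files for sensitive data such as internal hostnames before sharing them.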
