Common issues and troubleshooting

Troubleshooting guide for the PerfectScale self-hosted environment

Explore step-by-step guidance for troubleshooting common pod failures and infrastructure-related issues in a self-hosted PerfectScale deployment.

Pod failure troubleshooting

After installing all Helm charts, follow these steps to diagnose and resolve pod issues.

Check the pod status

Check the status of all pods in your namespace.

kubectl get pods -n <namespace_name>

Look for pods with status indicators such as Error, CrashLoopBackOff, ImagePullBackOff, or Pending.
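
In a busy namespace, it can help to narrow the listing to the pods that need attention. The following is a minimal sketch and not part of PerfectScale tooling: `unhealthy_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print "<pod> <status>" for every pod whose STATUS
# column is neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output from stdin.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (replace the placeholder with your namespace):
#   kubectl get pods -n <namespace_name> --no-headers | unhealthy_pods
```

A pod that is Running but not yet ready is not reported by this filter, so also compare the READY column for pods that appear healthy.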

Examine the pod logs

For pods that show errors, examine the logs for detailed error messages.

kubectl logs -n <namespace_name> <pod_name>

For multi-container pods, specify the container name.

kubectl logs -n <namespace_name> <pod_name> -c <container_name>
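
When a container has already restarted, the current logs may not contain the original error. The `--previous` flag of `kubectl logs` returns the logs of the previous container instance. The sketch below saves both sets of logs to files; `collect_pod_logs`, the output directory, and the 500-line limit are assumptions chosen for illustration.

```shell
# Hypothetical helper: save current and previous-instance logs for a pod.
# Arguments: <namespace> <pod> [output_dir]
collect_pod_logs() (
  ns="$1"; pod="$2"; out_dir="${3:-./pod-logs}"
  mkdir -p "$out_dir"
  # Logs of the current containers (last 500 lines each).
  kubectl logs -n "$ns" "$pod" --all-containers --tail=500 \
    > "$out_dir/$pod.log" 2>&1
  # Logs of the previous container instances; this call fails harmlessly
  # if the pod has never restarted.
  kubectl logs -n "$ns" "$pod" --all-containers --previous --tail=500 \
    > "$out_dir/$pod.previous.log" 2>&1 || true
)

# Usage:
#   collect_pod_logs <namespace_name> <pod_name>
```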

Analyze the pod events and details

Get comprehensive information about problematic pods.

kubectl describe pod <pod_name> -n <namespace_name>

Focus on:

State: The current container state, including any error reason or message.
Last State: The previous container state, populated when the container has restarted.
Ready: Indicates whether the container passed its readiness probe.
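
The `describe` output is long, so a small filter can surface only the fields listed above. This is a sketch that assumes the field labels printed by `kubectl describe pod` (State, Last State, Reason, Exit Code, Ready, Restart Count); `container_status_fields` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the container status fields from
# `kubectl describe pod` output read on stdin.
container_status_fields() {
  grep -E '^[[:space:]]+(State|Last State|Reason|Exit Code|Ready|Restart Count):'
}

# Usage:
#   kubectl describe pod <pod_name> -n <namespace_name> | container_status_fields
```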

Common pod failure scenarios and solutions

ImagePullBackOff

👉🏻 Reason: The container registry is inaccessible, or the image does not exist.

💡 How to solve:

Verify container registry credentials.
Verify the image name and tag for accuracy.
Ensure network connectivity with the registry.
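
To verify the image name, tag, and credentials, it helps to see exactly what the pod references. The sketch below prints each container's image and the names of the pod's image pull secrets using `kubectl get` with a JSONPath template; `show_pod_images` is a hypothetical helper name.

```shell
# Hypothetical helper: print the images a pod references and the names of
# its image pull secrets. Arguments: <namespace> <pod>
show_pod_images() (
  ns="$1"; pod="$2"
  echo "Images:"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
  echo "Image pull secrets:"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.imagePullSecrets[*]}{.name}{"\n"}{end}'
)

# Usage:
#   show_pod_images <namespace_name> <pod_name>
```

If no secret names are printed, the pod relies on node-level or service-account credentials for registry access.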

CrashLoopBackOff

👉🏻 Reason: The application crashes immediately after starting.

💡 How to solve:

Check application logs.
Verify the environment variables and configuration.
Ensure sufficient resource allocation.
See the detailed CrashLoopBackOff troubleshooting guide.
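
The exit code of the last terminated container often hints at the cause of the crash. The sketch below maps a few common codes to likely causes, based on the general convention that codes above 128 mean the process was ended by a signal (128 plus the signal number). `explain_exit_code` is a hypothetical helper, and the descriptions are general guidance rather than PerfectScale-specific diagnostics.

```shell
# Hypothetical helper: map a container exit code to a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "exited normally; check whether the process is expected to keep running" ;;
    1)   echo "application error; check the logs and configuration" ;;
    137) echo "killed with SIGKILL (128+9); often an out-of-memory kill" ;;
    139) echo "segmentation fault, SIGSEGV (128+11)" ;;
    143) echo "terminated with SIGTERM (128+15)" ;;
    *)   echo "unrecognized exit code; check the application documentation" ;;
  esac
}

# Usage: read the exit code of the first container's last termination,
# then explain it.
#   code=$(kubectl get pod <pod_name> -n <namespace_name> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```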

Resource constraints

👉🏻 Reason: Insufficient CPU and/or memory.

💡 How to solve:

Check the availability of node resources.

kubectl describe node <node_name>

Verify the pod's resource requests and limits.

kubectl get pod <pod_name> -n <namespace_name> -o yaml | grep -A 5 resources
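
The node description contains an "Allocated resources" section that shows how much CPU and memory is already requested on the node. The sketch below extracts only that section; it assumes the section is followed by an "Events:" heading, as in typical `kubectl describe node` output, and `allocated_resources` is a hypothetical helper name.

```shell
# Hypothetical helper: print only the "Allocated resources" section of
# `kubectl describe node` output read on stdin.
allocated_resources() {
  awk '/^Events:/ { p = 0 } /^Allocated resources:/ { p = 1 } p'
}

# Usage:
#   kubectl describe node <node_name> | allocated_resources
```

If the requests on every node are close to 100%, a Pending pod is likely waiting for capacity rather than failing for an application reason.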

Network and DNS troubleshooting

General network diagnostics

List the services and verify their types, cluster IPs, and ports.

kubectl get svc -n <namespace_name>

Review the network policies that apply to the namespace.

kubectl get networkpolicies -n <namespace_name>

Check ingress resources.

kubectl get ingress -n <namespace_name>
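
A service can exist and still route to nothing if its selector matches no ready pods. The sketch below checks whether a service has at least one ready endpoint address. `service_has_endpoints` is a hypothetical helper, and it assumes the cluster still serves the core Endpoints API (newer Kubernetes versions favor EndpointSlices).

```shell
# Hypothetical helper: succeed only if a Service has at least one ready
# endpoint address. Arguments: <namespace> <service>
service_has_endpoints() {
  ns="$1"; svc="$2"
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "$svc -> $ips"
  else
    echo "$svc has no ready endpoints; check the selector and pod readiness" >&2
    return 1
  fi
}

# Usage:
#   service_has_endpoints <namespace_name> <service_name>
```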

AWS-specific network configuration

Ensure that a valid domain name is configured in the Route53 hosted zone.
Verify the necessary DNS records: NS, CNAME, and A records.
Configure AWS Certificate Manager (ACM) to issue certificates for the domain.
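
The DNS records can be checked from any machine with `dig` installed. The sketch below queries the NS, A, and CNAME records for a domain; `show_dns_records` is a hypothetical helper, and the domain you pass is whatever name you configured in the hosted zone. The commented AWS CLI commands assume the AWS CLI is installed and configured with credentials for the account.

```shell
# Hypothetical helper: print the NS, A, and CNAME answers for a domain.
# Argument: <domain>
show_dns_records() {
  for record_type in NS A CNAME; do
    echo "== $record_type =="
    dig +short "$1" "$record_type"
  done
}

# Usage:
#   show_dns_records <your_domain>
#
# Related AWS CLI checks:
#   aws route53 list-hosted-zones
#   aws acm list-certificates --certificate-statuses ISSUED
```

An empty answer under a heading means no record of that type resolved for the name.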

Helm chart troubleshooting

PerfectScale utilizes Helm charts for service deployment. Follow these steps to troubleshoot Helm installations.

Check Helm release status

List all Helm releases in your namespace.

helm list -n <namespace_name>

Review the STATUS column for each release. A status of deployed indicates successful deployment.
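
When many releases are installed, the status filters of `helm list` can show only the ones that need attention. The sketch below assumes Helm 3; `problem_releases` is a hypothetical helper name.

```shell
# Hypothetical helper: list only releases in a failed or pending state.
# Argument: <namespace>
problem_releases() {
  helm list -n "$1" --failed --pending
}

# Usage:
#   problem_releases <namespace_name>
```

An empty table means that no release in the namespace is currently failed or pending.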

Manage problematic releases

For releases in a failed or pending state:

View the release history.

helm history <release_name> -n <namespace_name>

Roll back to a previous stable version.

helm rollback <release_name> <revision_number> -n <namespace_name>
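
A rollback can be combined with a wait and a status check, so that you know whether the restored revision actually became ready. This is a sketch assuming Helm 3; `rollback_release` is a hypothetical helper, and the five-minute timeout is an arbitrary example value.

```shell
# Hypothetical helper: roll back a release, wait for its resources to
# become ready, then print the release status.
# Arguments: <namespace> <release> <revision>
rollback_release() {
  ns="$1"; release="$2"; revision="$3"
  helm rollback "$release" "$revision" -n "$ns" --wait --timeout 5m || return 1
  helm status "$release" -n "$ns"
}

# Usage:
#   rollback_release <namespace_name> <release_name> <revision_number>
```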

View detailed release information.

helm get all <release_name> -n <namespace_name>

Support resources

If the issue persists after following these steps, feel free to contact PerfectScale support through your preferred channel, either Slack or email, for further assistance.

To help us resolve the issue faster, please include the following information when reaching out:

Namespace name
Relevant pod logs and events
Cluster information
Steps attempted
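
The information listed above can be gathered in one pass. The sketch below writes it to a local directory; `collect_support_info`, the directory name, and the file names are hypothetical and are not a format required by PerfectScale support.

```shell
# Hypothetical helper: gather namespace, cluster, pod, event, and Helm
# release information into one directory.
# Arguments: <namespace> [output_dir]
collect_support_info() (
  ns="$1"; out_dir="${2:-./perfectscale-support}"
  mkdir -p "$out_dir"
  echo "$ns" > "$out_dir/namespace.txt"
  # Each command is allowed to fail so that one error does not stop the rest.
  kubectl version > "$out_dir/cluster-version.txt" 2>&1 || true
  kubectl get nodes -o wide > "$out_dir/nodes.txt" 2>&1 || true
  kubectl get pods -n "$ns" -o wide > "$out_dir/pods.txt" 2>&1 || true
  kubectl get events -n "$ns" --sort-by=.lastTimestamp \
    > "$out_dir/events.txt" 2>&1 || true
  helm list -n "$ns" --all > "$out_dir/helm-releases.txt" 2>&1 || true
  echo "Collected support information in $out_dir"
)

# Usage:
#   collect_support_info <namespace_name>
```

Review the collected files for secrets or other sensitive values before sharing them.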

Last updated 6 months ago