How to onboard a cluster

Step-by-step guide on how to onboard your first cluster and start optimizing within a few minutes

Onboarding instructions

  1. Log in to the PerfectScale app.

  2. Once logged in, navigate to the Overview tab.

  3. To connect a new Kubernetes cluster, click the Add Cluster button. A pop-up window with the following steps appears.

    Onboard first cluster

To provision clusters dynamically, follow the instructions for Onboarding clusters programmatically.

Cluster configuration

When onboarding a cluster, choose one deployment method: deploy with the PerfectScale Operator or deploy with Helm.

We recommend installing the PerfectScale Operator, which automates deployment and keeps your agents up to date with the latest features and product updates.

Connect cluster

The PerfectScale agent does not run on Windows nodes. If your cluster contains both Windows and Linux nodes, add a nodeSelector so that the agent runs on the Linux nodes. Follow the instructions in Mix of Windows and Linux nodes.

  1. In the pop-up window, click Copy to Clipboard and clone the Helm chart provided in step 1.

  2. Enter a name for your cluster and select the desired Optimization Policy:

    • MaxSavings - maximum cost savings, the best for non-production environments

    • Balanced (default) - optimally balances cost and resiliency

    • ExtraHeadroom - the best fit for latency-sensitive environments

    • MaxHeadroom - keeps the environment above the highest spikes

The Optimization Policy defines how resources are allocated to meet the needs of your workloads. Choose the policy that best suits your environment and business goals: maximize cost savings, or provide extra headroom to keep mission-critical services resilient. The policy can be set at the cluster level or for an individual workload. A workload-level policy takes precedence and overrides the value defined at the cluster level.
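The precedence rule can be sketched as a small shell helper. This only illustrates the rule described above and is not PerfectScale's implementation; the function name `effective_policy` is hypothetical.

```shell
# Hypothetical helper: resolve the policy that applies to a workload.
# $1 = workload-level policy (may be empty)
# $2 = cluster-level policy (may be empty)
effective_policy() {
  workload_policy="$1"
  cluster_policy="$2"
  if [ -n "$workload_policy" ]; then
    # A workload-level policy overrides the cluster-level value.
    echo "$workload_policy"
  else
    # Otherwise the cluster policy applies; Balanced is the default.
    echo "${cluster_policy:-Balanced}"
  fi
}

effective_policy "MaxHeadroom" "MaxSavings"   # workload-level policy wins
effective_policy "" "MaxSavings"              # cluster-level policy applies
```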

Learn more about the optimization policy customization here.


Connecting cluster with PerfectScale Operator

Once you have named your cluster and selected the Optimization Policy, click Get Install Command. PerfectScale then provides the required installation commands.

Connect cluster with PerfectScale Operator

Add Helm repo:

Deploy PerfectScale Operator:

Apply CRD:

Learn more about PerfectScale Operator CRD here.

Alternatively, you can install the Operator with a single command:

Connecting cluster with Helm

Once the Optimization Policy is selected, proceed with the following steps:

  3. Click the Generate Secret button.

  4. Execute the command from Deploy PerfectScale.

If your cluster contains both Windows and Linux nodes, the PerfectScale agent must run on the Linux nodes. Follow the instructions in Mix of Windows and Linux nodes.

PerfectScale supports Windows containers. To enable this support, follow the instructions in Windows containers support.

  5. Click the Finish and Close button.

Installing multiple agents in a single cluster is not supported and may lead to unexpected results.

The newly added cluster will appear under the Clusters list in the Overview tab. Once the Agent Status becomes green, the cluster data will appear, which indicates successful cluster creation.

Agent status

💡 Discover additional information regarding the Overview.

Your cluster will become visible only after it starts transmitting data.

Run PerfectScale agent on specific nodes

PerfectScale allows running the agent on particular nodes. To run the agent on specific nodes, use one of the following options:

  1. nodeSelector (in case there is no taint on the node).

  2. nodeSelector & toleration (in case of taint on the node).
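As a minimal sketch of the second option, the snippet below writes a Helm values file that combines a nodeSelector with a matching toleration. The top-level `nodeSelector` and `tolerations` keys, the `dedicated: monitoring` label, and the `dedicated=monitoring:NoSchedule` taint are all assumptions for illustration; check the values supported by your chart version before using them.

```shell
# Write an illustrative Helm values file.
# Key names, label, and taint below are hypothetical examples.
values_file="$(mktemp)"
cat > "$values_file" <<'EOF'
nodeSelector:
  dedicated: monitoring        # hypothetical node label
tolerations:                   # only needed when the node is tainted
  - key: dedicated             # hypothetical taint: dedicated=monitoring:NoSchedule
    operator: Equal
    value: monitoring
    effect: NoSchedule
EOF

echo "values written to $values_file"
# Pass the file to your install command, for example:
#   helm upgrade --install <release> <chart> -f "$values_file"
```

For an untainted node, the `tolerations` block can be dropped and the nodeSelector alone is enough.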

Mix of Windows and Linux nodes

In Step 4 of the Cluster configuration instructions above, use the following command to schedule the agent on the Linux nodes only:

GPU support

GPU support is available starting with the exporter version 1.0.55.

GPU memory support is available starting with the exporter version 1.1.11. Learn more about requirements for GPU memory support here.

PerfectScale’s advanced GPU support feature helps teams optimize resource-intensive applications like AI, machine learning, cloud computing, etc., ensuring better performance, reducing costs, and improving overall efficiency across the entire Kubernetes stack.

PerfectScale currently supports only NVIDIA Data Center GPU Manager (DCGM).

To enable GPU support, use the following command in Step 4 of the Cluster configuration instructions above:

Required DCGM metrics for GPU memory support

PerfectScale requires the DCGM exporter to expose two GPU framebuffer metrics:

  • DCGM_FI_DEV_FB_TOTAL - total framebuffer capacity

  • DCGM_FI_DEV_FB_RESERVED - driver-reserved framebuffer

PerfectScale uses these metrics to calculate GPU memory utilization. If they are not exposed, GPU memory columns in InfraFit, PodFit, and node-group views will appear empty.

To confirm which metrics are currently exposed in your cluster, run:

If the command returns no output, add the missing metrics to the DCGM exporter configuration. Use one of the following approaches to update the DCGM exporter configuration:

Patch the DCGM ConfigMap directly

Use this option if the GPU Operator is installed and managed manually, and the DCGM exporter ConfigMap is not controlled by GitOps or another reconciliation process.

  1. Find the ConfigMap that contains the DCGM exporter metrics list.

  2. Edit the relevant ConfigMap.

  3. Locate the dcgm-metrics.csv key. This key contains the list of DCGM fields exposed by the exporter. Add DCGM_FI_DEV_FB_TOTAL and DCGM_FI_DEV_FB_RESERVED to that list and save the ConfigMap. If DCGM_FI_DEV_FB_RESERVED is already present, do not duplicate it. Add only DCGM_FI_DEV_FB_TOTAL.

  4. Restart the DCGM exporter.
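The "do not duplicate" rule from step 3 can be sketched as an idempotent append on a local copy of the metrics list. This assumes the usual dcgm-exporter CSV layout of `NAME, type, help text`; the helper name `add_metric` and the help strings are illustrative, not taken from the PerfectScale documentation.

```shell
# add_metric FILE NAME HELP
# Append a gauge entry to a DCGM counters CSV only if NAME is not listed yet.
add_metric() {
  file="$1"; name="$2"; help="$3"
  if grep -q "^${name}," "$file"; then
    echo "${name} already present, skipping"
  else
    printf '%s, gauge, %s\n' "$name" "$help" >> "$file"
    echo "${name} added"
  fi
}

# Demo on a local file that already lists the reserved-framebuffer metric.
csv="$(mktemp)"
printf '%s\n' 'DCGM_FI_DEV_FB_RESERVED, gauge, Framebuffer memory reserved (in MiB).' > "$csv"
add_metric "$csv" DCGM_FI_DEV_FB_TOTAL 'Total framebuffer memory (in MiB).'
add_metric "$csv" DCGM_FI_DEV_FB_RESERVED 'Framebuffer memory reserved (in MiB).'
```

The edited list still has to be written back to the dcgm-metrics.csv key of the ConfigMap before the exporter is restarted.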

Configure metrics through Helm values

Use this option if the GPU Operator is managed by ArgoCD, Flux, or another GitOps workflow.

In GitOps-managed clusters, manual edits to the ConfigMap are usually overwritten during the next reconciliation. Instead, define a custom metrics ConfigMap and reference it from the GPU Operator Helm values.

  1. Update the GPU Operator Helm values. This points the DCGM exporter to your custom ConfigMap.

  2. Create a ConfigMap with the name referenced in the Helm values, in the gpu-operator namespace. The ConfigMap must include a dcgm-metrics.csv key that contains NVIDIA’s default counters list plus the required framebuffer metrics.

  3. Check that the dcgm-metrics.csv key includes DCGM_FI_DEV_FB_TOTAL and DCGM_FI_DEV_FB_RESERVED. If DCGM_FI_DEV_FB_RESERVED is already present, do not duplicate it. Add only DCGM_FI_DEV_FB_TOTAL.

  4. Restart the DCGM exporter.
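A minimal sketch of steps 1 and 2, written as files you could commit to a GitOps repository. It assumes the GPU Operator chart's `dcgmExporter.config.name` and `dcgmExporter.env` values as described in NVIDIA's custom-metrics instructions; the ConfigMap name `custom-dcgm-metrics` and the help strings are hypothetical, so verify the keys against your chart version.

```shell
# Step 1 (sketch): GPU Operator values that reference a custom metrics ConfigMap.
values_file="$(mktemp)"
cat > "$values_file" <<'EOF'
dcgmExporter:
  config:
    name: custom-dcgm-metrics          # hypothetical ConfigMap name
  env:
    - name: DCGM_EXPORTER_COLLECTORS   # path the exporter reads the metrics list from
      value: /etc/dcgm-exporter/dcgm-metrics.csv
EOF

# Step 2 (sketch): the ConfigMap itself, in the gpu-operator namespace.
configmap_file="$(mktemp)"
cat > "$configmap_file" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-dcgm-metrics
  namespace: gpu-operator
data:
  dcgm-metrics.csv: |
    # NVIDIA's default counters list goes here, followed by:
    DCGM_FI_DEV_FB_TOTAL, gauge, Total framebuffer memory (in MiB).
    DCGM_FI_DEV_FB_RESERVED, gauge, Framebuffer memory reserved (in MiB).
EOF

echo "values: $values_file"
echo "configmap: $configmap_file"
```

Keeping both files under version control lets the reconciler apply them, so the metrics list survives the next sync.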

Propagation

After the DCGM exporter restarts, run the verify command again to make sure both metrics are exposed.

Once they are available, it may take up to 15 minutes for GPU memory data to appear in the PerfectScale UI.
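As a sketch of such a check, the helper below reads Prometheus-format metrics text on standard input and reports which of the two required metrics is absent. The function name `check_fb_metrics` is hypothetical and the sample values are made up; how you fetch the exporter output depends on your setup.

```shell
# check_fb_metrics: read Prometheus-format metrics on stdin and list any
# required framebuffer metric that is absent. Returns non-zero if one is missing.
check_fb_metrics() {
  metrics="$(cat)"
  status=0
  for name in DCGM_FI_DEV_FB_TOTAL DCGM_FI_DEV_FB_RESERVED; do
    # A sample line starts with the metric name followed by "{" or a space.
    if ! printf '%s\n' "$metrics" | grep -q "^${name}[{ ]"; then
      echo "missing: ${name}"
      status=1
    fi
  done
  return "$status"
}

# Demo with sample text that only contains the total-framebuffer metric.
printf '%s\n' 'DCGM_FI_DEV_FB_TOTAL{gpu="0"} 81920' | check_fb_metrics || true
```

In a cluster, the input would typically be the DCGM exporter's /metrics endpoint (port 9400 by default), reached for example through a port-forward.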

Java containers support

The Coroot agent is disabled by default. Once enabled, PerfectScale automatically detects Java containers and starts collecting JVM metrics.

Add the following parameter when deploying the PerfectScale agent to enable this feature:

Coroot supports only Linux nodes.

PerfectScale automatically identifies Java containers running in your Kubernetes environment and collects JVM metrics from them. By continuously analyzing them, PerfectScale provides granular visibility into resource usage, identifying potential bottlenecks. Based on this analysis, PerfectScale generates tailored recommendations to help ensure that your services remain efficient and maintain consistent performance.

Once the Coroot agent is enabled, PerfectScale collects JVM metrics automatically. If you do not want PerfectScale to collect this data, turn off the deployment of psc-coroot-node-agent during agent installation:

Deploy coroot pods to specific nodes

If your cluster includes both Linux and Windows nodes, make sure to set a nodeSelector for Linux when deploying workloads that are not Windows-compatible.

YAML values file example:

Helm command example:

To restrict workloads to a specific set of nodes (for example, those labeled component=java), you can combine multiple node selectors.

YAML values file example:

Helm command example:

Windows containers support

Make sure you are using PerfectScale exporter version 1.0.53 or later to enable Windows containers support.

PerfectScale supports Windows-based containers, allowing you to optimize and manage them seamlessly. To enable this feature, execute the following command in Step 4 of the Cluster configuration instructions above:

  • Make sure the Helm parameter windowsExporterEnabled is set to true.

  • If the PerfectScale Helm chart should deploy the windows-exporter, set deployWindowsExporter=true. No additional configuration is needed, as the default values are sufficient.

  • If deployWindowsExporter is set to false, configure the following additional parameters according to your environment:

    • windowsExporterNamespace

    • windowsExporterPort

    • windowsExporterLabelSelector
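The two cases above can be sketched as a helper that prints the matching `--set` flags. The parameter names come from the list above, but whether the chart expects them under a prefix is not shown here, and the namespace, port, and label selector in the demo are hypothetical values.

```shell
# windows_exporter_flags DEPLOY [NAMESPACE PORT LABEL_SELECTOR]
# Print the Helm --set flags for Windows containers support.
windows_exporter_flags() {
  deploy="$1"
  printf '%s' "--set windowsExporterEnabled=true --set deployWindowsExporter=${deploy}"
  if [ "$deploy" = "false" ]; then
    # An existing windows-exporter has to be described explicitly.
    printf ' %s' "--set windowsExporterNamespace=$2" \
                 "--set windowsExporterPort=$3" \
                 "--set windowsExporterLabelSelector=$4"
  fi
  printf '\n'
}

# Chart-managed windows-exporter: defaults are sufficient.
windows_exporter_flags true
# Existing windows-exporter: hypothetical namespace, port, and selector.
windows_exporter_flags false monitoring 9182 app=windows-exporter
```

The printed flags are meant to be appended to the install command from Step 4.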

Size PerfectScale Agent

For large clusters, if you are not using automation, use PerfectScale’s recommendations for psc-exporter to properly size it.

Setting cAdvisor scraping mode

If you prefer not to grant nodes/proxy permissions, use the following command when installing the exporter:

Depending on your security and access requirements, specify one of the scraping modes:

  • auto (default): Direct scraping with automatic failover to proxy if all nodes fail.

  • direct: Force direct node scraping; excludes nodes/proxy RBAC permissions.

  • nodeProxy: Force proxy scraping via the Kubernetes API.

Uninstalling PerfectScale exporter

To uninstall the PerfectScale Agent, execute the following command:

How to whitelist PerfectScale on GKE with binary authorization enabled

If your GKE cluster enforces Binary Authorization, you need to add PerfectScale images to the Binary Authorization policy whitelist before installation.

Add the following entries under admissionWhitelistPatterns:

Example:

These entries allow:

  • PerfectScale images from public.ecr.aws/perfectscale-io

  • kube-state-metrics images from registry.k8s.io

Without this, PerfectScale components may be blocked from running in clusters where Binary Authorization is enforced.
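As a rough sketch, the entries could look like the fragment written below. The exact name patterns are assumptions derived from the two registries listed above (a trailing `/**` matches images in any subpath), so adjust them to the images actually deployed in your cluster.

```shell
# Illustrative fragment of a Binary Authorization policy file.
# The patterns are assumptions based on the registries named above.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
admissionWhitelistPatterns:
  - namePattern: public.ecr.aws/perfectscale-io/**
  - namePattern: registry.k8s.io/kube-state-metrics/**
EOF
cat "$fragment"

# Typical workflow (requires gcloud and permissions on the project):
#   gcloud container binauthz policy export > policy.yaml
#   # merge the entries above into policy.yaml
#   gcloud container binauthz policy import policy.yaml
```

Exporting the current policy first avoids overwriting entries that are already in place.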
