# Resiliency alerts

## What are resiliency alerts

Resiliency alerts notify you when relevant resiliency indicators change, so that you can resolve the underlying issues before they impact the system.

{% hint style="info" %}
For faster updates, use the [Slack](/customize-workflow/communication-and-messaging/slack-integration.md), [MS Teams](/customize-workflow/communication-and-messaging/ms-teams-integration.md), or [DataDog Alerts](/customize-workflow/communication-and-messaging/datadog-alerts-integration.md) integrations to receive a notification whenever an alert is generated.
{% endhint %}

{% hint style="success" %}
By default, PerfectScale generates alerts for every cluster in which a [resilience indicator](/visibility-and-optimization/podfit-or-vertical-pod-right-sizing/understanding-at-risk-indicators.md) with [min\_risk\_level: high](/customize-workflow/alerting/resiliency-alerts.md#customize-the-alerts-by) is identified.

To customize your alerts, set up an [Alerts Profile](/customize-workflow/alerting/resiliency-alerts.md) and apply it to the cluster. The profile overrides the default alert generation process, so that alerts are generated and sent only for the indicator changes you consider relevant.
{% endhint %}

<figure><img src="/files/B5vqB3psJHRDuL7a4BH6" alt=""><figcaption><p>Resiliency alerts profile</p></figcaption></figure>

{% hint style="info" %}
You can apply only one Resiliency Alert Profile per cluster.
{% endhint %}

{% hint style="info" %}
Alerts configured with an Alerts Profile are generated only for clusters that are transmitting data.
{% endhint %}

## Configuring resiliency alerts with Alerts Profile

### :tools: **How to create** **Alerts Profile**

There are two options for creating a Profile: [from the Settings tab](#from-the-settings-tab) or directly [from the Overview](#from-the-overview-tab).

#### From the **Settings** tab

Go to the **`Settings`** tab on the left panel -> select **`Alerts`** -> click the **`+Add Profile`** button -> name the profile in the corresponding row -> specify the `min_risk_level` (low, medium, high) and ignored entities (if needed) -> click the **`Save`** button.

<figure><img src="/files/gHmQ2UQ3Ws8eMfwqcAIh" alt=""><figcaption><p>Resiliency alerts profile from settings</p></figcaption></figure>

{% hint style="info" %}
If the `min_risk_level` is set to medium, only indicators with `high` or `medium` severity will trigger alerts.
{% endhint %}

#### From the **Overview** tab

Go to the **`Overview`** tab on the left panel -> find the cluster to which you want to apply the **`Alert Profile`** and click the **`gear`** button -> go to **`Customizations`** -> click **`Add New Profile`** in the **`Alerts`** drop-down list -> name the profile in the corresponding row -> specify the `min_risk_level` (low, medium, high) and ignored entities (if needed) -> click the **`Save And Apply`** button.

<figure><img src="/files/TOhPGgrW9l18U9hCDhoC" alt=""><figcaption><p>Resiliency alerts profile from overview</p></figcaption></figure>

#### Alerts Profile Configuration

<figure><img src="/files/w5fkqx0HCqr61LfAQAzy" alt=""><figcaption><p>Resiliency alerts profile configuration</p></figcaption></figure>

1. Name the profile.
2. Configure the profile:
   * `min_risk_level` - the minimum risk level an issue must have to trigger an alert.\
     \
     :bulb: **NOTE**: if the `min_risk_level` is set to medium, only indicators with `high` or `medium` severity will trigger alerts.<br>
   * `ignore_namespace` - excludes specific namespaces from triggering resiliency alerts.
   * `ignore_workload` - excludes specific workloads from triggering resiliency alerts.
   * `ignore_container` - excludes specific containers from triggering resiliency alerts.
   * `ignore_indicator` - excludes specific [resiliency issue indicators](/visibility-and-optimization/podfit-or-vertical-pod-right-sizing/understanding-at-risk-indicators.md) from triggering alerts.
   * `active_notification_resend` - resends notifications for active alerts at a specified interval, so that critical alerts remain visible until they are addressed.\
     \
     :bulb: **NOTE**: `active_notification_resend` is set to `off` by default.\
     \
     :point\_right: **Examples**:\
     Set `active_notification_resend: 5h` to resend notifications for active alerts every 5 hours.\
     Set `active_notification_resend: 1d` to resend notifications for active alerts every day.
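
For orientation, the same settings can be written as a YAML fragment that uses the keys from the CRD-based configuration described later on this page. This is only a sketch: the namespace pattern and the ignored indicator are illustrative assumptions, not recommended values.

```yaml
# Fragment of a profile's `value` block; see the CRD section for the full resource.
min_risk_level: medium                 # alerts for medium and high severity indicators
ignore_namespace: "^kube-system$"      # illustrative pattern (assumption)
ignore_indicator: "CpuThrottling"      # illustrative choice (assumption)
active_notification_resend: '5h'       # resend active alerts every 5 hours
```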

### :tools: **How to apply** **Alerts Profile**

#### Apply to a single cluster

To apply an **`Alert Profile`** to a cluster, go to the **`Overview`** tab on the left panel -> find the cluster to which you want to apply the **`Alert Profile`** and click the **`gear`** button -> go to **`Customizations`** -> select the needed profile in the **`Alerts`** drop-down list.

<figure><img src="/files/oLVrW6sBsOv905k1lmwc" alt=""><figcaption><p>Applying resiliency alerts profile to a single cluster</p></figcaption></figure>

#### Apply to multiple clusters

To apply profiles to **multiple clusters** from a single view, use the **`Manage Assignments`** feature.

Go to the **`Settings`** tab on the left panel -> select **`Alerts`** -> click the **`Manage Assignments`** button -> apply the profiles to the needed clusters -> click the **`Save Changes`** button.

:bulb: ***NOTE***: You can create and delete profiles, but a profile that is assigned to a cluster cannot be deleted. First switch the cluster to the default or another profile, and only then delete the profile. If you change the Alert Profile to 'None', the cluster falls back to the default alert generation (a detected [resilience indicator](/visibility-and-optimization/podfit-or-vertical-pod-right-sizing/understanding-at-risk-indicators.md) with [min\_risk\_level: high](/customize-workflow/alerting/resiliency-alerts.md#customize-the-alerts-by)).

{% hint style="info" %}
To stop sending alerts to your communication channels, disconnect the messaging profile ([Slack](/customize-workflow/communication-and-messaging/slack-integration.md), [MS Teams](/customize-workflow/communication-and-messaging/ms-teams-integration.md)) from the cluster.
{% endhint %}

## Configuring resiliency alerts integration with CRD

To enable alerting for resiliency risks using a Custom Resource Definition (CRD), define and apply a Custom Resource (CR) that specifies your alert parameters. This approach lets you manage alert configurations directly through Kubernetes manifests.

{% hint style="warning" %}
The Custom Resource (CR) must be created in the `perfectscale` namespace.
{% endhint %}

Here is an example of the CR configuration:

```yaml
apiVersion: perfectscale.io/v1
kind: ClusterSettings
metadata:
  name: cluster-settings-main
  namespace: perfectscale
spec:
  profiles:
    resiliency_alerts:
      - name: production-alerts
        assigned: true
        value:
          min_risk_level: high
          ignore_workload: "^(test-.*|dev-.*)"
          ignore_namespace: "^(kube-system|kube-public|kube-node-lease)$"
          ignore_container: "^(istio-proxy|envoy|linkerd-proxy)$"
          ignore_indicator: "CpuThrottling, CpuRequestNotSet"
          active_notification_resend: 'off'
```

#### ⚙️ **CR parameters:**

<table><thead><tr><th width="251.94921875">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong><code>min_risk_level</code></strong></td><td><p>Specifies the minimum risk level that triggers resiliency alerts.</p><p><strong>Values</strong>: high, medium, low.</p><p><strong>Example</strong>: if set to <code>medium</code>, alerts will be generated for both <code>medium</code> and <code>high</code> risk levels.</p></td></tr><tr><td><p><strong><code>ignore_workload</code></strong></p><p><strong><code>ignore_namespace</code></strong></p><p><strong><code>ignore_container</code></strong></p></td><td>Specifies the workloads, namespaces, or containers for which alerts are disabled. Use a regex pattern.<br><br><span data-gb-custom-inline data-tag="emoji" data-code="1f4a1">💡</span> To ignore multiple entities in one row, list them separated by commas.</td></tr><tr><td><strong><code>ignore_indicator</code></strong></td><td>Specifies the resiliency issues for which alerts will not be generated.<br><strong>Examples</strong>: OOM, CpuThrottling, CpuRequestNotSet, MemRequestNotSet, MemLimitNotSet, UnderProvisionedMemRequest, UnderProvisionedCpuRequest, UnderProvisionedMemLimit, UnderProvisionedCpuLimit, OverProvisionedCpuRequest, OverProvisionedMemRequest, RestartsObserved, EvictionsObserved, HPAAtMaxReplicasObserved<br><br><span data-gb-custom-inline data-tag="emoji" data-code="1f4a1">💡</span> Leave empty to not ignore any indicators.</td></tr><tr><td><strong><code>active_notification_resend</code></strong></td><td>Specifies the interval for re-sending notifications for active alerts.<br><strong>Values</strong>:<br><code>off</code> (default) - disables re-sending notifications;<br><code>h</code> - sets the interval in hours;<br><code>d</code> - sets the interval in days.<br><strong>Example</strong>: <code>active_notification_resend: '2h'</code> - re-sends notifications for active alerts every 2 hours.</td></tr></tbody></table>
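
As a further illustration of the parameters above, the following sketch shows a profile that alerts on `medium` and `high` risk levels, ignores no indicators, and re-sends notifications for active alerts once a day. The profile name `staging-alerts` and the regex patterns are hypothetical. The resource name and overall structure are copied from the example CR above and may need adjusting for your installation.

```yaml
apiVersion: perfectscale.io/v1
kind: ClusterSettings
metadata:
  name: cluster-settings-main
  namespace: perfectscale                    # the CR must be created in this namespace
spec:
  profiles:
    resiliency_alerts:
      - name: staging-alerts                 # hypothetical profile name
        assigned: true
        value:
          min_risk_level: medium             # alerts for medium and high risk levels
          ignore_workload: "^canary-.*"      # hypothetical pattern
          ignore_namespace: "^kube-system$"  # hypothetical pattern
          ignore_container: "^istio-proxy$"  # hypothetical pattern
          ignore_indicator: ""               # left empty: no indicator is ignored
          active_notification_resend: '1d'   # re-send active alerts every day
```

Quoting the `active_notification_resend` value mirrors the examples above. It matters most for `'off'`, which some YAML parsers would otherwise read as a boolean.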

