Deploy AOM to Prevent ELB Alarm Storms
Application Scenario
Application Operations Management (AOM) is a one-stop application operations management platform provided by Huawei Cloud. It supports core functions such as application monitoring, log management, and alarm management. Monitoring ELB business-layer metrics can generate a large number of duplicate or similar alarms, and the resulting alarm storm reduces operational efficiency. AOM alarm group rules group and merge similar alarms, which reduces alarm noise, prevents alarm storms, and makes alarm management more effective.
This best practice describes how to use Terraform to automatically deploy an AOM configuration that prevents ELB alarm storms, including creating an LTS log group and log stream, an SMN topic and log tank, an AOM alarm action rule, an AOM alarm group rule, and an AOM alarm rule.
Related Resources/Data Sources
This best practice involves the following main resources:
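Resources
Log Tank Service log group resource (huaweicloud_lts_group)
Log Tank Service log stream resource (huaweicloud_lts_stream)
Simple Message Notification topic resource (huaweicloud_smn_topic)
Simple Message Notification log tank resource
AOM alarm action rule resource (huaweicloud_aom_alarm_action_rule)
AOM alarm group rule resource (huaweicloud_aom_alarm_group_rule)
AOM alarm rule resource
Resource/Data Source Dependencies
The log stream resource references the ID of the log group resource
The log tank resource references the topic URN, the log group ID, and the log stream ID
The alarm action rule resource references the topic URN
The alarm group rule resource references the name of the alarm action rule resource
The alarm rule resource references the name of the alarm group rule resource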
Operation Steps
1. Script Preparation
Prepare the TF file (e.g., main.tf) in the specified workspace for writing the current best practice script, and make sure that it (or another TF file in the same directory) contains the provider version declaration and the Huawei Cloud authentication information required for deploying resources. Refer to the "Preparation Before Deploying Huawei Cloud Resources" document for configuration details.
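As a rough illustration of the provider declaration and authentication settings mentioned above, the following is a minimal sketch. The variable names region_name, access_key, and secret_key are hypothetical, and no version constraint is shown because this document does not state one.

```hcl
terraform {
  required_providers {
    huaweicloud = {
      source = "huaweicloud/huaweicloud"
      # Add a version constraint here that suits your environment.
    }
  }
}

# Hypothetical input variables used only by this sketch.
variable "region_name" {
  type = string
}

variable "access_key" {
  type      = string
  sensitive = true
}

variable "secret_key" {
  type      = string
  sensitive = true
}

provider "huaweicloud" {
  region     = var.region_name
  access_key = var.access_key
  secret_key = var.secret_key
}
```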
2. Create Log Tank Service Log Group Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create a Log Tank Service log group resource:
Parameter Description:
group_name: The log group name, assigned by referencing the input variable lts_group_name
ttl_in_days: The log retention time (unit: days), set to 30 days
enterprise_project_id: The enterprise project ID, assigned by referencing the input variable enterprise_project_id, set to null when the value is an empty string
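Based on the parameters described above, the log group resource might look like the following sketch. It assumes that the variables lts_group_name and enterprise_project_id are declared elsewhere in the configuration.

```hcl
resource "huaweicloud_lts_group" "test" {
  group_name  = var.lts_group_name
  ttl_in_days = 30

  # An empty string is converted to null so that the argument is treated as unset.
  enterprise_project_id = var.enterprise_project_id != "" ? var.enterprise_project_id : null
}
```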
3. Create Log Tank Service Log Stream Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create a Log Tank Service log stream resource:
Parameter Description:
group_id: The log group ID, referencing the ID of the previously created Log Tank Service log group resource (huaweicloud_lts_group.test)
stream_name: The log stream name, assigned by referencing the input variable lts_stream_name
enterprise_project_id: The enterprise project ID, assigned by referencing the input variable enterprise_project_id, set to null when the value is an empty string
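A sketch of the log stream resource that matches the parameters above, assuming the log group resource is named huaweicloud_lts_group.test as described:

```hcl
resource "huaweicloud_lts_stream" "test" {
  # Referencing the log group ID also makes Terraform create the group first.
  group_id    = huaweicloud_lts_group.test.id
  stream_name = var.lts_stream_name

  enterprise_project_id = var.enterprise_project_id != "" ? var.enterprise_project_id : null
}
```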
4. Create Simple Message Notification Topic Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create a Simple Message Notification topic resource:
Parameter Description:
name: The topic name, assigned by referencing the input variable smn_topic_name
enterprise_project_id: The enterprise project ID, assigned by referencing the input variable enterprise_project_id, set to null when the value is an empty string
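The topic resource described above could be written roughly as follows (a sketch that assumes the variable smn_topic_name is declared elsewhere):

```hcl
resource "huaweicloud_smn_topic" "test" {
  name = var.smn_topic_name

  enterprise_project_id = var.enterprise_project_id != "" ? var.enterprise_project_id : null
}
```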
5. Create Simple Message Notification Log Tank Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create a Simple Message Notification log tank resource:
Parameter Description:
topic_urn: The topic URN, referencing the topic_urn of the previously created Simple Message Notification topic resource (huaweicloud_smn_topic.test)
log_group_id: The log group ID, referencing the ID of the previously created Log Tank Service log group resource (huaweicloud_lts_group.test)
log_stream_id: The log stream ID, referencing the ID of the previously created Log Tank Service log stream resource (huaweicloud_lts_stream.test)
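The following sketch wires the three references described above together. The resource type huaweicloud_smn_logtank is an assumption, since this document does not name it; check the provider documentation for the exact type.

```hcl
# Assumed resource type: huaweicloud_smn_logtank (not named in this document).
resource "huaweicloud_smn_logtank" "test" {
  topic_urn     = huaweicloud_smn_topic.test.topic_urn
  log_group_id  = huaweicloud_lts_group.test.id
  log_stream_id = huaweicloud_lts_stream.test.id
}
```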
6. Create AOM Alarm Action Rule Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create an AOM alarm action rule resource:
Parameter Description:
name: The alarm action rule name, assigned by referencing the input variable alarm_action_rule_name, default value is "apm"
user_name: The user name, assigned by referencing the input variable alarm_action_rule_user_name
type: The alarm action rule type, assigned by referencing the input variable alarm_action_rule_type, default value is "1" (indicating notification type)
notification_template: The notification template name, using the built-in template "aom.built-in.template.zh"
smn_topics.topic_urn: The SMN topic URN, referencing the topic_urn of the previously created Simple Message Notification topic resource (huaweicloud_smn_topic.test)
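A possible shape for the alarm action rule, derived from the parameters above. Writing smn_topics as a nested block is an assumption based on the dotted parameter name smn_topics.topic_urn.

```hcl
resource "huaweicloud_aom_alarm_action_rule" "test" {
  name                  = var.alarm_action_rule_name
  user_name             = var.alarm_action_rule_user_name
  type                  = var.alarm_action_rule_type
  notification_template = "aom.built-in.template.zh"

  # Assumed to be a nested block; notifications are sent to this SMN topic.
  smn_topics {
    topic_urn = huaweicloud_smn_topic.test.topic_urn
  }
}
```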
7. Create AOM Alarm Group Rule Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create an AOM alarm group rule resource:
Parameter Description:
depends_on: Explicit dependency relationship, ensuring the AOM alarm action rule resource is created before the alarm group rule resource
name: The alarm group rule name, assigned by referencing the input variable alarm_group_rule_name
group_by: The list of grouping fields, set to ["resource_provider"] (indicating grouping by resource provider)
group_interval: The group check interval (unit: seconds), assigned by referencing the input variable alarm_group_rule_group_interval, default value is 60 seconds
group_repeat_waiting: The group repeat waiting time (unit: seconds), assigned by referencing the input variable alarm_group_rule_group_repeat_waiting, default value is 3600 seconds
group_wait: The group wait time (unit: seconds), assigned by referencing the input variable alarm_group_rule_group_wait, default value is 15 seconds
description: The alarm group rule description, assigned by referencing the input variable alarm_group_rule_description, set to null when the value is an empty string
enterprise_project_id: The enterprise project ID, assigned by referencing the input variable enterprise_project_id, set to null when the value is an empty string
detail.bind_notification_rule_ids: The list of bound notification rule IDs, referencing the name of the previously created AOM alarm action rule resource (huaweicloud_aom_alarm_action_rule.test)
detail.match: The list of matching conditions, dynamically generated through the dynamic block based on the input variable alarm_group_rule_condition_matching_rules; by default it matches alarms of Critical and Major severity and alarms from AOM
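The parameters above can be sketched as the resource below. The attribute names key, operate, and value inside the match block are assumptions that are not stated in this document, so the objects in alarm_group_rule_condition_matching_rules are assumed to carry the same three fields.

```hcl
resource "huaweicloud_aom_alarm_group_rule" "test" {
  # Explicit dependency, as described above.
  depends_on = [huaweicloud_aom_alarm_action_rule.test]

  name                 = var.alarm_group_rule_name
  group_by             = ["resource_provider"]
  group_interval       = var.alarm_group_rule_group_interval
  group_repeat_waiting = var.alarm_group_rule_group_repeat_waiting
  group_wait           = var.alarm_group_rule_group_wait

  description           = var.alarm_group_rule_description != "" ? var.alarm_group_rule_description : null
  enterprise_project_id = var.enterprise_project_id != "" ? var.enterprise_project_id : null

  detail {
    bind_notification_rule_ids = [huaweicloud_aom_alarm_action_rule.test.name]

    # One match block is generated per element of the input variable.
    dynamic "match" {
      for_each = var.alarm_group_rule_condition_matching_rules

      content {
        # Assumed attribute names; verify against the provider documentation.
        key     = match.value.key
        operate = match.value.operate
        value   = match.value.value
      }
    }
  }
}
```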
8. Create AOM Alarm Rule Resource
Add the following script to the TF file (e.g., main.tf) to instruct Terraform to create an AOM alarm rule resource:
Parameter Description:
name: The alarm rule name, assigned by referencing the input variable alarm_rule_name
type: The alarm rule type, set to "metric" (indicating metric type)
enable: Whether to enable the alarm rule, set to true
prom_instance_id: The Prometheus instance ID, assigned by referencing the input variable prometheus_instance_id, default value is "0" (indicating the default Prometheus_AOM_Default instance)
alarm_notifications.notification_enable: Whether to enable notifications, set to true
alarm_notifications.notification_type: The notification type, set to "alarm_policy" (indicating alarm policy type)
alarm_notifications.route_group_enable: Whether to enable route grouping, set to true
alarm_notifications.route_group_rule: The route group rule name, referencing the name of the previously created AOM alarm group rule resource (huaweicloud_aom_alarm_group_rule.test)
alarm_notifications.notify_resolved: Whether to notify on recovery, set to true
alarm_notifications.notify_triggered: Whether to notify on trigger, set to true
alarm_notifications.notify_frequency: The notification frequency, set to "-1" (indicating using the alarm group rule's frequency settings)
metric_alarm_spec.monitor_type: The monitoring type, set to "all_metric" (indicating all metrics)
metric_alarm_spec.recovery_conditions.recovery_timeframe: The recovery time frame, set to 1 (unit: minutes)
metric_alarm_spec.trigger_conditions: The trigger conditions list, dynamically generated through the dynamic block based on the input variable alarm_rule_trigger_conditions
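The following sketch assembles the parameters above. Two parts are assumptions that this document does not confirm: the resource type huaweicloud_aomv4_alarm_rule, and the attribute names inside the trigger_conditions block, which are shown only as an illustrative subset.

```hcl
# Assumed resource type: huaweicloud_aomv4_alarm_rule (not named in this document).
resource "huaweicloud_aomv4_alarm_rule" "test" {
  name             = var.alarm_rule_name
  type             = "metric"
  enable           = true
  prom_instance_id = var.prometheus_instance_id

  alarm_notifications {
    notification_enable = true
    notification_type   = "alarm_policy"
    route_group_enable  = true
    route_group_rule    = huaweicloud_aom_alarm_group_rule.test.name
    notify_resolved     = true
    notify_triggered    = true
    # "-1" defers to the frequency settings of the alarm group rule.
    notify_frequency    = "-1"
  }

  metric_alarm_spec {
    monitor_type = "all_metric"

    recovery_conditions {
      recovery_timeframe = 1
    }

    dynamic "trigger_conditions" {
      for_each = var.alarm_rule_trigger_conditions

      content {
        # Assumed attribute names; verify against the provider documentation.
        metric_query_mode  = trigger_conditions.value.metric_query_mode
        metric_name        = trigger_conditions.value.metric_name
        promql             = trigger_conditions.value.promql
        aggregation_window = trigger_conditions.value.aggregation_window
        trigger_type       = trigger_conditions.value.trigger_type
        trigger_interval   = trigger_conditions.value.trigger_interval
        trigger_times      = trigger_conditions.value.trigger_times
        thresholds         = trigger_conditions.value.thresholds
      }
    }
  }
}
```

Because route_group_rule references the alarm group rule by name, alarms raised by this rule are routed through the grouping settings configured in the previous step.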
9. Preset Input Parameters Required for Resource Deployment (Optional)
In this practice, some resources use input variables to assign configuration content. These input parameters must be entered manually during each deployment unless they are preset. Terraform lets you preset them through a tfvars file, which avoids repeated input on every execution.
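For orientation, the variables whose default values are stated in the steps above could be declared roughly as follows. This is a sketch: the types are inferred from the described values, and the empty-string default for enterprise_project_id is an assumption. Variables without a stated default, including the two list variables, are left out.

```hcl
variable "enterprise_project_id" {
  type    = string
  default = "" # assumed default; an empty string is converted to null in the resources
}

variable "alarm_action_rule_name" {
  type    = string
  default = "apm"
}

variable "alarm_action_rule_type" {
  type    = string
  default = "1" # notification type
}

variable "alarm_group_rule_group_interval" {
  type    = number
  default = 60 # seconds
}

variable "alarm_group_rule_group_repeat_waiting" {
  type    = number
  default = 3600 # seconds
}

variable "alarm_group_rule_group_wait" {
  type    = number
  default = 15 # seconds
}

variable "prometheus_instance_id" {
  type    = string
  default = "0" # the default Prometheus_AOM_Default instance
}
```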
Create a terraform.tfvars file in the working directory with the following example content:
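A minimal sketch of such a file is shown here. All values are hypothetical placeholders, and alarm_rule_trigger_conditions is omitted because its structure depends on how that variable is declared.

```hcl
# Hypothetical example values; replace them to suit your environment.
lts_group_name              = "test-group"
lts_stream_name             = "test-stream"
smn_topic_name              = "test-topic"
alarm_action_rule_user_name = "your_user_name"
alarm_group_rule_name       = "test-group-rule"
alarm_rule_name             = "test-rule"
```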
Usage:
Save the example content as a terraform.tfvars file in the working directory (Terraform automatically loads a file with this name when executing terraform commands; for any other file name, add .auto before tfvars, such as variables.auto.tfvars)
Modify parameter values according to actual needs
When executing terraform plan or terraform apply, Terraform will automatically read the variable values in this file
In addition to using the terraform.tfvars file, you can also set variable values in the following ways:
Command line parameters: terraform apply -var="lts_group_name=test-group" -var="alarm_rule_name=test-rule"
Environment variables: export TF_VAR_lts_group_name=test-group
Custom named variable file: terraform apply -var-file="custom.tfvars"
Note: If the same variable is set through multiple methods, Terraform will use variable values according to the following priority: command line parameters > variable file > environment variables > default values.
10. Initialize and Apply Terraform Configuration
After completing the above script configuration, execute the following steps to create resources:
Run terraform init to initialize the environment
Run terraform plan to view the resource creation plan
After confirming that the resource plan is correct, run terraform apply to start creating the resources that prevent ELB alarm storms
Run terraform show to view the details of the created resources