Disable T3 Unlimited Using ASG Lifecycle Hooks, CloudWatch Events, and Lambda

Preface

To save costs on testing environments, we use multiple instance types with a 100% Spot ratio and the lowest-price allocation strategy for several Auto Scaling groups.

We combined several instance types such as c5.xlarge, m5.xlarge, t3.xlarge, and t3a.xlarge. It has worked fine so far, but t3 and t3a instances come with unlimited credits enabled by default. If an application running on these instances suddenly starts misbehaving and keeps the CPU busy, the cost will increase once the accumulated credits are used up.

…In the cases that the T3 instances needs to run at higher CPU utilization for a prolonged period, it can do so for a small additional charge of $0.05 per vCPU-hour. – UNLIMITED AND STANDARD MODE

This statement comes from the T3 instance introduction page. I am not sure whether the same $0.05 rate applies to Spot Instances or whether it is lower. One thing is for sure, though: we don’t want to be charged extra in this case. If your workload requires consistent performance, you shouldn’t use T-series instances in the first place.
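To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. It assumes the quoted $0.05 per vCPU-hour rate applies, that the credit balance is already empty, and that only usage above the baseline is billed. The function name, the 4 vCPUs, and the 40% baseline are illustrative assumptions, so check the current figures for your instance type and region.

```javascript
// Rough estimate of the surplus charge for a burstable instance in unlimited mode.
// All figures are illustrative assumptions, not authoritative pricing.
const SURPLUS_PRICE_PER_VCPU_HOUR = 0.05 // taken from the quote above

function estimateSurplusCharge({ vcpus, baselineUtilization, actualUtilization, hours }) {
  // Only usage above the baseline consumes surplus credits once the balance is empty.
  const excess = Math.max(0, actualUtilization - baselineUtilization)
  return excess * vcpus * hours * SURPLUS_PRICE_PER_VCPU_HOUR
}

// Example: a 4-vCPU instance with an assumed 40% baseline, pinned at 100% for 24 hours.
const charge = estimateSurplusCharge({
  vcpus: 4,
  baselineUtilization: 0.4,
  actualUtilization: 1.0,
  hours: 24
})
console.log(`Estimated surplus charge: $${charge.toFixed(2)}`)
```

Multiply the result by the number of instances in the group to see how a runaway process could add up over a weekend.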

How about disabling T2/T3 Unlimited in the Launch Template?

That’s a good question. However, it won’t work at the time of this writing. If a non-T-series instance is launched (remember that we mixed different instance families above?), the launch will simply fail because of the incompatible setting.

Combining ASG Lifecycle Hooks, CloudWatch Events, and Lambda

So we have to make some effort to keep money in our wallet. We will set a lifecycle hook on the Auto Scaling groups, then use Serverless Framework to deploy our Lambda function and connect it to CloudWatch Events.

Auto Scaling Group Lifecycle Hooks

Before you can receive notifications from CloudWatch Events, you have to create a lifecycle hook first.

Go to the ASG you want to configure, click Create Lifecycle Hook, and fill out these fields:

  • Lifecycle Transition: Since we want the instance launch event, choose Instance Launch here.
  • Heartbeat Timeout: You can use a smaller value than the default if you like. It only determines how long the instance waits when the Lambda function never completes the lifecycle action.
  • Default Result: Choose CONTINUE here in case the Lambda function doesn’t work for some reason.

P.S. For the CLI version, use aws autoscaling put-lifecycle-hook.
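If you would rather script the hook than click through the console, the same settings can be expressed with the AWS SDK for JavaScript (v2) call putLifecycleHook. This is a minimal sketch: the helper names are hypothetical, the hook name is borrowed from the sample payload further down, and the 60-second timeout is an assumed value.

```javascript
// Build the parameters for AutoScaling.putLifecycleHook (AWS SDK for JavaScript v2).
function buildLaunchHookParams(autoScalingGroupName) {
  return {
    AutoScalingGroupName: autoScalingGroupName,
    LifecycleHookName: 'Disable-Unlimited-Credit-At-Launch',
    LifecycleTransition: 'autoscaling:EC2_INSTANCE_LAUNCHING',
    HeartbeatTimeout: 60, // seconds; an assumed value, adjust to taste
    DefaultResult: 'CONTINUE' // let the launch proceed even if the Lambda function fails
  }
}

// `autoscaling` is expected to be an AWS.AutoScaling client created by the caller.
function createLaunchHook(autoscaling, autoScalingGroupName) {
  return autoscaling.putLifecycleHook(buildLaunchHookParams(autoScalingGroupName)).promise()
}

// Usage (requires the aws-sdk package and valid credentials):
// const AWS = require('aws-sdk')
// createLaunchHook(new AWS.AutoScaling(), 'mock-auto-scaling-group-1')
//   .then(() => console.log('hook created'))
```

Passing the client in as an argument keeps the helper easy to try with a stub before pointing it at a real account.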

CloudWatch Events and Lambda Integration

As mentioned above, we will use Serverless Framework to do the work. First, we need a Role for Lambda to use.

Permissions for the Lambda Role

  • You should attach AWSLambdaBasicExecutionRole for logging.
  • Add this inline policy to grant only the permissions the function needs.
    
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstanceAttribute",
                    "ec2:ModifyInstanceCreditSpecification",
                    "autoscaling:CompleteLifecycleAction"
                ],
                "Resource": "*"
            }
        ]
    }

BTW, you can also create the Lambda role in the following serverless.yml. It all depends on how you like to manage permissions.

serverless.yml

service: <YOUR_SERVICE_NAME>
provider:
  name: aws
  runtime: nodejs12.x
  memorySize: 128
  region: <REGION>
  deploymentBucket:
    name: <THE_BUCKET_TO_PUT_ARTIFACT>
  logRetentionInDays: 1 # Don't set too long. Log storage costs too.
functions:
  disableUnlimited:
    handler: disableUnlimited.handler # Format: FILE_NAME.FUNCTION_NAME
    role: <YOUR_LAMBDA_ROLE_ARN_HERE>
    events:
      - cloudwatchEvent:
          enabled: true
          event:
            source:
              - 'aws.autoscaling'
            detail-type:
              - 'EC2 Instance-launch Lifecycle Action'
            detail:
              AutoScalingGroupName:
                - mock-auto-scaling-group-1
                - mock-auto-scaling-group-2

This will create a CloudFormation stack containing a CloudWatch Events rule, the Lambda function, and a log group.
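To see which events the rule lets through, the event pattern from serverless.yml can be modelled locally. The sketch below is a simplified approximation, not the real matching engine: every key in the pattern must be present in the event, arrays list the accepted values, and nested objects are matched recursively. The matchesPattern helper is hypothetical, and other operators that real patterns support are ignored.

```javascript
// Simplified model of event pattern matching (an approximation for illustration only).
function matchesPattern(pattern, event) {
  return Object.entries(pattern).every(([key, expected]) => {
    const actual = event == null ? undefined : event[key]
    // An array in the pattern lists the accepted values for that field.
    if (Array.isArray(expected)) return expected.includes(actual)
    // A nested object in the pattern is matched against the nested event object.
    return typeof actual === 'object' && actual !== null && matchesPattern(expected, actual)
  })
}

const pattern = {
  source: ['aws.autoscaling'],
  'detail-type': ['EC2 Instance-launch Lifecycle Action'],
  detail: {
    AutoScalingGroupName: ['mock-auto-scaling-group-1', 'mock-auto-scaling-group-2']
  }
}

const launchEvent = {
  source: 'aws.autoscaling',
  'detail-type': 'EC2 Instance-launch Lifecycle Action',
  detail: { AutoScalingGroupName: 'mock-auto-scaling-group-1', EC2InstanceId: 'i-01ad4d8fac123acf2' }
}
const otherGroupEvent = { ...launchEvent, detail: { AutoScalingGroupName: 'another-group' } }

console.log(matchesPattern(pattern, launchEvent)) // true
console.log(matchesPattern(pattern, otherGroupEvent)) // false
```

Launch events from groups that are not listed under AutoScalingGroupName never reach the function, so remember to add new groups to the list.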

samplePayload.json

{
  "version": "0",
  "id": "2c222438-8452-3e56-b13e-3b40a11657df",
  "detail-type": "EC2 Instance-launch Lifecycle Action",
  "source": "aws.autoscaling",
  "account": "<ACCOUNT_ID>",
  "time": "2020-02-08T08:42:04Z",
  "region": "ap-northeast-1",
  "resources": [
    "arn:aws:autoscaling:ap-northeast-1:<ACCOUNT_ID>:autoScalingGroup:12345b00-3c66-499a-9b16-99c6d95bcdef:autoScalingGroupName/<AUTO_SCALING_GROUP_NAME>"
  ],
  "detail": {
    "LifecycleActionToken": "ff4dd377-c1e0-48ac-128d-940cdbda9abc",
    "AutoScalingGroupName": "<AUTO_SCALING_GROUP_NAME>",
    "LifecycleHookName": "Disable-Unlimited-Credit-At-Launch",
    "EC2InstanceId": "i-01ad4d8fac123acf2",
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING"
  }
}

This file is only meant to give you an idea of what an event passed to Lambda looks like. For more information, see the links at the bottom.

disableUnlimited.js

const AWS = require('aws-sdk')
const ec2 = new AWS.EC2()
const autoscaling = new AWS.AutoScaling()

module.exports.handler = async event => {
  const {
    AutoScalingGroupName,
    LifecycleHookName,
    LifecycleActionToken,
    EC2InstanceId: InstanceId
  } = event.detail
  // Here we make the lifecycle action to continue no matter what.
  // Modify it if needed.
  const LifecycleActionResult = 'CONTINUE' // Or ABANDON

  try {
    // Since we only have instance id, we have to ask EC2 service about its instance type.
    const { InstanceType } = await ec2.describeInstanceAttribute({ Attribute: 'instanceType', InstanceId }).promise()
    // If it's a T-series instance, disable the unlimited option.
    // The pattern matches t2, t3, t3a, t4g, etc., but not other families that start with "t" (e.g. trn1).
    // You can narrow it to "t3" to be more precise. T2 instances don't have unlimited enabled by default.
    const isBurstable = /^t\d/.test(InstanceType.Value)
    if (isBurstable) {
      await ec2.modifyInstanceCreditSpecification({
        InstanceCreditSpecifications: [
          { InstanceId, CpuCredits: 'standard' }
        ]
      }).promise()
    }
  } catch (error) {
    console.error(error)
  }

  return autoscaling.completeLifecycleAction({
    AutoScalingGroupName,
    LifecycleHookName,
    LifecycleActionResult,
    LifecycleActionToken
  }).promise()
}
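Before deploying, it can help to dry-run the logic without touching AWS. The sketch below is a hypothetical variant of the handler that receives its clients as arguments, paired with stub clients that record each call and mimic the SDK v2 `.promise()` style. The createHandler, createStubs, and dryRun names are made up for illustration, and the instance type check uses a `/^t\d/` pattern, which is an assumption about how strict you want the match to be.

```javascript
// Hypothetical variant of the handler with injectable clients for local testing.
function createHandler({ ec2, autoscaling }) {
  return async event => {
    const { AutoScalingGroupName, LifecycleHookName, LifecycleActionToken, EC2InstanceId: InstanceId } = event.detail
    try {
      const { InstanceType } = await ec2
        .describeInstanceAttribute({ Attribute: 'instanceType', InstanceId })
        .promise()
      // Only burstable families (t2, t3, t3a, ...) get switched to standard mode.
      if (/^t\d/.test(InstanceType.Value)) {
        await ec2
          .modifyInstanceCreditSpecification({ InstanceCreditSpecifications: [{ InstanceId, CpuCredits: 'standard' }] })
          .promise()
      }
    } catch (error) {
      console.error(error)
    }
    return autoscaling
      .completeLifecycleAction({ AutoScalingGroupName, LifecycleHookName, LifecycleActionToken, LifecycleActionResult: 'CONTINUE' })
      .promise()
  }
}

// Stub clients that record every call instead of talking to AWS.
function createStubs(instanceType) {
  const calls = []
  const reply = (name, result) => params => {
    calls.push({ name, params })
    return { promise: () => Promise.resolve(result) }
  }
  return {
    calls,
    ec2: {
      describeInstanceAttribute: reply('describeInstanceAttribute', { InstanceType: { Value: instanceType } }),
      modifyInstanceCreditSpecification: reply('modifyInstanceCreditSpecification', {})
    },
    autoscaling: { completeLifecycleAction: reply('completeLifecycleAction', {}) }
  }
}

// Runs the handler against a fake launch event and returns the names of the calls made.
async function dryRun(instanceType) {
  const stubs = createStubs(instanceType)
  const event = {
    detail: {
      AutoScalingGroupName: 'mock-auto-scaling-group-1',
      LifecycleHookName: 'Disable-Unlimited-Credit-At-Launch',
      LifecycleActionToken: 'dummy-token',
      EC2InstanceId: 'i-01ad4d8fac123acf2'
    }
  }
  await createHandler(stubs)(event)
  return stubs.calls.map(call => call.name)
}

dryRun('t3.xlarge').then(names => console.log('t3.xlarge:', names))
dryRun('m5.xlarge').then(names => console.log('m5.xlarge:', names))
```

For t3.xlarge the recorded calls include modifyInstanceCreditSpecification, while for m5.xlarge the handler goes straight from describing the instance to completing the lifecycle action.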

Deployment

Let’s deploy it. Make sure you have Serverless Framework installed.

$ sls deploy # Yeah, that's it.

After the deployment finishes, try scaling out so that one more T-series instance launches, and check whether it works.
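One way to check the result is to ask EC2 for the credit option of the newly launched instances. This is a minimal sketch using the SDK v2 call describeInstanceCreditSpecifications. The findUnlimitedInstances helper is hypothetical, and the caller (not the Lambda role above) is assumed to have permission for that call. Pass only T-series instance IDs, since the API is expected to reject instances that are not burstable.

```javascript
// Returns the IDs of the given instances that are still in unlimited mode.
// `ec2` is expected to be an AWS.EC2 client created by the caller.
async function findUnlimitedInstances(ec2, instanceIds) {
  const { InstanceCreditSpecifications = [] } = await ec2
    .describeInstanceCreditSpecifications({ InstanceIds: instanceIds })
    .promise()
  return InstanceCreditSpecifications
    .filter(spec => spec.CpuCredits === 'unlimited')
    .map(spec => spec.InstanceId)
}

// Usage (requires the aws-sdk package and valid credentials):
// const AWS = require('aws-sdk')
// findUnlimitedInstances(new AWS.EC2(), ['i-01ad4d8fac123acf2'])
//   .then(ids => console.log(ids.length === 0 ? 'all standard' : `still unlimited: ${ids}`))
```

An empty result means every instance you passed in has been switched to standard mode.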

Further Readings
