Someone's life journey.

Query Stub Domains with CoreDNS and NodeLocal DNSCache

W.T. Chang published on 2021-04-28 included in dev ops

Recently, we started migrating our EKS 1.16 clusters to brand new 1.19 ones.

Of all the changes, I am most excited about NodeLocal DNSCache. It not only greatly reduced DNS query latency (most of which came from the search domains), but also mitigated issues like conntrack races at very little expense.
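To see why the search domains matter, here is a rough Python sketch of how a glibc-style resolver expands a name before querying. The search list and the ndots value of 5 are the usual pod defaults rather than values taken from our clusters, and `candidate_queries` is a hypothetical helper written only for this illustration.

```python
from typing import List, Sequence


def candidate_queries(name: str, search: Sequence[str], ndots: int = 5) -> List[str]:
    """Return the names a glibc-style resolver would try, in order.

    Assumption: ndots=5 and the search list passed in mirror a typical
    pod /etc/resolv.conf; other resolvers (musl, for example) can differ.
    """
    # A trailing dot marks the name as fully qualified: no expansion.
    if name.endswith("."):
        return [name]

    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]

    # Names with at least `ndots` dots are tried as-is first;
    # shorter names go through the search list first.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for query in candidate_queries("example.com", search):
        print(query)
```

With these defaults, a lookup for example.com can try three cluster-local names before the real one, which is the kind of extra round trip a node-local cache helps absorb.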

Use Default Backend Service on ingress-nginx

W.T. Chang published on 2021-04-18 included in dev ops

We use ingress-nginx as our Ingress Controller.

Due to business requirements, we have lots of domains to handle. It’s unrealistic to create more than one thousand Ingress resources: doing so adds unnecessary load to the control plane, and that many resources are difficult to maintain and update.

Luckily, ingress-nginx provides a default backend service to handle this situation.

Check TLS Cert Expiry With Node's TLS Module

W.T. Chang published on 2020-08-21 included in ops

From time to time, there is news about services that forgot to renew their TLS certificates.

It’s even more critical when you are hosting many sites with a bunch of domains. Monitoring this is not difficult; expired certificates are often caused by negligence.
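The post itself uses Node's TLS module. As a language-neutral illustration of the same check, here is a minimal Python sketch using only the standard library `ssl` and `socket` modules. The function names, the port, and the timeout are assumptions made for this example.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter field.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jan  1 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to `host` and return the notAfter field of its certificate.

    Note: the default context verifies the certificate, so a certificate
    that has already expired raises ssl.SSLCertVerificationError here
    instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()["notAfter"]


if __name__ == "__main__":
    # example.com is a placeholder; substitute the domains you host.
    remaining = days_until_expiry(fetch_not_after("example.com"), datetime.now(timezone.utc))
    print(f"example.com: {remaining:.1f} days left")
```

Running something like this on a schedule and alerting below a chosen threshold (say, 14 days) is usually enough to catch a forgotten renewal.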

Automatically Recover EC2 Instances That Fail Status Checks With CloudWatch Events and Lambda

W.T. Chang published on 2020-06-14 included in ops

The Incident

Recently, some of our EKS worker nodes suddenly became unresponsive. When I was checking on the EC2 console, the status check showed “Insufficient Data”.

From past experience, we get notifications when the underlying hardware is impaired. This time, however, there was not much useful information, so I only did a quick investigation and then had to terminate these instances manually.

If I Had Known This Sooner: $_

W.T. Chang published on 2020-05-16 included in ops If I Had Known This Sooner

There are always times when I learn something useful and/or cool that I wish I had known sooner. So I decided to write these little things down. Maybe this kind of thing has the potential to become a series.

Prevent AWS CLI V2 From Using Pager

W.T. Chang published on 2020-02-23 included in ops

I always like to try new things out. So when I saw that AWS CLI v2 was generally available, I upgraded right away without checking the changes.

One thing I found different after the upgrade is that all output seems to be sent to a pager such as less. That is somewhat inconvenient when calling the AWS CLI in a shell script:
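One way to avoid the pager is to set the `AWS_PAGER` environment variable to an empty string for the call. The sketch below does that from Python rather than a shell script. `aws_env` and `run_aws` are hypothetical helper names, and the example assumes the `aws` binary is on the PATH.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def aws_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and disable the AWS CLI v2 client-side pager."""
    env = dict(os.environ if base is None else base)
    # An empty AWS_PAGER tells AWS CLI v2 not to pipe output through a pager.
    env["AWS_PAGER"] = ""
    return env


def run_aws(args: List[str]) -> str:
    """Run an AWS CLI command with the pager disabled and return stdout."""
    result = subprocess.run(
        ["aws", *args],
        env=aws_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_aws(["sts", "get-caller-identity"]))
```

Setting the variable explicitly keeps the behavior the same whether or not a terminal is attached.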

List (Almost) Everything Inside a Namespace of a Kubernetes Cluster

W.T. Chang published on 2020-02-17 included in ops

One day, I was preparing to remove a no-longer-needed namespace from one of our Kubernetes clusters. Before I did that, I checked what was inside once more to be sure.

So I typed $ kubectl get all to see whether I had missed something. It turned out that several things were not listed in the output, like the ingresses I was expecting.
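A common workaround is to ask the API server for every namespaced resource type that supports `list`, then pass all of them to a single `kubectl get`. The Python sketch below wraps those two kubectl calls. The helper names are made up for this example, and it assumes `kubectl` is on the PATH and already pointed at the right cluster.

```python
import subprocess
from typing import List, Sequence


def listable_kinds() -> List[str]:
    """Ask the API server which namespaced resource types support `list`."""
    result = subprocess.run(
        ["kubectl", "api-resources", "--verbs=list", "--namespaced=true", "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def get_command(kinds: Sequence[str], namespace: str) -> List[str]:
    """Build one `kubectl get` call covering every resource type in `kinds`."""
    return [
        "kubectl", "get", ",".join(kinds),
        "-n", namespace,
        "--ignore-not-found",  # stay quiet for types with no objects
        "--show-kind",         # prefix each row with its resource type
    ]


def list_everything(namespace: str) -> str:
    """Return the combined listing for `namespace` as text."""
    command = get_command(listable_kinds(), namespace)
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # "my-namespace" is a placeholder.
    print(list_everything("my-namespace"))
```

Unlike `kubectl get all`, this covers types such as ingresses, config maps, and secrets, although anything the API server does not expose through `list` is still missing.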

Notify Google to Update Sitemap Using Netlify Functions

W.T. Chang published on 2020-02-09 included in dev

Currently, this site is hosted on Netlify. I am pretty satisfied and don’t plan to move anytime soon. I also submitted my sitemap to Google for indexing, but the update frequency doesn’t seem very high.

Fortunately, Google provides an endpoint for you to notify it. Send a GET request to http://www.google.com/ping?sitemap=${siteMapUrl} and you are done.

But do we have to use curl every time we deploy to tell Google it’s time to fetch our sitemap? Well, life is short, don’t waste time on things like that.
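The request itself is small enough to script. The post does this with a Netlify Function; the Python sketch below only illustrates the GET request described above. The endpoint is taken from the post as written in 2020, and the sketch does not check whether it still responds. `ping_url` and `notify` are hypothetical names.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint as quoted in the post; treat it as an assumption.
PING_ENDPOINT = "http://www.google.com/ping"


def ping_url(sitemap_url: str) -> str:
    """Build the ping URL, percent-encoding the sitemap URL as a query value."""
    return f"{PING_ENDPOINT}?sitemap={quote(sitemap_url, safe='')}"


def notify(sitemap_url: str, timeout: float = 10.0) -> int:
    """Send the GET request and return the HTTP status code."""
    with urlopen(ping_url(sitemap_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # example.com is a placeholder for the deployed site.
    print(ping_url("https://example.com/sitemap.xml"))
```

Hooking a call like `notify` into the deploy pipeline removes the manual curl step.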

Disable T3 Unlimited Using ASG Lifecycle Hooks, CloudWatch Events, and Lambda

W.T. Chang published on 2020-02-08 included in dev

Preface

To save costs on testing environments, we use multiple instance types with a 100% Spot ratio and the lowest-price allocation strategy for several Auto Scaling groups.

We combined several instance types like c5.xlarge, m5.xlarge, t3.xlarge, and t3a.xlarge. It has worked fine so far, but t3 and t3a instances come with unlimited credits enabled by default. If applications running on these instances suddenly start misbehaving, the cost will increase once the accumulated credits burn out.
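The core of the fix is one EC2 API call that switches an instance's credit specification to standard. The sketch below shows that call with boto3. The helper names are hypothetical, boto3 is assumed to be installed with credentials configured, and the lifecycle hook and CloudWatch Events wiring from the title is left out.

```python
from typing import Dict, List

# Instance families that use CPU credits.
BURSTABLE_FAMILIES = {"t2", "t3", "t3a", "t4g"}


def is_burstable(instance_type: str) -> bool:
    """True for instance types such as t3.xlarge that use CPU credits."""
    return instance_type.split(".", 1)[0] in BURSTABLE_FAMILIES


def credit_spec(instance_id: str) -> List[Dict[str, str]]:
    """Request payload that switches one instance to standard credits."""
    return [{"InstanceId": instance_id, "CpuCredits": "standard"}]


def disable_unlimited(instance_id: str, instance_type: str) -> bool:
    """Switch a burstable instance to standard mode; skip other types."""
    if not is_burstable(instance_type):
        return False
    # Imported here so the helpers above work without boto3 installed.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.modify_instance_credit_specification(
        InstanceCreditSpecifications=credit_spec(instance_id)
    )
    return True
```

In a Lambda handler, `disable_unlimited` would be called with the instance ID and type taken from the launch event, so c5 and m5 instances in the same group are left untouched.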

The Long Way to Windows Container on Amazon EKS: Node Affinity

W.T. Chang published on 2020-01-04 included in ops

After dealing with the vpc-resource-controller, I can finally see the IIS page. But a running sample does not mean much on its own. So I put together a few deployment YAML files to see whether our workloads would work.