Query Stub Domains with CoreDNS and NodeLocal DNSCache
Recently, we started migrating our EKS 1.16 clusters to brand-new 1.19 ones.
Of all the changes, I am most excited about NodeLocal DNSCache. Not only does it greatly reduce DNS query latency (mostly caused by those search domains), it also mitigates issues like conntrack races, all at very little expense.
The hidden problem
Networking is hard, especially in Kubernetes. DNS is often one of the problems.
Normally, no matter how many CoreDNS pods you run, there is always a chance that your DNS queries fly
over your head across instances, or even worse, availability zones. This increases latency and the chance of failure, and hurts apps that need high throughput.
For instance, applications like log shippers and push-notification workers tend to send many requests in short periods, and the volume only grows under pressure. If you check CoreDNS’ logs, you will notice a lot of
NXDOMAIN results caused by the search domains. Although these results do get cached, the time wasted traveling between nodes and zones is still expensive.
We can tune ndots for cluster-level log shippers, but it’s unlikely and impractical to ask everyone else to do the same.
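For reference, tuning ndots per workload is just a few lines in the pod spec; a sketch, with an illustrative value:

```yaml
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "1"  # skip the search-domain expansion for most lookups
```

Lowering ndots means names with at least one dot are tried as-is first, which avoids most of the NXDOMAIN churn, but it has to be applied workload by workload.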
Install the cluster addon
If I remember correctly, GKE has a simple checkbox to install NodeLocal DNSCache. But since we are using AWS, we seem not to deserve the option. Anyway, it’s still pretty easy to install if you follow the instructions.
It creates a daemonset, a configmap, and a service account for the
node-local-dns, plus services to expose both itself and the upstream CoreDNS. You don’t need to change any of your workloads to benefit from this. The daemonset manipulates iptables rules (in a good way) to intercept DNS queries headed for the CoreDNS service cluster IP and answer them from its local cache when it can.
It feels great, just like in the VM days, when you never thought you would one day be worrying about DNS queries being too slow.
Wait, I can’t connect to services behind a stub domain?
There is always a “but”. We have some “special” domains that need a stub domain block in the Corefile to resolve, like:
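The original snippet didn’t survive here, but such a block is simply an extra server zone in the Corefile; a sketch, with a made-up domain and nameserver IP:

```
example.internal:53 {
    errors
    cache 30
    forward . 10.150.0.1
}
```

Queries under that zone are forwarded straight to the given nameserver instead of the default upstream.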
It always worked before, and I naively thought it would continue to work after I installed NodeLocal DNSCache, until a colleague told me that he couldn’t connect to a certain database from the new cluster.
When something can’t connect, I usually check DNS first and only reach for tools like
nc later. And since we are all in this article, of course, it was a DNS issue.
I was dumbfounded, wondering how that was possible. If the cache misses, it will still ask the upstream CoreDNS, won’t it?
I removed NodeLocal DNSCache, and the stub domain resolved again. I applied NodeLocal DNSCache once more, and it stopped. After that, I started searching around, hoping my mentors Google and StackOverflow would shed some light.
There was a commit that added stub-domain support for
kube-dns. I immediately checked the daemonset manifest and found there is indeed an optional volume mounting the
kube-dns configmap. I tried pointing it at the
coredns configmap instead, hoping it would work. It didn’t.
Why? Because the stub-domain format differs between
kube-dns and CoreDNS, as you can see in the CoreDNS configuration equivalent to kube-dns document:
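The excerpt is missing here; as a hedged sketch (domain and IP are made up), kube-dns keeps its stub domains in a stubDomains JSON field of its configmap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.internal": ["10.150.0.1"]}
```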
This is what
stubDomains looks like in kube-dns, and the CoreDNS one is already presented above. Of course it doesn’t work.
Take a good hard look at the configmap
I had searched a few pages on Google without ever suspecting that the answer was in the manifest all along.
After all these
sed substitutions, the configmap in the manifest will look like:
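The snippet itself is not reproduced above, so here is a rough, abridged sketch of the substituted node-local-dns Corefile, assuming the stock manifest, a cluster DNS IP of 172.20.0.10, and the default link-local address 169.254.20.10 that the cache binds to:

```
cluster.local:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
    health 169.254.20.10:8080
}
in-addr.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
}
ip6.arpa:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
}
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    forward . __PILLAR__UPSTREAM__SERVERS__
}
```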
So, when it comes to a zone like
ip6.arpa, go ask
__PILLAR__CLUSTER__DNS__. That placeholder will be replaced at runtime with
c.clusterDNSIP. What is
c.clusterDNSIP, you say?
According to the flag parsing, it’s the
upstreamsvc flag, which defaults to kube-dns-upstream.
Let’s go back to the manifest, where we will find:
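The excerpt is missing; a sketch of the relevant container args from the stock daemonset (the local IPs match our example cluster):

```yaml
containers:
- name: node-cache
  args:
  - "-localip"
  - "169.254.20.10,172.20.0.10"
  - "-conf"
  - "/etc/Corefile"
  - "-upstreamsvc"
  - "kube-dns-upstream"
```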
…and the service snippet:
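Likewise, a sketch of the kube-dns-upstream Service shipped with the manifest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-upstream
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kube-dns
```

It selects the same CoreDNS pods as the original kube-dns service, just under a fresh cluster IP that is not intercepted.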
__PILLAR__CLUSTER__DNS__ is an “alternative” ClusterIP service for the upstream CoreDNS, needed because the original CoreDNS service IP (
172.20.0.10 in this case) is being intercepted. The installation creates this extra ClusterIP service so NodeLocal DNSCache can reach the upstream. Therefore, when a query falls into a zone like
ip6.arpa, it goes to ask the upstream CoreDNS.
We now understand that stub domains are not included in the zones above, so their queries fall into the default “.” zone and will ask
__PILLAR__UPSTREAM__SERVERS__ instead.
According to this configmap.go and the zero value of UpstreamNameservers,
__PILLAR__UPSTREAM__SERVERS__ will be replaced with
/etc/resolv.conf at runtime.
Let’s look back to the manifest again:
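The snippet is missing here; the relevant part of the daemonset spec is just:

```yaml
spec:
  template:
    spec:
      dnsPolicy: Default  # inherit the node's /etc/resolv.conf
```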
The Pod’s DNS Policy document explains
dnsPolicy: Default:
“Default”: The Pod inherits the name resolution configuration from the node that the pods run on. See related discussion for more details.
What nameserver is present in the node’s
/etc/resolv.conf? Yes, the reserved IP address for DNS in the VPC (AWS reserves the base of the VPC network range plus two for its DNS server), which points at the Route53 resolver.
Fall back to upstream CoreDNS, not Route53
After all that trouble, the fix is straightforward: make the default “.” zone fall back to the upstream CoreDNS, which knows our stub domains, instead of the node’s
/etc/resolv.conf.
If you are not using stub domains, falling back to Route53 won’t cause any problems. In our case, however, we should fall back to the upstream CoreDNS instead.
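A sketch of the change, assuming the stock node-local-dns Corefile layout: point the default zone’s forward at the cluster DNS placeholder rather than the upstream-servers one, so cache misses for the stub domains reach the upstream CoreDNS instead of the VPC resolver:

```
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10 172.20.0.10
    # forward misses (including our stub domains) to upstream CoreDNS
    # instead of the node's /etc/resolv.conf
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
}
```

The trade-off is that every cache miss now takes an extra hop through CoreDNS before reaching Route53, which is exactly the behavior we had before NodeLocal DNSCache.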
- DNS Lookups in Kubernetes
- Using NodeLocal DNSCache in Kubernetes clusters
- CoreDNS configuration equivalent to kube-dns
- Pod’s DNS Policy
- VPC and subnet sizing for IPv4