Lessons Learned When Building My DNS Resolver
Preface
I knew the Implement DNS in a weekend project by Julia Evans long ago and I always wanted to give it a try.
I mean, we all know something about DNS, at least on a high level, right? Yeah, right, probably. So I immediately dived into implementing after checking the first few pages of it. Looking back, this was a mistake (and I literally took almost a week to build from scratch).
Anyway, this post is about a guy who never did low level network programming, trying to build his own DNS resolver with the language he was on and off from time to time.
Some notes before we start:
It’s highly recommend to read Implement DNS in a weekend and then RFC 1035 and other resources if you are interested.
All diagrams, if not mentioned specifically, are borrowed from RFC 1035.
You can find my code at wtchangdm/tiny-resolver-rs repo.
Keep in mind it’s merely a toy resolver with incomplete support of standards.
There are tons of DNS related RFCs. I am having a huge respect to whom ever implemented these.
Prerequisite knowledge
An
octet
is abyte
.For a UDP message, the message limit is
512 octets
at most.A domain (e.g.,
"blog.wtcx.dev"
) should be less or equal to255 octets
. The way how length is calculated, see Domain representation and message compression.A label, aka the components of the domain (e.g.,
"blog"
,"wtcx"
, or"dev"
), is less or equal to63 octets
.
Big-endian, little-endian
2.3.2. Data Transmission Order
…Whenever an octet represents a numeric quantity, the left most bit in the diagram is the high order or most significant bit.
…Similarly, whenever a multi-octet field represents a numeric quantity the left most bit of the whole field is the most significant bit. When a multi-octet quantity is transmitted the most significant octet is transmitted first.
In network programming, data mostly transfered with Big-endian. That is, if the data consist of many octets, for example, like the TTL
(unsigned 32 bit) in resource record, we need to make sure the most siginificant byte (MSB) is the leftmost one and so on.
Let’s see how is this different in action:
|
|
Output:
|
|
If we want to construct an u32
from response (in a buffer of [u8]
):
|
|
Output:
|
|
Huge difference, right? So make sure you are building packets with .to_be_bytes()
and reading with .from_be_bytes()
.
Message format
Before we can process the DNS message, we need to know the format.
|
|
For a query, we can just construct Header
and Question
sections. When we received a response, it will contain both of these fields as well, followed by optional resource records (RR) for Answer
, Authority
, and Additional
.
Header
The first part of message is Header
(duh), which has fixed length of 64 bit so is relatively easy to parse.
|
|
Field | Length | Comment |
---|---|---|
ID | 16 bit | Used to validate question & response |
QR | 1 bit | 0 for query, 1 for response |
OPCODE | 4 bit | 0 for standard query, refer to RFC for others |
AA | 1 bit | 1 = authoritative answer |
TC | 1 bit | 1 = message is truncated |
RD | 1 bit | 1 = recursion desired, I skipped this flag |
Z | 3 bit | Reserved, I skipped as well |
RA | 1 bit | 1 = recusion available, I…skipped as well |
RCODE | 4 bit | Response code. 0 = no error, 4 = NXDOMAIN, check RFC for the rest |
QDCOUNT | 16 bit | How many entries in Question section (after Header ) |
ANCOUNT | 16 bit | How many records in Answer section |
NSCOUNT | 16 bit | How many records in Authority section |
ARCOUNT | 16 bit | How many records in Additional section |
This is the struct I used for Header
.
|
|
It’s also easy to build a header for a standard query – we can just change ID
value, leave other fields (u16
) to 0
and call it a day.
Question
The Question
section is the query’s payload. Parsing becomes slightly tricker because the length is not fixed.
|
|
Field | Length | Comment |
---|---|---|
QNAME | various | QNAME contains label sequence, no one knows how long this will be. But QNAME ends with a 0 octet so just keep parsing until you see it. This field is likely to be compressed. |
QTYPE | 16 bit | QTYPE is the type you want to query (e.g., 1 for A record), and is the superset of TYPE . See RFC 1035 Section 3.2.3. |
QCLASS | 16 bit | QCLASS is the class you want to qeury (mostly just 1 for IN , aka the internet). See RFC 1035 Section 3.2.5. |
The struct:
|
|
Where RecordType
and RecordClass
are just enum mapping u16
to corresponding variants.
I thought for a while whether I should use String
for domain. It does make sense since we don’t know how long it will be, and it’s way easier to perform common operations on a String
than &[u8]
. The cost is additional allocation, though. But I guess performance is not the problem I need to deal with at the moment so String
it is.
Extract flags in Header
section
We know different flags in Header
take different length, for example, we want to check the RCODE
field to see if response is good.
The RCODE
takes the last 4 bit of flags, and since we already parsed the whole flags as an u16
, we need to AND
ed it with decimal 15
:
|
|
Let’s see it in binary form:
|
|
After this operation, we will get the last 4 bit of flags, which is the RCODE
we wanted. And when RCODE
is 4
, it’s the NXDOMAIN
that you know and probably don’t love.
Resource Record (RR)
The last three sections are Answer
, Authority
, and Additional
. All three of them share the same format:
|
|
Field | Length | Comment |
---|---|---|
NAME | various | A domain name. like QNAME in Question . |
TYPE | 16 bit1 | Type of the record (e.g., 1 for A record). See RFC 1035 Section 3.2.2. |
CLASS | 16 bit1 | Class you of the record (mostly just 1 for IN , aka the internet). See RFC 1035 Section 3.2.4. |
TTL | unsigned 32 bit | The time the record can be cached in seconds. 0 = don’t cache. |
RDLENGTH | unsigned 16 bit | How long the RDATA fields is |
RDATA | various | The content varies from differet type of resource records. see RFC 1035 Section 3.3. Standard RRs. |
The struct:
|
|
Domain representation and message compression
Every time you see <domain-name>
in RFC, like QNAME
in the Question
section, or the RDATA
field for resource records (RR), it can be compressed to save the payload size, without repeating data that has been transfered before.
You don’t necessarily need to compress the domain name before sending the query, but response will likely have compressed domain.
Uncompressed domain representation
Say, we want to query the address of "blog.wtcx.dev"
, we will encode the domain into a series of “length octet + label” format.
For example, "blog.wtcx.dev"
consists of three labels, "blog"
, "wtcx"
, and "dev"
. We will need to first use 1 octet
to indicate how long the following label is, and then followed by the actual label, and finally, we end the whole domain with a 0 octet
.
Remember there is a 63 octets
length limit for each label?
3.1. Name space definitions
…The high order two bits of every length octet must be zero, and the remaining six bits of the length field limit the label to 63 octets or less.
That is, for every length octet, we can only use lower 6 bit of the length octet, which is 0b00111111
in binary at most, and 63
in decimal.
The total domain length is calculated by combining all <label_length><label>
pairs and don’t have the dots ('.'
) we see in human-readable format. Therefore, a domain representation for "blog.wtcx.dev"
is like 4blog4wtcx3dev0
, which is 15 octets
long. And we can have at most 255 octets
for a domain.
Compression format
Let’s borrow the diagram from RFC again:
|
|
From the 20th (3
) to the 31th octet (0
), that’s percisely the uncompressed domain representation above.
Starting with 40th (3
) ~ 43th (O
, English letter, not zero) shares the same logic like uncompressed format. And then, the two higher bit of the 44th octet has 11
in it.
Remember the limit where a label can only be 63 bit long? That limit leaves the higher two bits for other purpose.
When you see a length octet, like
0b00111111
, the first two bits are00
, means it’s uncompressed, normal label.When the first two bits are
11
, like you saw on the 44th octet above, means it’s a compression format followed by a pointer, and we will have to “jump” to the position that lower 14 bit points to. Check domain name compression and the pointer for more details.First two bits that’s
10
or01
are reserved for future use. I didn’t look into that since it’s not required at this moment.
Check domain name compression and where the pointer points to
We know compressed message has a 2-octet group, the higher 2 bit is 11
, indicating this is the following domain is compressed, and the position we need to “jump to”, is calculated by the lower 14 bit.
So how do we check whether it’s compressed, and where is the posistion? Luckily, we can reuse AND
operation in the flag example.
I normally parse domain names octet by octet because that works well for both uncompressed and compressed domain.
Say, we are parsing the length octet (u8
) and see 0b11000010
.
We check the first two bits by AND
ing a u8
0b11000000
(or a 0xC0
in hex), that way everything other than the first 2 bits will be set to 0.
|
|
and we have the result 0b11000000
.
The actual code is self-explanatory as well:
|
|
If it’s not compressed, we continue to parse octet by octet until we see 0
oecet.
Otherwise, it’s compressed, and we have to combine the remaining 6 bits with the following 8 bits to calculate the pointer.
Let’s imagine the two octets look like:
|
|
So, we combine the two octets first with:
|
|
and then we get 0b1100000000010100
.
In order to set the first two bits to 0
, we will AND
the u16
with 0b0011111111111111
(or 0x3FFF
in hex).
|
|
|
|
In this case, we get 20
. So we can “jump” to 20th octet and continue to parse the domain. If you are not sure where were we before the “jump”, check the 44th octet of compression format diagram.
You can check my (& ChatGPT’s) attempt on the domain name parsing on GitHub.
Binding 127.0.0.1
won’t work
When sending DNS query message, we first need to bind to a UDP socket on the host and then send packets with it.
At first I thought: I don’t want to bind on every interface, so maybe just 127.0.0.1
, and lets OS pick a random port for it. So:
|
|
And then:
|
|
Seems like we can bind on 127.0.0.1
without problems. But we can’t send packet to remote address, why? That’s because the remote peer can’t send their response to 127.0.0.1
, this is the loopback address for them as well.
However, if we change the remote addr to 127.0.0.1:53
, then the packet goes through!
|
|
But the very first step of a DNS resolver (if no cache) is to ask root name servers, right? So let’s just bind 0.0.0.0:0
and be done with it.
|
|
See this Stack Overflow querstion if you are interested.
Unexpected IPv6 ambush
Just when I thought that’s pretty much it, right? And started my test. And then came the twist.
My resolver crashed during the parsing of resource record. It complained about unknown record type 28
. I went back to search RFC and couldn’t find type 28
anywhere. After checking with ChatGPT I then learned that’s in RFC 3596, the IPv6 extension.
So even though I thought I don’t have to deal with IPv6 any time soon, it immediately striked back.
Good thing is all I need to do is just update the RecordType
enum and an arm in the match expression.
Improvements worth considering
Because this is just a toy DNS resolver, it doesn’t contain extensive tests and doesn’t have most of the standards supported. There are still things I want to to in the future (if possible):
Caching and negative caching
RFC specifically said resovler need to cache the result whenever possible (but respect to TTL
)! So we don’t always bother root nameservers and the authoritative name servers all the time.
IPv6
This is obvious right? Supporting IPv6 is probably the one of the most important things for a resolver.
More protocols
There are many protocols can be used for DNS queries other than UDP
nowadays. TCP
, DNS over TLS
, or DNS over HTTPS
. I don’t know the difficulity yet, but let’s put it on a wishlist.
Async operation
I don’t want to use the word “obvious” again but you get the idea. Resolver with sync operation probably only works for a toy CLI program.
Internationalized domain names (IDN)
Internationalized domain name (e.g., 中文.tw) is something probably not popular but should be fun to do. (maybe?)
Final thoughts
I mentioned that I always wanted to make a DNS resolver on my own, and I finally did. But implementing without checking how everyone else did it can be both good and bad thing.
Good thing is I really delved into RFC and trying to figure out where was I, it was eye-opening to see how stuff underneath really works. I would recommend this to anyone who’s interested!
Bad thing is that I spent way, way more time than I expected. Most of the time I was just repeatedly looking at same paragraphs and trying to figure out what does this document mean, while I can definitely check some examples (even asking AI). Also, there are actually tons of RFCs and I only focused on RFS 1035 in order to make a working PoC.
I also strugged a while on several places, like bitwise operation and finding what’s TYPE
28. These looked so obvious in hindsight but really took me for some time then I just turned to ChatGPT.
My development at work mostly related to high level programming like integrating RESTful API and stuff. As an infrasture engineer, I have tons of debugging tools in hand. There is rarely a chance to build a tool working on low level myself because… I just don’t need to, like everyone else.
It was also a fun experience and a good project to practice Rust (unlike the abandoned advent of code).
Some observations during the making of a DNS resolver:
- There are extensively length checks than I thought (I guess it’s still not enough)
- Malformed pointer in compresstion format can really mess resolver up as it can be infinitely recursive. It’s resolver’s responsibility to not being naive.
- I am not a security expert. But when I saw I got a
~500 bytes
response from a~30 bytes
query. That really gives me a concept of how dangerous DNS amplification attack can be! - DNS Caching is very important (especially in Kubernetes).