Lessons Learned When Building My DNS Resolver

Preface

I knew the Implement DNS in a weekend project by Julia Evans long ago and I always wanted to give it a try.

I mean, we all know something about DNS, at least on a high level, right? Yeah, right, probably. So I immediately dived into implementing after checking the first few pages of it. Looking back, this was a mistake (and I literally took almost a week to build from scratch).

Anyway, this post is about a guy who never did low level network programming, trying to build his own DNS resolver with the language he was on and off from time to time.

Some notes before we start:

  • It’s highly recommend to read Implement DNS in a weekend and then RFC 1035 and other resources if you are interested.

  • All diagrams, if not mentioned specifically, are borrowed from RFC 1035.

  • You can find my code at wtchangdm/tiny-resolver-rs repo.

    • Keep in mind it’s merely a toy resolver with incomplete support of standards.

    • There are tons of DNS related RFCs. I am having a huge respect to whom ever implemented these.

Prerequisite knowledge

  • An octet is a byte.

  • For a UDP message, the message limit is 512 octets at most.

  • A domain (e.g., "blog.wtcx.dev") should be less or equal to 255 octets. The way how length is calculated, see Domain representation and message compression.

  • A label, aka the components of the domain (e.g., "blog", "wtcx", or "dev"), is less or equal to 63 octets.

Big-endian, little-endian

2.3.2. Data Transmission Order

…Whenever an octet represents a numeric quantity, the left most bit in the diagram is the high order or most significant bit.

…Similarly, whenever a multi-octet field represents a numeric quantity the left most bit of the whole field is the most significant bit. When a multi-octet quantity is transmitted the most significant octet is transmitted first.

In network programming, data mostly transfered with Big-endian. That is, if the data consist of many octets, for example, like the TTL (unsigned 32 bit) in resource record, we need to make sure the most siginificant byte (MSB) is the leftmost one and so on.

Let’s see how is this different in action:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
fn main() {
    let n: u32 = 1;

    let n_ne = n.to_ne_bytes();
    let n_be = n.to_be_bytes();
    let n_le = n.to_le_bytes();
    print!(
        "n (native byte order): {:?}\nbig-endian: {:?}\nlittle-endian: {:?}\n",
        n_ne, n_be, n_le
    )
}

Output:

1
2
3
n (native byte order): [1, 0, 0, 0]
big-endian: [0, 0, 0, 1]
little-endian: [1, 0, 0, 0]

If we want to construct an u32 from response (in a buffer of [u8]):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
fn main() {
    // let's say we have a TTL of 1 second
    let buf: [u8; 4] = [0, 0, 0, 1];

    let n_ne = u32::from_ne_bytes(buf.try_into().unwrap());
    let n_be = u32::from_be_bytes(buf.try_into().unwrap());
    let n_le = u32::from_le_bytes(buf.try_into().unwrap());
    print!(
        "n (native byte order): {:?}\nbig-endian: {:?}\nlittle-endian: {:?}\n",
        n_ne, n_be, n_le
    )
}

Output:

1
2
3
n (native byte order): 16777216
big-endian: 1
little-endian: 16777216

Huge difference, right? So make sure you are building packets with .to_be_bytes() and reading with .from_be_bytes().

Message format

Before we can process the DNS message, we need to know the format.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
4.1. Format

+---------------------+
|        Header       |
+---------------------+
|       Question      | the question for the name server
+---------------------+
|        Answer       | RRs answering the question
+---------------------+
|      Authority      | RRs pointing toward an authority
+---------------------+
|      Additional     | RRs holding additional information
+---------------------+

For a query, we can just construct Header and Question sections. When we received a response, it will contain both of these fields as well, followed by optional resource records (RR) for Answer, Authority, and Additional.

The first part of message is Header (duh), which has fixed length of 64 bit so is relatively easy to parse.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
4.1.1. Header section format


                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
FieldLengthComment
ID16 bitUsed to validate question & response
QR1 bit0 for query, 1 for response
OPCODE4 bit0 for standard query, refer to RFC for others
AA1 bit1 = authoritative answer
TC1 bit1 = message is truncated
RD1 bit1 = recursion desired, I skipped this flag
Z3 bitReserved, I skipped as well
RA1 bit1 = recusion available, I…skipped as well
RCODE4 bitResponse code. 0 = no error, 4 = NXDOMAIN, check RFC for the rest
QDCOUNT16 bitHow many entries in Question section (after Header)
ANCOUNT16 bitHow many records in Answer section
NSCOUNT16 bitHow many records in Authority section
ARCOUNT16 bitHow many records in Additional section

This is the struct I used for Header.

1
2
3
4
5
6
7
8
pub struct MessageHeader {
    id: u16,
    flags: u16,
    qd_count: u16,
    an_count: u16,
    ns_count: u16,
    ar_count: u16,
}

It’s also easy to build a header for a standard query – we can just change ID value, leave other fields (u16) to 0 and call it a day.

Question

The Question section is the query’s payload. Parsing becomes slightly tricker because the length is not fixed.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
4.1.2. Question section format

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
FieldLengthComment
QNAMEvariousQNAME contains label sequence, no one knows how long this will be. But QNAME ends with a 0 octet so just keep parsing until you see it. This field is likely to be compressed.
QTYPE16 bitQTYPE is the type you want to query (e.g., 1 for A record), and is the superset of TYPE. See RFC 1035 Section 3.2.3.
QCLASS16 bitQCLASS is the class you want to qeury (mostly just 1 for IN, aka the internet). See RFC 1035 Section 3.2.5.

The struct:

1
2
3
4
5
pub struct MessageQuestion {
    domain: String,
    q_type: RecordType,
    q_class: RecordClass,
}

Where RecordType and RecordClass are just enum mapping u16 to corresponding variants.

I thought for a while whether I should use String for domain. It does make sense since we don’t know how long it will be, and it’s way easier to perform common operations on a String than &[u8]. The cost is additional allocation, though. But I guess performance is not the problem I need to deal with at the moment so String it is.

Extract flags in Header section

We know different flags in Header take different length, for example, we want to check the RCODE field to see if response is good.

The RCODE takes the last 4 bit of flags, and since we already parsed the whole flags as an u16, we need to ANDed it with decimal 15:

1
2
3
4
5
// some made up flags
let flags: u16 = 4;
let rcode = flags & 0b0000000000001111;
// or
let rcode = flags & 0x000F;

Let’s see it in binary form:

1
2
RCODE          0b0000000000000100
AND            0b0000000000001111

After this operation, we will get the last 4 bit of flags, which is the RCODE we wanted. And when RCODE is 4, it’s the NXDOMAIN that you know and probably don’t love.

Resource Record (RR)

The last three sections are Answer, Authority, and Additional. All three of them share the same format:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
4.1.3. Resource record format

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                                               /
/                      NAME                     /
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     CLASS                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TTL                      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   RDLENGTH                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
/                     RDATA                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
FieldLengthComment
NAMEvariousA domain name. like QNAME in Question.
TYPE16 bit1Type of the record (e.g., 1 for A record). See RFC 1035 Section 3.2.2.
CLASS16 bit1Class you of the record (mostly just 1 for IN, aka the internet). See RFC 1035 Section 3.2.4.
TTLunsigned 32 bitThe time the record can be cached in seconds. 0 = don’t cache.
RDLENGTHunsigned 16 bitHow long the RDATA fields is
RDATAvariousThe content varies from differet type of resource records. see RFC 1035 Section 3.3. Standard RRs.

The struct:

1
2
3
4
5
6
7
8
pub struct ResourceRecord {
    pub name: String,
    pub r_type: RecordType,
    pub r_class: RecordClass,
    pub ttl: u32,
    pub rd_length: u16,
    pub r_data: RecordData,
}

Domain representation and message compression

Every time you see <domain-name> in RFC, like QNAME in the Question section, or the RDATA field for resource records (RR), it can be compressed to save the payload size, without repeating data that has been transfered before.

You don’t necessarily need to compress the domain name before sending the query, but response will likely have compressed domain.

Uncompressed domain representation

Say, we want to query the address of "blog.wtcx.dev", we will encode the domain into a series of “length octet + label” format.

For example, "blog.wtcx.dev" consists of three labels, "blog", "wtcx", and "dev". We will need to first use 1 octet to indicate how long the following label is, and then followed by the actual label, and finally, we end the whole domain with a 0 octet.

Remember there is a 63 octets length limit for each label?

3.1. Name space definitions

…The high order two bits of every length octet must be zero, and the remaining six bits of the length field limit the label to 63 octets or less.

That is, for every length octet, we can only use lower 6 bit of the length octet, which is 0b00111111 in binary at most, and 63 in decimal.

The total domain length is calculated by combining all <label_length><label> pairs and don’t have the dots ('.') we see in human-readable format. Therefore, a domain representation for "blog.wtcx.dev" is like 4blog4wtcx3dev0, which is 15 octets long. And we can have at most 255 octets for a domain.

Compression format

Let’s borrow the diagram from RFC again:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
4.1.4. Message compression

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
20 |           1           |           F           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
22 |           3           |           I           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
24 |           S           |           I           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
26 |           4           |           A           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
28 |           R           |           P           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
30 |           A           |           0           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
40 |           3           |           F           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
42 |           O           |           O           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
44 | 1  1|                20                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
64 | 1  1|                26                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
92 |           0           |                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

From the 20th (3) to the 31th octet (0), that’s percisely the uncompressed domain representation above.

Starting with 40th (3) ~ 43th (O, English letter, not zero) shares the same logic like uncompressed format. And then, the two higher bit of the 44th octet has 11 in it.

Remember the limit where a label can only be 63 bit long? That limit leaves the higher two bits for other purpose.

  • When you see a length octet, like 0b00111111, the first two bits are 00, means it’s uncompressed, normal label.

  • When the first two bits are 11, like you saw on the 44th octet above, means it’s a compression format followed by a pointer, and we will have to “jump” to the position that lower 14 bit points to. Check domain name compression and the pointer for more details.

  • First two bits that’s 10 or 01 are reserved for future use. I didn’t look into that since it’s not required at this moment.

Check domain name compression and where the pointer points to

We know compressed message has a 2-octet group, the higher 2 bit is 11, indicating this is the following domain is compressed, and the position we need to “jump to”, is calculated by the lower 14 bit.

So how do we check whether it’s compressed, and where is the posistion? Luckily, we can reuse AND operation in the flag example.

I normally parse domain names octet by octet because that works well for both uncompressed and compressed domain.

Say, we are parsing the length octet (u8) and see 0b11000010.

We check the first two bits by ANDing a u8 0b11000000 (or a 0xC0 in hex), that way everything other than the first 2 bits will be set to 0.

1
2
               0b11000010
AND            0b11000000

and we have the result 0b11000000.

The actual code is self-explanatory as well:

1
2
3
4
let len = buf[curr_pos] as usize;
// 0xC0 = 0b11000000
// Check the two bits of the pointer are "11".
let is_compressed = len & 0xC0 == 0xC0;

If it’s not compressed, we continue to parse octet by octet until we see 0 oecet.

Otherwise, it’s compressed, and we have to combine the remaining 6 bits with the following 8 bits to calculate the pointer.

Let’s imagine the two octets look like:

1
0b1100000000010100

So, we combine the two octets first with:

1
u16::from_be_bytes([buf[curr_pos], buf[curr_pos + 1]]

and then we get 0b1100000000010100.

In order to set the first two bits to 0, we will AND the u16 with 0b0011111111111111 (or 0x3FFF in hex).

1
2
       0b1100000000010100
AND    0b0011111111111111
1
let offset = (u16::from_be_bytes([buf[curr_pos], buf[curr_pos + 1]]) & 0x3FFF) as usize;

In this case, we get 20. So we can “jump” to 20th octet and continue to parse the domain. If you are not sure where were we before the “jump”, check the 44th octet of compression format diagram.

You can check my (& ChatGPT’s) attempt on the domain name parsing on GitHub.

Binding 127.0.0.1 won’t work

When sending DNS query message, we first need to bind to a UDP socket on the host and then send packets with it.

At first I thought: I don’t want to bind on every interface, so maybe just 127.0.0.1, and lets OS pick a random port for it. So:

1
2
3
4
5
6
7
// ...
let socket = UdpSocket::bind("127.0.0.1:0").map_err(Error::NetworkError)?;
println!("socket binding OK!");
let bytes_sent = socket
    .send_to(&query.to_query_bytes(), addr)
    .map_err(Error::NetworkError)?;
// ...

And then:

1
2
3
4
Looking up blog.wtcx.dev
socket binding OK!
thread 'main' panicked at src/main.rs:6:47:
called `Result::unwrap()` on an `Err` value: NetworkError(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" })

Seems like we can bind on 127.0.0.1 without problems. But we can’t send packet to remote address, why? That’s because the remote peer can’t send their response to 127.0.0.1, this is the loopback address for them as well.

However, if we change the remote addr to 127.0.0.1:53, then the packet goes through!

1
2
3
4
let bytes_sent = socket
    // WARNING: Of course we don't want to do this. This is just to check that loopback addr <-> loopback addr is possible.
    .send_to(&query.to_query_bytes(), "127.0.0.1:53")
    .map_err(Error::NetworkError)?;

But the very first step of a DNS resolver (if no cache) is to ask root name servers, right? So let’s just bind 0.0.0.0:0 and be done with it.

1
2
3
4
5
6
7
// ...
let socket = UdpSocket::bind("0.0.0.0:0").map_err(Error::NetworkError)?;
println!("socket binding OK!");
let bytes_sent = socket
    .send_to(&query.to_query_bytes(), addr)
    .map_err(Error::NetworkError)?;
// ...

See this Stack Overflow querstion if you are interested.

Unexpected IPv6 ambush

Just when I thought that’s pretty much it, right? And started my test. And then came the twist.

My resolver crashed during the parsing of resource record. It complained about unknown record type 28. I went back to search RFC and couldn’t find type 28 anywhere. After checking with ChatGPT I then learned that’s in RFC 3596, the IPv6 extension.

So even though I thought I don’t have to deal with IPv6 any time soon, it immediately striked back.

Good thing is all I need to do is just update the RecordType enum and an arm in the match expression.

Improvements worth considering

Because this is just a toy DNS resolver, it doesn’t contain extensive tests and doesn’t have most of the standards supported. There are still things I want to to in the future (if possible):

Caching and negative caching

RFC specifically said resovler need to cache the result whenever possible (but respect to TTL)! So we don’t always bother root nameservers and the authoritative name servers all the time.

IPv6

This is obvious right? Supporting IPv6 is probably the one of the most important things for a resolver.

More protocols

There are many protocols can be used for DNS queries other than UDP nowadays. TCP, DNS over TLS, or DNS over HTTPS. I don’t know the difficulity yet, but let’s put it on a wishlist.

Async operation

I don’t want to use the word “obvious” again but you get the idea. Resolver with sync operation probably only works for a toy CLI program.

Internationalized domain names (IDN)

Internationalized domain name (e.g., 中文.tw) is something probably not popular but should be fun to do. (maybe?)

Final thoughts

I mentioned that I always wanted to make a DNS resolver on my own, and I finally did. But implementing without checking how everyone else did it can be both good and bad thing.

Good thing is I really delved into RFC and trying to figure out where was I, it was eye-opening to see how stuff underneath really works. I would recommend this to anyone who’s interested!

Bad thing is that I spent way, way more time than I expected. Most of the time I was just repeatedly looking at same paragraphs and trying to figure out what does this document mean, while I can definitely check some examples (even asking AI). Also, there are actually tons of RFCs and I only focused on RFS 1035 in order to make a working PoC.

I also strugged a while on several places, like bitwise operation and finding what’s TYPE 28. These looked so obvious in hindsight but really took me for some time then I just turned to ChatGPT.

My development at work mostly related to high level programming like integrating RESTful API and stuff. As an infrasture engineer, I have tons of debugging tools in hand. There is rarely a chance to build a tool working on low level myself because… I just don’t need to, like everyone else.

It was also a fun experience and a good project to practice Rust (unlike the abandoned advent of code).

Some observations during the making of a DNS resolver:

  • There are extensively length checks than I thought (I guess it’s still not enough)
  • Malformed pointer in compresstion format can really mess resolver up as it can be infinitely recursive. It’s resolver’s responsibility to not being naive.
  • I am not a security expert. But when I saw I got a ~500 bytes response from a ~30 bytes query. That really gives me a concept of how dangerous DNS amplification attack can be!
  • DNS Caching is very important (especially in Kubernetes).

Further readings


  1. RFC 1035 didn’t specify whether it’s signed or unsigned 16 bit. I am using u16 though. ↩︎ ↩︎

0%