Skip to main content
  1. Posts/

Lessons Learned When Building My DNS Resolver

·3288 words·16 mins·
Dev Networking DNS Rust
Table of Contents

I knew about the Implement DNS in a weekend project by Julia Evans long ago and I always wanted to give it a try.

I mean, we all know something about DNS, at least at a high level, right? Yeah, right, probably. So I immediately delved into the implementation after checking the first few pages of it. Looking back, this was a mistake (and I took almost a week to build from scratch).

Anyway, this post is about a guy who never did low-level network programming, trying to build his own DNS resolver with the language he was on and off from time to time.

Some notes before we start:

  • It’s highly recommended to read Implement DNS in a weekend and then RFC 1035 and other resources if you are interested.

  • All diagrams, if not mentioned specifically, are borrowed from RFC 1035.

  • You can find my code at:

wtchangdm/tiny-resolver-rs

A 🦀 DNS resolver

Rust
4
0
  • Keep in mind it’s merely a toy resolver with incomplete support of standards.

  • There are tons of DNS related RFCs. I have a huge respect to whoever implemented these.

Prerequisite knowledge
#

  • An octet is a byte.

  • For a UDP message, the message limit is 512 octets at most.

  • A domain (e.g., "blog.wtcx.dev") should be less or equal to 255 octets. The way how length is calculated, see Domain representation and message compression.

  • A label, aka the components of the domain (e.g., "blog", "wtcx", or "dev"), is less or equal to 63 octets.

Big-endian, little-endian
#

2.3.2. Data Transmission Order

…Whenever an octet represents a numeric quantity, the leftmost bit in the diagram is the high order or most significant bit.

…Similarly, whenever a multi-octet field represents a numeric quantity the left most bit of the whole field is the most significant bit. When a multi-octet quantity is transmitted the most significant octet is transmitted first.

In terms of computer networking, data is mostly transferred in Big-endian order. This means that if the data consists of multiple octets, for example, the TTL (unsigned 32 bit) in a resource record, we have to ensure the most significant byte (MSB) is the leftmost one.

Let’s see how this works in action:

fn main() {
    let n: u32 = 1;

    let n_ne = n.to_ne_bytes();
    let n_be = n.to_be_bytes();
    let n_le = n.to_le_bytes();
    print!(
        "n (native byte order): {:?}\nbig-endian: {:?}\nlittle-endian: {:?}\n",
        n_ne, n_be, n_le
    )
}

Output:

n (native byte order): [1, 0, 0, 0]
big-endian: [0, 0, 0, 1]
little-endian: [1, 0, 0, 0]

If we want to construct a u32 from response (in a buffer of [u8]):

fn main() {
    // let's say we have a TTL of 1 second
    let buf: [u8; 4] = [0, 0, 0, 1];

    let n_ne = u32::from_ne_bytes(buf.try_into().unwrap());
    let n_be = u32::from_be_bytes(buf.try_into().unwrap());
    let n_le = u32::from_le_bytes(buf.try_into().unwrap());
    print!(
        "n (native byte order): {:?}\nbig-endian: {:?}\nlittle-endian: {:?}\n",
        n_ne, n_be, n_le
    )
}

Output:

n (native byte order): 16777216
big-endian: 1
little-endian: 16777216

Huge difference, right? So make sure you are building packets with .to_be_bytes() and reading with .from_be_bytes().

Message format
#

Before we can process the DNS message, we need to know the format.

4.1. Format

+---------------------+
|        Header       |
+---------------------+
|       Question      | the question for the name server
+---------------------+
|        Answer       | RRs answering the question
+---------------------+
|      Authority      | RRs pointing toward an authority
+---------------------+
|      Additional     | RRs holding additional information
+---------------------+

For a query, we can only construct Header and Question sections. When we received a response, it will contain both of these fields as well, followed by optional resource records (RR) for Answer, Authority, and Additional.

Header#

The first part of message is Header, which has a fixed length of 12 octets so is relatively easy to parse.

4.1.1. Header section format


                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
FieldLengthComment
ID16 bitUsed to validate question & response.
QR1 bit0 for query, 1 for response.
OPCODE4 bit0 for standard query, refer to RFC for others.
AA1 bit1 = authoritative answer.
TC1 bit1 = message is truncated, refer to RFC for others.
RD1 bit1 = recursion desired, refer to RFC for others.
Z3 bitReserved for future use.
RA1 bit1 = recursion available, refer to RFC for others.
RCODE4 bitResponse code. 0 = no error, 3 = NXDOMAIN, check the RFC for the rest.
QDCOUNT16 bitHow many entries in Question section (after Header).
ANCOUNT16 bitRecord count in Answer section.
NSCOUNT16 bitRecord count in Authority section.
ARCOUNT16 bitRecord count in Additional section.

This is the struct I used for Header:

pub struct MessageHeader {
    id: u16,
    flags: u16,
    qd_count: u16,
    an_count: u16,
    ns_count: u16,
    ar_count: u16,
}

It’s also easy to build a header for a standard query – we can just change ID and QDCOUNT, leave other fields to 0 and call it a day.

Extract flags in Header section
#

We learned that different flags in Header take different length, but how do we parse it from a response? Say, we want to check the RCODE (response code) field to see whether the response is successful.

The RCODE takes the last 4 bits of the flags, and since we already parsed the whole flags as a u16, we need to AND it with decimal 15:

// some made up flags
let flags: u16 = 3; // 0b0000000000000011
let rcode = flags & 0b0000000000001111;
// or
let rcode = flags & 0x000F;

Let’s see it in binary form:

      0b0000000000000011
AND   0b0000000000001111

After this operation, we will get the last 4 bits, which is the RCODE we wanted. And when RCODE is 3, it’s the NXDOMAIN that you know and probably don’t love.

Question
#

The Question section is the query’s payload. Parsing becomes slightly trickier because the length is not fixed.

4.1.2. Question section format

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
FieldLengthComment
QNAMEvariousQNAME contains label sequence, no one knows how long this will be. But QNAME ends with a 0 octet so just keep parsing until you see it. This field is likely to be compressed.
QTYPE16 bitQTYPE is the type you want to query (e.g., 1 for A record), and is the superset of TYPE. See RFC 1035 Section 3.2.3.
QCLASS16 bitQCLASS is the class you want to query (mostly just 1 for IN, aka the internet). See RFC 1035 Section 3.2.5.

The struct:

pub struct MessageQuestion {
    domain: String,
    q_type: RecordType,
    q_class: RecordClass,
}

Where RecordType and RecordClass are just enums mapping u16 to corresponding variants.

I thought for a while whether I should use String for domain. It does make sense since we don’t know how long it will be, and it’s way easier to perform common operations on a String than &[u8]. The cost is additional allocation, though. But I guess performance is not the problem I need to deal with at the moment so String it is.

Resource Record (RR)
#

The last three sections are Answer, Authority, and Additional. All three of them share the same format:

4.1.3. Resource record format

                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                                               /
/                      NAME                     /
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     CLASS                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TTL                      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   RDLENGTH                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
/                     RDATA                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
FieldLengthComment
NAMEvariousA domain name. like QNAME in Question.
TYPE16 bit1Type of the record (e.g., 1 for A record). See RFC 1035 Section 3.2.2.
CLASS16 bit1Class of the record (mostly just 1 for IN, aka the internet). See RFC 1035 Section 3.2.4.
TTLunsigned 32 bitThe time the record can be cached in seconds. 0 = don’t cache.
RDLENGTHunsigned 16 bitHow long the RDATA field is.
RDATAvariousThe content varies from different TYPE of resource records. see RFC 1035 Section 3.3. Standard RRs.

The struct:

pub struct ResourceRecord {
    pub name: String,
    pub r_type: RecordType,
    pub r_class: RecordClass,
    pub ttl: u32,
    pub rd_length: u16,
    pub r_data: RecordData,
}

Domain representation and message compression
#

Every time you see <domain-name> in RFC, such as QNAME in the Question section or the RDATA field for resource records (RR), it can be compressed to save the payload size, without repeating the data that has been transferred before.

You don’t necessarily need to compress the domain name before sending the query, but response will likely have compressed domain.

Uncompressed domain representation
#

Say, we want to query the address of "blog.wtcx.dev", we will encode the domain into a series of “length octet + label” format.

"blog.wtcx.dev" consists of three labels, "blog", "wtcx", and "dev". We will repeatedly use 1 octet to indicate how long the following label is, and then followed by the actual label. Finally, we end the whole domain with a 0 octet.

Remember there is a 63 octets length limit for each label?

3.1. Name space definitions

…The high order two bits of every length octet must be zero, and the remaining six bits of the length field limit the label to 63 octets or less.

That is, for every length octet, we can only use lower 6 bits of the length octet, which is 0b00111111 (63 in decimal) at most.

The total domain length is calculated by combining all <label_length><label> pairs with the ending octet and don’t have the dots ('.') we see in human-readable format. Therefore, the domain representation for "blog.wtcx.dev" is like 4blog4wtcx3dev0, which is 15 octets long. And we can have at most 255 octets for a domain.

Compression format
#

Let’s borrow the diagram from RFC again:

4.1.4. Message compression

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
20 |           1           |           F           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
22 |           3           |           I           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
24 |           S           |           I           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
26 |           4           |           A           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
28 |           R           |           P           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
30 |           A           |           0           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
40 |           3           |           F           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
42 |           O           |           O           |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
44 | 1  1|                20                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
64 | 1  1|                26                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
92 |           0           |                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

From the 20th (1) to the 31st octet (0), that’s precisely the uncompressed domain representation above.

The range starting from 40th (3) to 43rd (O, English letter, not zero) shares the same logic like uncompressed format.

However, unlike the uncompressed format, the higher two bits of the 44th octet are 11.

Remember the limit where a label can only be 63 bit long? That limit leaves the higher two bits for other purpose.

  • When you see a length octet, like 0b00111111, the first two bits are 00, means it’s uncompressed, normal label.

  • When the first two bits are 11, like you saw on the 44th octet above, it’s a compression format.

  • First two bits that’s 10 or 01 are reserved for future use. I didn’t look into that since it’s not required at this moment.

Check domain name compression and where the pointer points to
#

We know compressed domain consists of a 2-octet group. When the higher 2 bits is 11, the following domain is compressed and the position we need to “jump to” is calculated by the lower 14 bits.

So how do we check whether it’s compressed, and where is the position? Luckily, we can reuse AND operation in the flag example.

Normally, I parse domain names octet by octet because that works well for both uncompressed and compressed domain.

For example, we are parsing the length octet (u8) and see 0b11000010.

We check the first two bits by ANDing a u8 0b11000000 (or a 0xC0 in hex), that way everything other than the first 2 bits will be set to 0.

      0b11000010
AND   0b11000000

and we have the result 0b11000000.

The actual code is self-explanatory as well:

let len = buf[curr_pos] as usize;
// 0xC0 = 0b11000000
// Check whether the first two bits are 11.
let is_compressed = len & 0xC0 == 0xC0;

If it’s not compressed, we continue to parse octet by octet until we see 0 octet.

Otherwise, it’s compressed, and we have to combine the remaining 6 bits with the following 8 bits to calculate the pointer.

Let’s imagine the two octets look like this:

0b1100000000010100

Combine the two octets with:

u16::from_be_bytes([buf[curr_pos], buf[curr_pos + 1]])

and then we get 0b1100000000010100.

In order to set the first two bits to 0, we will AND the value with 0b0011111111111111 (or 0x3FFF in hex).

      0b1100000000010100
AND   0b0011111111111111
let offset = (u16::from_be_bytes([buf[curr_pos], buf[curr_pos + 1]]) & 0x3FFF) as usize;

In this case, we get 20. So we can “jump” to the 20th octet and continue to parse the domain. If you are not sure where were we before the “jump”, check the 44th octet of compression format diagram.

You can check my (& ChatGPT’s) attempt on the domain name parsing on GitHub.

Binding 127.0.0.1 won’t work
#

When sending a DNS query message, we first need to bind a UDP socket on the host and then send packets with it.

At first I thought: I don’t want to bind on every interface, maybe just 127.0.0.1 with a random port. So:

// ...
let socket = UdpSocket::bind("127.0.0.1:0").map_err(Error::NetworkError)?;
println!("socket binding OK!");
let bytes_sent = socket
    .send_to(&query.to_query_bytes(), addr)
    .map_err(Error::NetworkError)?;
// ...

And then:

Looking up blog.wtcx.dev
socket binding OK!
thread 'main' panicked at src/main.rs:6:47:
called `Result::unwrap()` on an `Err` value: NetworkError(Os { code: 49, kind: AddrNotAvailable, message: "Can't assign requested address" })

Seems like we can bind to 127.0.0.1 without problems. But we can’t send packets to a remote name server, why?

After googling a bit, the answer is actually in the UdpSocket::bind document all along:

…Note that bind declares the scope of your network connection. You can only receive datagrams from and send datagrams to participants in that view of the network. For instance, binding to a loopback address as in the example above will prevent you from sending datagrams to another device in your local network.

Even though the name servers we are about to ask are not “another device in your local network”, but you get the idea. The remote peer is not the host that the resolver program is running on, therefore binding on 127.0.0.1 just won’t do.

We can verify this by changing the remote address to an imaginary local name server 127.0.0.1:53, and the packets will be sent without problem!

let bytes_sent = socket
    // WARNING: Of course we don't want to do this. This is just to check that loopback addr <-> loopback addr is possible.
    .send_to(&query.to_query_bytes(), "127.0.0.1:53")
    .map_err(Error::NetworkError)?;

Since DNS resolver is all about asking remote name servers (if not cached), let’s just bind to 0.0.0.0:0 and be done with it.

// ...
let socket = UdpSocket::bind("0.0.0.0:0").map_err(Error::NetworkError)?;
println!("socket binding OK!");
let bytes_sent = socket
    .send_to(&query.to_query_bytes(), addr)
    .map_err(Error::NetworkError)?;
// ...

Unexpected IPv6 ambush
#

Just when I thought that’s pretty much it, right? And started my test. And then came the twist.

My resolver crashed during the parsing of resource record. It complained about unknown record type 28. I went back to search RFC and couldn’t find type 28 anywhere. After checking with ChatGPT I then learned that’s in RFC 3596, the IPv6 extension with the new record type AAAA.

So even though I thought I wouldn’t have to deal with IPv6 anytime soon, it immediately struck back. That’s fair since I am not living in 1980s.

Good thing is all I need to do is just update the RecordType enum and an arm in the match expression.

Improvements worth considering
#

Because this is just a toy DNS resolver, it doesn’t contain extensive tests and doesn’t have most of the standards supported. There are still things I want to do in the future (if possible):

Caching and negative caching
#

RFC specifically said resolver need to cache the result whenever possible (but respect to TTL)! So we don’t always bother root nameservers and the authoritative name servers all the time.

IPv6
#

This is obvious right? Supporting IPv6 is probably one of the most important things for a resolver.

More protocols
#

There are many protocols can be used for DNS queries other than UDP nowadays. TCP, DNS over TLS, or DNS over HTTPS. I don’t know the difficulty yet, but let’s put it on the wishlist.

Async operation
#

I don’t want to use the word “obvious” again but you get the idea. Resolver with sync operation probably only works for a toy CLI program.

Internationalized domain names (IDN)
#

Internationalized domain name (e.g., 中文.tw) is something probably not very popular but should be fun to do. (maybe?)

Final thoughts
#

I mentioned that I always wanted to make a DNS resolver on my own, and I finally did. But implementing without checking how everyone else did it can be both a good and a bad thing.

Good thing is I really delved into RFC and tried to figure out where I was. It was eye-opening to see how stuff underneath really works. Seeing resolver working feels like the first time you dealt with some web APIs and saw it worked just like document mentioned, but in a far lower level of the world.

It of course has different caveats to be taken care of, but is interesting in its own way. I would recommend this to anyone who’s interested!

However, I spent way, way more time than I expected. Most of the time I was just repeatedly looking at the same paragraphs and trying to figure out what document means, while I can definitely check some examples (even asking AI). Besides, there are actually tons of RFCs and I only focused on RFC 1035 in order to make a minimum working PoC.

I also struggled in several places for a while, like bitwise operations and finding what’s TYPE 28. These looked so obvious in hindsight but really took me some time then I just turned to ChatGPT.

My development at work mostly related to high-level programming like integrating RESTful APIs and regular CRUD stuff. As an infrastructure engineer, I have tons of debugging tools at hand. There are rarely chances to build a tool working on low-level myself because… I just don’t need to, like everyone else.

It was a fun experience and a good project to practice Rust (unlike the abandoned advent of code).

Some observations during the making of a DNS resolver:

  • There are more length checks than I thought (I guess it’s still not enough)
  • Malformed pointer in compression format can really mess the resolver up as it can be infinitely recursive. It’s resolver’s responsibility not to be naive.
  • I am not a security expert. But when I saw I got a ~500 bytes response from a ~30 bytes query. That really gives me a concept of how dangerous DNS amplification attack can be!
  • DNS caching is very important. (Especially…in Kubernetes, right?)

Further readings
#

This article was also published on APNIC Blog as a guest post.

  1. RFC 1035 didn’t specify whether it’s signed or unsigned 16 bit. I am using u16 though. ↩︎ ↩︎

W.T. Chang
Author
W.T. Chang

Related

Query Stub Domains with CoreDNS and NodeLocal DNSCache
·1071 words·6 mins
Dev Ops Kubernetes Networking AWS DNS
Advent of Code 2023 using Rust
·2233 words·11 mins
Dev Rust AdventOfCode
Troubleshooting the Connection Reset Incident
·1710 words·9 mins
Dev Ops Kubernetes Networking

comments powered by Disqus