Part 1: Build a DNS query¶

How do we make a query asking for the IP address for google.com?

Well, DNS queries have 2 parts: a header and a question. So we’re going to:

create some Python classes for the header and the question
write header_to_bytes and question_to_bytes functions to convert those objects into byte strings
write a build_query(domain_name, record_type) function that creates a DNS query

1.1: write the `DNSHeader` and `DNSQuestion` classes¶

First, our DNS Header. This has:

a query ID
some flags (which we’ll mostly ignore)
4 counts (num_questions, num_answers, num_authorities, and num_additionals), telling you how many records to expect in each section of a DNS packet

from dataclasses import dataclass
import dataclasses
import struct

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

Next, a DNS Question just has 3 fields: a name (like example.com), a type (like A), and a class (which is always the same).

@dataclass
class DNSQuestion:
    name: bytes
    type_: int 
    class_: int 

We’re calling the class and type fields class_ and type_ because class is a reserved word and type is a built-in function in Python.

1.2: convert these to bytes¶

Next, we need to write some code to convert our Python classes into byte strings. First I’ll show you the code, then I’ll explain what it means.

def header_to_bytes(header):
    fields = dataclasses.astuple(header)
    # there are 6 `H`s because there are 6 fields
    return struct.pack("!HHHHHH", *fields)

def question_to_bytes(question):
    return question.name + struct.pack("!HH", question.type_, question.class_)
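One detail worth spelling out: header_to_bytes relies on dataclasses.astuple returning the fields in the order they were declared in the class, which is also the order they need to appear in the packet. Here is a small sketch of that intermediate step (the DNSHeader class is repeated only so the snippet runs on its own):

```python
import dataclasses
import struct
from dataclasses import dataclass

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

header = DNSHeader(id=0x1314, flags=0, num_questions=1)

# astuple returns the field values in declaration order
fields = dataclasses.astuple(header)
print(fields)  # (4884, 0, 1, 0, 0, 0)

# six 2-byte integers make a 12-byte header
packed = struct.pack("!HHHHHH", *fields)
print(len(packed))  # 12
```

If the fields were declared in a different order, the bytes would come out in that different order too, so the class definition doubles as the wire layout.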

meet `struct.pack`: how we create byte strings¶

In the header_to_bytes and question_to_bytes functions, we converted our Python objects into byte strings using the struct module, which is built into Python.

Let’s see an example of how struct can convert Python variables into byte strings:

struct.pack('!HH', 5, 23)

b'\x00\x05\x00\x17'

H means “2-byte integer”, so !HH is saying “format the arguments as two 2-byte integers”. \x00\x05 is 5 and \x00\x17 is 23.
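To convince yourself of this, it can help to go the other way with struct.unpack, and to see what happens when a number doesn’t fit in 2 bytes. This is just a quick sketch using the same format string:

```python
import struct

packed = struct.pack("!HH", 5, 23)
print(packed)  # b'\x00\x05\x00\x17'

# unpack reverses pack: the same format string gives back the two integers
unpacked = struct.unpack("!HH", packed)
print(unpacked)  # (5, 23)

# a 2-byte unsigned integer can only hold 0..65535, so this raises struct.error
try:
    struct.pack("!H", 65536)
    too_big_rejected = False
except struct.error:
    too_big_rejected = True
print(too_big_rejected)  # True
```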

`struct.pack` format strings¶

In the format string "!HH", there’s an H, which we just said means “2-byte integer”. Here are some more examples of things we’ll be using later in our format strings:

H: 2 bytes (as an integer)
I: 4 bytes (as an integer)
4s: 4 bytes (as a byte string)
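Here is a short sketch of each of those format codes in action. The values (1 and b"abcd") are arbitrary, picked only for illustration:

```python
import struct

two_bytes = struct.pack("!H", 1)
print(two_bytes)  # b'\x00\x01'

four_bytes = struct.pack("!I", 1)
print(four_bytes)  # b'\x00\x00\x00\x01'

four_char_string = struct.pack("!4s", b"abcd")
print(four_char_string)  # b'abcd'

# calcsize tells you how many bytes a format string describes: 2 + 4 + 4
total_size = struct.calcsize("!HI4s")
print(total_size)  # 10
```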

Here’s what an example DNS header looks like converted to bytes:

header_to_bytes(DNSHeader(id=0x1314, flags=0, num_questions=1, num_additionals=0, num_authorities=0, num_answers=0))

b'\x13\x14\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00'

a note on byte order¶

Why is there a ! at the beginning of the format string "!HH"? That’s because anytime you convert an integer into a byte string, there are two options for how to do it. Let’s see the two ways to convert the integer 0x01020304 (16909060) into a 4-byte string:

int.to_bytes(0x01020304, length=4, byteorder='little')

b'\x04\x03\x02\x01'

int.to_bytes(0x01020304, length=4, byteorder='big')

b'\x01\x02\x03\x04'

These are the reversed versions of each other. b'\x04\x03\x02\x01' is the “little endian” version and b'\x01\x02\x03\x04' is the “big endian” version.

The names “little endian” and “big endian” actually have a funny origin: they’re named after two satirical religious sects in Gulliver’s Travels. One sect liked to break eggs on the little end, and the other liked the big end. The byte orders are named after this debate because people used to argue a lot about which byte order was best, even though it didn’t make a big difference.

In network packets, integers are always encoded in a big endian way (though little endian is the default in most other situations). So ! is telling Python “use the byte order for computer networking”.
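In struct format strings, ">" means big endian and "<" means little endian, and "!" produces the same bytes as ">". A small sketch to check that (the value 0x0102 is arbitrary):

```python
import struct

value = 0x0102

big = struct.pack(">H", value)      # explicit big endian
little = struct.pack("<H", value)   # explicit little endian
network = struct.pack("!H", value)  # network byte order

print(big)      # b'\x01\x02'
print(little)   # b'\x02\x01'
print(network)  # b'\x01\x02'

# reading the bytes back with the wrong byte order gives a different number
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))  # 0x201
```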

1.3: encode the name¶

Now we’re ready to build our DNS query.

First, we need to encode the domain name. We don’t literally send “google.com”. Instead, it gets translated into b"\x06google\x03com\x00". To get this encoding, we split the domain name into parts and prepend each part with its length, then end with a 0 byte. So it’s 6 google 3 com 0.

Here’s the code:

def encode_dns_name(domain_name):
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        encoded += bytes([len(part)]) + part
    return encoded + b"\x00"

This code:

starts with an empty byte string
splits the domain name into parts ([b"google", b"com"])
for each part, adds the number of bytes in the part to the encoded string, followed by the part itself. For example b"google" -> b"\x06google"
finally, adds a 0 byte to the end

Let’s run it:

encode_dns_name("google.com")

b'\x06google\x03com\x00'

The first byte of the output is 6 (the length of "google"):

encode_dns_name("google.com")[0]

6
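Another way to check the encoding is to walk through it in the other direction: read a length byte, take that many bytes as a label, and stop at the 0 byte. The split_labels helper below is a hypothetical name made up for this sketch, and it assumes a plain, uncompressed name like the ones encode_dns_name produces:

```python
def split_labels(encoded):
    # hypothetical helper: assumes an uncompressed, well-formed encoded name
    labels = []
    position = 0
    while True:
        length = encoded[position]  # indexing bytes gives an integer
        position += 1
        if length == 0:
            # the 0 byte marks the end of the name
            break
        labels.append(encoded[position:position + length])
        position += length
    return labels

labels = split_labels(b"\x06google\x03com\x00")
print(labels)  # [b'google', b'com']
```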

1.4: build the query¶

Finally, let’s write our build_query function! Our function takes a domain name (like google.com) and the number of a DNS record type (like A).

import random
random.seed(1)

TYPE_A = 1
CLASS_IN = 1

def build_query(domain_name, record_type):
    name = encode_dns_name(domain_name)
    id = random.randint(0, 65535)
    RECURSION_DESIRED = 1 << 8
    header = DNSHeader(id=id, num_questions=1, flags=RECURSION_DESIRED)
    question = DNSQuestion(name=name, type_=record_type, class_=CLASS_IN)
    return header_to_bytes(header) + question_to_bytes(question)

This:

defines some constants (TYPE_A = 1, CLASS_IN = 1). The encodings for query types and classes are defined in sections 3.2.2 to 3.2.4 of RFC 1035.
encodes the DNS name with encode_dns_name
picks a random ID for the query
sets the flags to “recursion desired” (which you need to set any time you’re talking to a DNS resolver). The encoding for the flags is defined in section 4.1.1 of RFC 1035. The reason for RECURSION_DESIRED = 1<<8 is that, according to RFC 1035, the Recursion Desired bit is the 9th bit from the right in the flags field, and 1 << 8 gives you a number that has a 1 in the 9th bit position from the right and 0 everywhere else (1 << 8 = 100000000 in binary).
creates the question
concatenates the header and the question together

build_query("example.com", TYPE_A)

b'D\xcb\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x07example\x03com\x00\x00\x01\x00\x01'
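To make the flags arithmetic concrete, here is a small sketch showing what 1 << 8 looks like as a number, as 16 bits, and as the 2 bytes that end up in the packet. Those 2 bytes are the \x01\x00 that appears right after the ID in the output above.

```python
import struct

RECURSION_DESIRED = 1 << 8

print(RECURSION_DESIRED)  # 256

# as a 16-bit field: a single 1 in the 9th position from the right
bits = format(RECURSION_DESIRED, "016b")
print(bits)  # 0000000100000000

# packed as a 2-byte integer in network byte order
flag_bytes = struct.pack("!H", RECURSION_DESIRED)
print(flag_bytes)  # b'\x01\x00'
```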

1.5: test our code¶

Now let’s test if our code works! Here’s how to send our query to 8.8.8.8:53 using UDP and read the response. I’ve commented the socket code pretty heavily.

import socket

query = build_query("www.example.com", TYPE_A)

# create a UDP socket
# `socket.AF_INET` means that we're connecting to the internet
#                  (as opposed to a Unix domain socket `AF_UNIX` for example)
# `socket.SOCK_DGRAM` means "UDP"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# send our query to 8.8.8.8, port 53. Port 53 is the DNS port.
sock.sendto(query, ("8.8.8.8", 53))

# read the response. UDP DNS responses are usually less than 512 bytes
# (see https://www.netmeister.org/blog/dns-size.html for MUCH more on that)
# so reading 1024 bytes is enough
response, _ = sock.recvfrom(1024)

This sends a query to Google’s DNS resolver asking where www.example.com is.
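One thing to watch out for: UDP doesn’t guarantee delivery, so if the query or the response gets lost, recvfrom will wait forever. Here is one possible way to guard against that with a timeout. The send_udp_query name and the 2 second default are assumptions made up for this sketch, not part of the code above:

```python
import socket

def send_udp_query(query, server="8.8.8.8", port=53, timeout=2.0):
    # hypothetical helper: returns the response bytes, or None on timeout
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # give up if nothing comes back within `timeout` seconds
        sock.settimeout(timeout)
        sock.sendto(query, (server, port))
        try:
            response, _ = sock.recvfrom(1024)
        except socket.timeout:
            return None
        return response

# usage, assuming `query` was built with build_query as above:
# response = send_udp_query(query)
```

Returning None is just one choice; retrying the query a couple of times before giving up would be another reasonable design.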

But how can we know that this worked if we don’t know how to parse the response yet? Well, we can run tcpdump to see our program making its DNS query. To test this, we start tcpdump, then run our Python code:

$ sudo tcpdump -ni any port 53
08:31:19.676059 IP 192.168.1.173.62752 > 8.8.8.8.53: 45232+ A? www.example.com. (33)
08:31:19.694678 IP 8.8.8.8.53 > 192.168.1.173.62752: 45232 1/0/0 A 93.184.216.34 (49)

It worked! You can see 8.8.8.8’s answer (93.184.216.34) at the end of the second line of tcpdump’s output.

Asking Google’s DNS resolver here is cheating, of course: our final goal is to write a DNS resolver that finds out where example.com is ourselves, instead of asking 8.8.8.8 to do the work for us. But this is a nice easy way to check that our code for building a DNS query works.

some debugging tips¶

If you’re implementing this in a non-Python language and you’re struggling to encode the query correctly, here’s a hex encoded version of a correct DNS query:

build_query("www.example.com", TYPE_A).hex()

'3c5f0100000100000000000003777777076578616d706c6503636f6d0000010001'

I’d recommend approaching debugging this way:

First, make sure your UDP code is working by decoding that hex string as bytes in your language, sending those exact bytes to 8.8.8.8 port 53, and using Wireshark or tcpdump to make sure that you get a DNS response.
Then once your UDP code is working, hardcode the query ID to 0x3c5f (the first 2 bytes of that string) and make sure that your build_query function is generating those exact bytes.
Then start randomizing the query ID and test your code with other domain names
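If your bytes don’t match, it can help to take the known-good hex string apart and compare it piece by piece with your own output. A rough sketch of that, using only the hex string above:

```python
import struct

known_good = bytes.fromhex(
    "3c5f0100000100000000000003777777076578616d706c6503636f6d0000010001"
)
print(len(known_good))  # 33

# the first 12 bytes are the header: six 2-byte integers
header_fields = struct.unpack("!HHHHHH", known_good[:12])
print(header_fields)  # (15455, 256, 1, 0, 0, 0)
print(hex(header_fields[0]))  # 0x3c5f

# everything after the header is the question: encoded name, type, class
question_bytes = known_good[12:]
print(question_bytes)  # b'\x03www\x07example\x03com\x00\x00\x01\x00\x01'
```

The 33 byte length matches the (33) that tcpdump printed for the query earlier.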

success!¶

In the next part, we’ll see how to parse this DNS response we just got back:

response

b' O\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00K\xc3\x00\x04]\xb8\xd8"'