Part 1: Build a DNS query
How do we make a query asking for the IP address for google.com?
Well, DNS queries have 2 parts: a header and a question. So we’re going to:

- create some Python classes for the header and the question
- write header_to_bytes and question_to_bytes functions to convert those objects into byte strings
- write a build_query(domain_name, record_type) function that creates a DNS query
1.1: write the DNSHeader and DNSQuestion classes
First, our DNS header. This has:

- a query ID
- some flags (which we’ll mostly ignore)
- 4 counts (num_questions, num_answers, num_authorities, and num_additionals), telling you how many records to expect in each section of a DNS packet
```python
from dataclasses import dataclass
import dataclasses
import struct

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0
```
Next, a DNS question just has 3 fields: a name (like example.com), a type (like A), and a class (which is always the same).
```python
@dataclass
class DNSQuestion:
    name: bytes
    type_: int
    class_: int
```
We’re calling the class and type fields class_ and type_ because class is a reserved word and type is a built-in function in Python.
1.2: convert these to bytes
Next, we need to write some code to convert our Python classes into byte strings. First I’ll show you the code, then I’ll explain what it means.
```python
def header_to_bytes(header):
    fields = dataclasses.astuple(header)
    # there are 6 `H`s because there are 6 fields
    return struct.pack("!HHHHHH", *fields)

def question_to_bytes(question):
    return question.name + struct.pack("!HH", question.type_, question.class_)
```
meet struct.pack: how we create byte strings
In the header_to_bytes and question_to_bytes functions, we converted our Python objects into byte strings using the struct module, which is built into Python.
Let’s see an example of how struct can convert Python variables into byte strings:
```python
>>> struct.pack('!HH', 5, 23)
b'\x00\x05\x00\x17'
```
H means “2-byte integer”, so !HH is saying “format the arguments as two 2-byte integers”. \x00\x05 is 5 and \x00\x17 is 23 (0x17 is hexadecimal for 23).
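To go the other way, struct.unpack takes the same format string and turns a byte string back into a tuple of Python values. Here’s a quick round-trip check using only the standard struct module (a small illustration, not part of the code we’re building):

```python
import struct

packed = struct.pack("!HH", 5, 23)
print(packed)  # b'\x00\x05\x00\x17'

# unpack reverses pack when given the same format string;
# it always returns a tuple, even for a single value
values = struct.unpack("!HH", packed)
print(values)  # (5, 23)
```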
struct.pack format strings
In the format string "!HH", there’s an H, which we just said means “2-byte integer”. Here are some more examples of things we’ll be using later in our format strings:
- H: 2 bytes (as an integer)
- I: 4 bytes (as an integer)
- 4s: 4 bytes (as a byte string)
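Here’s a small sketch of what those three codes produce, again using only the standard struct module. struct.calcsize, which reports how many bytes a format string packs into, is handy for double-checking a format string:

```python
import struct

# "I" packs a 4-byte unsigned integer (big endian, because of the "!")
print(struct.pack("!I", 1))  # b'\x00\x00\x00\x01'

# "4s" copies a byte string into a 4-byte slot
print(struct.pack("!4s", b"abcd"))  # b'abcd'

# a shorter byte string is padded with null bytes to fill the slot
print(struct.pack("!4s", b"ab"))  # b'ab\x00\x00'

# H + I + 4s = 2 + 4 + 4 bytes
print(struct.calcsize("!HI4s"))  # 10
```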
Here’s what an example DNS header looks like converted to bytes:
```python
>>> header_to_bytes(DNSHeader(id=0x1314, flags=0, num_questions=1, num_additionals=0, num_authorities=0, num_answers=0))
b'\x13\x14\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00'
```
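Since the format string has six H’s, every DNS header is exactly 12 bytes. As a sanity check (a sketch, not code from the original), unpacking those 12 bytes with the same format string should give back an equal DNSHeader. The class and header_to_bytes are repeated here so the snippet runs on its own:

```python
import dataclasses
import struct
from dataclasses import dataclass

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

def header_to_bytes(header):
    fields = dataclasses.astuple(header)
    return struct.pack("!HHHHHH", *fields)

# six 2-byte fields, so a DNS header is always 12 bytes
print(struct.calcsize("!HHHHHH"))  # 12

header = DNSHeader(id=0x1314, flags=0, num_questions=1)
data = header_to_bytes(header)

# unpack returns the fields in the same order as the dataclass definition,
# so they can be passed straight back into DNSHeader
recovered = DNSHeader(*struct.unpack("!HHHHHH", data))
print(recovered == header)  # True
```

Dataclasses compare field by field, so the equality check confirms that no field was dropped or reordered on the way to bytes and back.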
a note on byte order
Why is there a ! at the beginning of the format string "!HH"? That’s because any time you convert an integer into a byte string, there are two options for how to do it. Let’s see the two ways to convert the integer 0x01020304 (16909060) into a 4-byte string:
```python
>>> int.to_bytes(0x01020304, length=4, byteorder='little')
b'\x04\x03\x02\x01'
>>> int.to_bytes(0x01020304, length=4, byteorder='big')
b'\x01\x02\x03\x04'
```
These are reversed versions of each other. b'\x04\x03\x02\x01' is the “little endian” version and b'\x01\x02\x03\x04' is the “big endian” version.
The names “little endian” and “big endian” actually have a funny origin: they’re named after two satirical religious sects in Gulliver’s Travels. One sect liked to break eggs on the little end, and the other liked the big end. The byte orders were named after this Gulliver’s Travels debate because people used to argue a lot about which byte order was best, even though it didn’t make a big difference.
In DNS (and most other network protocols), integers are encoded in big endian order, often called “network byte order”, even though little endian is the native byte order on most computers today. So ! is telling Python “use the byte order for computer networking”.
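Here’s the same comparison done with struct instead of int.to_bytes. In struct format strings, < forces little endian, > forces big endian, and ! means network byte order, which is big endian, so ! and > produce identical bytes:

```python
import struct

n = 0x01020304

# "<" forces little endian byte order
print(struct.pack("<I", n))  # b'\x04\x03\x02\x01'

# ">" forces big endian byte order
print(struct.pack(">I", n))  # b'\x01\x02\x03\x04'

# "!" is network byte order, which is big endian, so it matches ">"
print(struct.pack("!I", n) == struct.pack(">I", n))  # True
```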
1.3: encode the name
Now we’re ready to build our DNS query.
First, we need to encode the domain name. We don’t literally send “google.com”; instead, it gets translated into b"\x06google\x03com\x00". To get this encoding, we split the domain name into parts and prepend each part with its length. So it’s 6 google 3 com 0, where the final 0 marks the end of the name.
Here’s the code:
```python
def encode_dns_name(domain_name):
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        encoded += bytes([len(part)]) + part
    return encoded + b"\x00"
```
This code:

- starts with an empty byte string
- splits the domain name into parts ([b"google", b"com"])
- for each part, adds the number of bytes in part to the encoded string, followed by part itself. For example, "google" -> b"\x06google".
- finally, adds a 0 byte to the end
Let’s run it:

```python
>>> encode_dns_name("google.com")
b'\x06google\x03com\x00'
```

The first byte of the output is 6 (the length of "google"):

```python
>>> encode_dns_name("google.com")[0]
6
```
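One thing encode_dns_name doesn’t handle is a trailing dot. “google.com.” is the fully qualified form of google.com, but splitting it on dots leaves an empty last part, so encode_dns_name("google.com.") returns b'\x06google\x03com\x00\x00', with an extra zero byte. It also doesn’t check the size limits in RFC 1035 section 2.3.4: labels can be at most 63 bytes and a whole name at most 255. Here’s a sketch of a stricter variant; the name encode_dns_name_strict and its choice to raise ValueError are assumptions for illustration, not part of the original:

```python
def encode_dns_name_strict(domain_name):
    # Hypothetical stricter version of encode_dns_name (not from the original).
    # Accept a single trailing dot: "google.com." means the same as "google.com"
    if domain_name.endswith("."):
        domain_name = domain_name[:-1]
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        # RFC 1035 2.3.4 limits labels to 63 bytes; empty labels ("a..b") are rejected too
        if not 1 <= len(part) <= 63:
            raise ValueError(f"invalid label length {len(part)} in {domain_name!r}")
        encoded += bytes([len(part)]) + part
    encoded += b"\x00"
    # RFC 1035 2.3.4: the whole encoded name is at most 255 bytes
    if len(encoded) > 255:
        raise ValueError(f"name too long: {domain_name!r}")
    return encoded

print(encode_dns_name_strict("google.com"))   # b'\x06google\x03com\x00'
print(encode_dns_name_strict("google.com."))  # b'\x06google\x03com\x00'
```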
1.4: build the query
Finally, let’s write our build_query function! Our function takes a domain name (like google.com) and the number of a DNS record type (like 1 for an A record).
```python
import random
random.seed(1)  # fixed seed so the example output below is reproducible

TYPE_A = 1
CLASS_IN = 1

def build_query(domain_name, record_type):
    name = encode_dns_name(domain_name)
    query_id = random.randint(0, 65535)  # any 16-bit number
    RECURSION_DESIRED = 1 << 8
    header = DNSHeader(id=query_id, num_questions=1, flags=RECURSION_DESIRED)
    question = DNSQuestion(name=name, type_=record_type, class_=CLASS_IN)
    return header_to_bytes(header) + question_to_bytes(question)
```
This:

- defines some constants (TYPE_A = 1, CLASS_IN = 1). The encodings for query types and classes are defined in sections 3.2.2 to 3.2.4 of RFC 1035.
- encodes the DNS name with encode_dns_name
- picks a random ID for the query
- sets the flags to “recursion desired” (which you need to set any time you’re talking to a DNS resolver). The encoding for the flags is defined in section 4.1.1 of RFC 1035. The reason for RECURSION_DESIRED = 1 << 8 is that, according to RFC 1035, the Recursion Desired bit is the 9th bit from the right in the flags field, and 1 << 8 gives you a number that has a 1 in the 9th bit position from the right and 0 everywhere else (1 << 8 = 100000000 in binary).
- creates the question
- concatenates the header and the question together
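To see where 1 << 8 sits among the other flags, here’s a sketch of a hypothetical describe_flags helper (not part of the original code) that splits a 16-bit flags field into the fields from the header diagram in section 4.1.1 of RFC 1035. Bit positions are counted from the right starting at 0, so the RD bit is bit 8 (the 9th from the right):

```python
def describe_flags(flags):
    # Hypothetical helper: shift each field down to bit 0, then mask it off.
    # Layout from RFC 1035 section 4.1.1 (the 3 reserved Z bits are skipped).
    return {
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "tc": (flags >> 9) & 0x1,       # message was truncated
        "rd": (flags >> 8) & 0x1,       # recursion desired
        "ra": (flags >> 7) & 0x1,       # recursion available
        "rcode": flags & 0xF,           # 0 = no error
    }

RECURSION_DESIRED = 1 << 8
print(bin(RECURSION_DESIRED))  # 0b100000000
print(describe_flags(RECURSION_DESIRED)["rd"])  # 1

print(describe_flags(0x8180))
# {'qr': 1, 'opcode': 0, 'aa': 0, 'tc': 0, 'rd': 1, 'ra': 1, 'rcode': 0}
```

0x8180 is the value in the flags field of the response shown at the end of this part (the \x81\x80 bytes right after the ID): it’s a response (QR) to a query that asked for recursion (RD), from a server that offers recursion (RA), with no error (RCODE 0).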
```python
>>> build_query("example.com", TYPE_A)
b'D\xcb\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x07example\x03com\x00\x00\x01\x00\x01'
```
1.5: test our code
Now let’s test if our code works! Here’s how to send our query to 8.8.8.8:53 using UDP and read the response. I’ve commented the socket code pretty heavily.
```python
import socket

query = build_query("www.example.com", TYPE_A)

# create a UDP socket
# `socket.AF_INET` means that we're using IPv4
# (as opposed to, for example, a Unix domain socket `AF_UNIX`)
# `socket.SOCK_DGRAM` means "UDP"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# send our query to 8.8.8.8, port 53. Port 53 is the DNS port.
sock.sendto(query, ("8.8.8.8", 53))

# read the response. UDP DNS responses are usually less than 512 bytes
# (see https://www.netmeister.org/blog/dns-size.html for MUCH more on that)
# so reading 1024 bytes is enough
response, _ = sock.recvfrom(1024)
```
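The code above is kept short on purpose. If the query or the reply gets lost (UDP doesn’t retransmit anything for us), recvfrom will wait forever, and nothing checks that the reply actually answers our query. Here’s a sketch of a more defensive version; send_udp_query and ids_match are hypothetical names, and the 5-second default timeout is an arbitrary choice:

```python
import socket

def ids_match(query, response):
    # Both messages start with the 2-byte query ID, so compare those bytes
    return len(response) >= 2 and query[:2] == response[:2]

def send_udp_query(query, server=("8.8.8.8", 53), timeout=5.0):
    # Hypothetical helper: send `query` over UDP and return the raw reply.
    # The `with` block closes the socket even if something goes wrong.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # raise socket.timeout instead of blocking forever if no reply comes
        sock.settimeout(timeout)
        sock.sendto(query, server)
        response, _ = sock.recvfrom(1024)
    if not ids_match(query, response):
        raise ValueError("response ID does not match query ID")
    return response
```

With this, you’d write response = send_udp_query(build_query("www.example.com", TYPE_A)). A production resolver would usually keep reading and discard replies with the wrong ID rather than raising, but raising keeps the sketch simple.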
This sends a query to Google’s DNS resolver asking where www.example.com is.
But how can we know that this worked if we don’t know how to parse the response yet? Well, we can run tcpdump to see our program making its DNS query. To test this, we start tcpdump, then run our Python code:
```
$ sudo tcpdump -ni any port 53
08:31:19.676059 IP 192.168.1.173.62752 > 8.8.8.8.53: 45232+ A? www.example.com. (33)
08:31:19.694678 IP 8.8.8.8.53 > 192.168.1.173.62752: 45232 1/0/0 A 93.184.216.34 (49)
```
It worked! You can see 8.8.8.8’s answer (A 93.184.216.34) at the end of the second line of tcpdump’s output.
Asking Google’s DNS resolver here is cheating, of course: our final goal is to write a DNS resolver that finds out where example.com is by itself, instead of asking 8.8.8.8 to do the work for us. But this is a nice, easy way to check that our code for building a DNS query works.
some debugging tips
If you’re implementing this in a non-Python language and you’re struggling to encode the query correctly, here’s a hex encoded version of a correct DNS query:
```python
>>> build_query("www.example.com", TYPE_A).hex()
'3c5f0100000100000000000003777777076578616d706c6503636f6d0000010001'
```
I’d recommend approaching debugging this way:

1. First, make sure your UDP code is working by decoding that hex string as bytes in your language, sending those exact bytes to 8.8.8.8 port 53, and using Wireshark or tcpdump to make sure that you get a DNS response.
2. Then, once your UDP code is working, hardcode the query ID to 0x3c5f (the first 2 bytes of that string) and make sure that your build_query function generates those exact bytes.
3. Then start randomizing the query ID and test your code with other domain names.
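Step 2 can be sketched in Python too. Since build_query picks its ID at random, this uses a hypothetical build_query_with_id variant (not part of the original) that takes the ID as a parameter and otherwise does the same thing. The helpers from earlier are repeated so the snippet runs on its own:

```python
import dataclasses
import struct
from dataclasses import dataclass

TYPE_A = 1
CLASS_IN = 1
RECURSION_DESIRED = 1 << 8

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

@dataclass
class DNSQuestion:
    name: bytes
    type_: int
    class_: int

def header_to_bytes(header):
    return struct.pack("!HHHHHH", *dataclasses.astuple(header))

def question_to_bytes(question):
    return question.name + struct.pack("!HH", question.type_, question.class_)

def encode_dns_name(domain_name):
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        encoded += bytes([len(part)]) + part
    return encoded + b"\x00"

def build_query_with_id(domain_name, record_type, query_id):
    # Hypothetical: same as build_query, but the caller chooses the ID
    name = encode_dns_name(domain_name)
    header = DNSHeader(id=query_id, num_questions=1, flags=RECURSION_DESIRED)
    question = DNSQuestion(name=name, type_=record_type, class_=CLASS_IN)
    return header_to_bytes(header) + question_to_bytes(question)

# the known-good query from above; its first 2 bytes (3c5f) are the query ID
EXPECTED = "3c5f0100000100000000000003777777076578616d706c6503636f6d0000010001"
print(build_query_with_id("www.example.com", TYPE_A, 0x3c5f).hex() == EXPECTED)  # True
```

If the comparison fails, printing both hex strings one above the other usually makes the first differing byte easy to spot.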
success!
In the next part, we’ll see how to parse this DNS response we just got back:
```python
>>> response
b' O\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00K\xc3\x00\x04]\xb8\xd8"'
```