Part 1: Build a DNS query
How do we make a query asking for the IP address of a domain name?
Well, DNS queries have 2 parts: a header and a question. So we’re going to:
create some Python classes for the header and the question
write header_to_bytes and question_to_bytes functions to convert those objects into byte strings
write a build_query(domain_name, record_type) function that creates a DNS query
1.1: write the DNSHeader and DNSQuestion classes
First, our DNS header. This has:
a query ID
some flags (which we’ll mostly ignore)
4 counts (num_questions, num_answers, num_authorities, num_additionals), telling you how many records to expect in each section of a DNS packet
```python
from dataclasses import dataclass
import dataclasses
import struct

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0
```
Next, a DNS question just has 3 fields: a name (like example.com), a type (like A), and a class (which is always the same: IN, for “internet”).
```python
@dataclass
class DNSQuestion:
    name: bytes
    type_: int
    class_: int
```
We’re calling the class and type fields class_ and type_ because class is a reserved word and type is a built-in function in Python.
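The byte-conversion code in the next step uses dataclasses.astuple, so the order in which the fields are declared is the order in which they end up on the wire. Here’s a small sketch that checks this; it repeats the two class definitions so it runs on its own, and the example values are arbitrary.

```python
from dataclasses import dataclass, astuple

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

@dataclass
class DNSQuestion:
    name: bytes
    type_: int
    class_: int

header = DNSHeader(id=0x1314, flags=0, num_questions=1)
# astuple returns the fields in declaration order, which is the same
# order as the header fields in RFC 1035 section 4.1.1
print(astuple(header))  # (4884, 0, 1, 0, 0, 0)

question = DNSQuestion(name=b"\x06google\x03com\x00", type_=1, class_=1)
print(astuple(question))  # (b'\x06google\x03com\x00', 1, 1)
```

If you ever reorder the fields in DNSHeader, the packed bytes change too, so it’s worth keeping the declaration order matched to the RFC.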
1.2: convert these to bytes
Next, we need to write some code to convert our Python classes into byte strings. First I’ll show you the code, then I’ll explain what it means.
```python
def header_to_bytes(header):
    fields = dataclasses.astuple(header)
    # there are 6 `H`s because there are 6 fields
    return struct.pack("!HHHHHH", *fields)

def question_to_bytes(question):
    return question.name + struct.pack("!HH", question.type_, question.class_)
```
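One way to convince yourself that header_to_bytes is right is to go the other way with struct.unpack, which accepts the same format string. This is a sketch for testing only: bytes_to_header is a hypothetical helper, not something the rest of this guide needs.

```python
import dataclasses
import struct
from dataclasses import dataclass

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

def header_to_bytes(header):
    fields = dataclasses.astuple(header)
    # there are 6 `H`s because there are 6 fields
    return struct.pack("!HHHHHH", *fields)

def bytes_to_header(data):
    # hypothetical inverse of header_to_bytes: read six big-endian
    # 2-byte integers from the first 12 bytes
    return DNSHeader(*struct.unpack("!HHHHHH", data[:12]))

original = DNSHeader(id=0x1314, flags=0x0100, num_questions=1)
encoded = header_to_bytes(original)
print(len(encoded))                          # 12: six fields of 2 bytes each
print(bytes_to_header(encoded) == original)  # True
```

Because the header is always six 2-byte fields, every DNS header is exactly 12 bytes long.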
struct.pack: how we create byte strings
In our header_to_bytes and question_to_bytes functions, we converted our Python objects into byte strings using the struct module, which is built into Python.
Let’s see an example of how struct can convert Python variables into byte strings:
```python
>>> struct.pack('!HH', 5, 23)
b'\x00\x05\x00\x17'
```
H means “2-byte integer”, so !HH is saying “format the arguments as two 2-byte integers”. \x00\x05 is 5 and \x00\x17 is 23.
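Because H is an unsigned 2-byte integer, it can only hold values from 0 to 65535, and struct refuses to pack anything outside that range rather than silently truncating it. That range is also why the query ID later is picked between 0 and 65535. A small sketch; fits_in_two_bytes is a hypothetical helper name used just for this illustration.

```python
import struct

def fits_in_two_bytes(n):
    """Hypothetical helper: True if struct can pack n with the 'H' format."""
    try:
        struct.pack("!H", n)
    except struct.error:
        # out-of-range values raise struct.error instead of wrapping around
        return False
    return True

print(struct.pack("!H", 65535))  # b'\xff\xff', the largest 2-byte value
print(fits_in_two_bytes(65535))  # True
print(fits_in_two_bytes(65536))  # False
```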
struct.pack format strings
In the format string "!HH", there’s an H, which we just said means “2-byte integer”. Here are some more examples of things we’ll be using later in our format strings:
H: 2 bytes (as an integer)
I: 4 bytes (as an integer)
4s: 4 bytes (as a byte string)
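Here’s a quick sketch mixing all three format codes, with arbitrary example values. struct.calcsize reports how many bytes a format string produces, and struct.unpack reverses struct.pack.

```python
import struct

fmt = "!HI4s"
# with '!' there is no alignment padding, so this is 2 + 4 + 4 = 10 bytes
print(struct.calcsize(fmt))  # 10

packed = struct.pack(fmt, 1, 2, b"abcd")
print(packed)                      # b'\x00\x01\x00\x00\x00\x02abcd'
print(struct.unpack(fmt, packed))  # (1, 2, b'abcd')
```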
Here’s what an example DNS header looks like converted to bytes:
```python
>>> header_to_bytes(DNSHeader(id=0x1314, flags=0, num_questions=1, num_additionals=0, num_authorities=0, num_answers=0))
b'\x13\x14\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00'
```
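Python’s escaped byte output can be hard to read. One option (assuming Python 3.8 or later) is bytes.hex with a separator: grouping by 2 bytes lines the hex digits up with the six header fields. This packs the same 12 bytes directly with struct so it runs on its own.

```python
import struct

# the same 12 header bytes as above (id=0x1314, num_questions=1)
header_bytes = struct.pack("!HHHHHH", 0x1314, 0, 1, 0, 0, 0)

# bytes.hex(sep, bytes_per_sep) inserts a space every 2 bytes,
# so each group is one header field
print(header_bytes.hex(" ", 2))  # 1314 0000 0001 0000 0000 0000
```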
a note on byte order
Why is there a ! at the beginning of the format string? It’s because any time you convert an integer into a byte string, there are two options for how to do it. Let’s see the two ways to convert the integer 0x01020304 (16909060) into a 4-byte string:
```python
>>> int.to_bytes(0x01020304, length=4, byteorder='little')
b'\x04\x03\x02\x01'
>>> int.to_bytes(0x01020304, length=4, byteorder='big')
b'\x01\x02\x03\x04'
```
These are reversed versions of each other: b'\x04\x03\x02\x01' is the “little-endian” version and b'\x01\x02\x03\x04' is the “big-endian” version.
The names “little-endian” and “big-endian” have a funny origin: they come from two satirical religious sects in Gulliver’s Travels, one of which broke its eggs at the little end and the other at the big end. The names were borrowed from that debate because people used to argue a lot about which byte order was best, even though it didn’t make much difference.
In network packets, integers are always encoded in big-endian order (though little-endian is the native byte order on most machines you’ll use, like x86 and most ARM systems). So ! is telling Python “use the byte order for computer networking”, which is big-endian.
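Here’s a small sketch comparing the byte-order prefixes in struct: ! (network order) produces the same bytes as > (big-endian), < produces the little-endian bytes, and int.to_bytes agrees with both. sys.byteorder shows the native order of whatever machine runs the code.

```python
import struct
import sys

n = 0x01020304
print(struct.pack("!I", n))  # b'\x01\x02\x03\x04'  network order
print(struct.pack(">I", n))  # b'\x01\x02\x03\x04'  big-endian, same bytes
print(struct.pack("<I", n))  # b'\x04\x03\x02\x01'  little-endian
print(n.to_bytes(4, "big") == struct.pack("!I", n))  # True

# native byte order of this machine, e.g. 'little' on x86
print(sys.byteorder)
```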
1.3: encode the name
Now we’re ready to build our DNS query.
First, we need to encode the domain name. We don’t literally send “google.com”; instead it gets translated into b"\x06google\x03com\x00". To get this encoding, we split the domain name into parts and prepend each part with its length (6 for “google”, 3 for “com”), then end the name with a 0 byte.
Here’s the code:
```python
def encode_dns_name(domain_name):
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        encoded += bytes([len(part)]) + part
    return encoded + b"\x00"
```
This code:
starts with an empty byte string
splits the domain name into parts (on the “.” character)
for each part, adds the number of bytes in part to the encoded string, followed by part itself. For example, "google" -> b"\x06google".
finally, adds a 0 byte to the end
Let’s run it:
```python
>>> encode_dns_name("google.com")
b'\x06google\x03com\x00'
```
The first byte of the output is 6 (the length of “google”).
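encode_dns_name trusts its input. Two cases it doesn’t handle: RFC 1035 limits each label to 63 bytes and the whole encoded name to 255 bytes, and a fully-qualified name with a trailing dot (like the “www.example.com.” that tcpdump prints) would produce an extra 0 byte. Here’s a sketch of a stricter variant; encode_dns_name_checked is a hypothetical name, and nothing else in this guide depends on it.

```python
def encode_dns_name_checked(domain_name):
    """Hypothetical stricter variant of encode_dns_name."""
    # accept the fully-qualified form "google.com." by dropping one trailing dot
    if domain_name.endswith("."):
        domain_name = domain_name[:-1]
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        if not part:
            # "a..b" or an empty name would otherwise emit a premature 0 byte
            raise ValueError(f"empty label in {domain_name!r}")
        if len(part) > 63:
            # RFC 1035 caps labels at 63 bytes: the top two bits of the
            # length byte are reserved (11 marks a compression pointer)
            raise ValueError(f"label longer than 63 bytes: {part!r}")
        encoded += bytes([len(part)]) + part
    encoded += b"\x00"
    if len(encoded) > 255:
        # RFC 1035 also caps the whole encoded name at 255 bytes
        raise ValueError("encoded name longer than 255 bytes")
    return encoded

print(encode_dns_name_checked("google.com."))  # b'\x06google\x03com\x00'
```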
1.4: build the query
Finally, let’s write our build_query function! Our function takes a domain name (like google.com) and the number of a DNS record type (like 1 for an A record), and builds a DNS query for it.
```python
import random

# seeded so that the "random" IDs are reproducible between runs
random.seed(1)

TYPE_A = 1
CLASS_IN = 1

def build_query(domain_name, record_type):
    name = encode_dns_name(domain_name)
    # a random 16-bit query ID (0 to 65535 fits in an `H`)
    query_id = random.randint(0, 65535)
    RECURSION_DESIRED = 1 << 8
    header = DNSHeader(id=query_id, num_questions=1, flags=RECURSION_DESIRED)
    question = DNSQuestion(name=name, type_=record_type, class_=CLASS_IN)
    return header_to_bytes(header) + question_to_bytes(question)
```
This code:
defines some constants (TYPE_A = 1, CLASS_IN = 1). The encodings for query types and classes are defined in sections 3.2.2 to 3.2.4 of RFC 1035.
encodes the DNS name with encode_dns_name
picks a random ID for the query
sets the flags to “recursion desired” (which you need to set any time you’re talking to a DNS resolver). The encoding for the flags is defined in section 4.1.1 of RFC 1035. The reason for RECURSION_DESIRED = 1 << 8 is that, according to RFC 1035, the Recursion Desired bit is the 9th bit from the right in the flags field, and 1 << 8 gives you a number that has a 1 in the 9th bit position from the right and 0 everywhere else (1 << 8 = 0b100000000 = 256).
creates the question
concatenates the header and the question together
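To see what 1 << 8 looks like in the packed header, and how you might check for that bit in a flags value later, here’s a sketch. has_flag is a hypothetical helper, and 0x8180 is just an illustrative flags value with the Recursion Desired bit set along with two other bits.

```python
import struct

RECURSION_DESIRED = 1 << 8
print(RECURSION_DESIRED)                       # 256
print(bin(RECURSION_DESIRED))                  # 0b100000000
# as the 2-byte flags field, the bit lands in the first byte
print(struct.pack("!H", RECURSION_DESIRED))    # b'\x01\x00'

def has_flag(flags, mask):
    """Hypothetical helper: True if the bit(s) in `mask` are set in `flags`."""
    return (flags & mask) != 0

print(has_flag(0x8180, RECURSION_DESIRED))  # True
print(has_flag(0x0000, RECURSION_DESIRED))  # False
```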
1.5: test our code
Now let’s test if our code works! Here’s how to send our query to 8.8.8.8:53 using UDP and read the response. I’ve commented the socket code pretty heavily.
```python
import socket

query = build_query("www.example.com", 1)

# create a UDP socket
# `socket.AF_INET` means that we're connecting to the internet
# (as opposed to a Unix domain socket `AF_UNIX` for example)
# `socket.SOCK_DGRAM` means "UDP"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# send our query to 8.8.8.8, port 53. Port 53 is the DNS port.
sock.sendto(query, ("8.8.8.8", 53))

# read the response. UDP DNS responses are usually less than 512 bytes
# (see https://www.netmeister.org/blog/dns-size.html for MUCH more on that)
# so reading 1024 bytes is enough
response, _ = sock.recvfrom(1024)
```
This sends a query to Google’s DNS resolver asking where www.example.com is.
But how can we know that this worked if we don’t know how to parse the response
yet? Well, we can run
tcpdump to see our program making its DNS query. To test this, we start
tcpdump, then run our Python code:
```
$ sudo tcpdump -ni any port 53
08:31:19.676059 IP 192.168.1.173.62752 > 8.8.8.8.53: 45232+ A? www.example.com. (33)
08:31:19.694678 IP 8.8.8.8.53 > 192.168.1.173.62752: 45232 1/0/0 A 18.104.22.168 (49)
```
It worked! You can see 8.8.8.8’s answer at the end of the second line of tcpdump’s output.
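The other numbers in tcpdump’s output line up with our code, if we read them the usual way: 45232 is the query ID in decimal, the + after it appears to mean the recursion-desired flag is set, 1/0/0 in the reply appears to count the answer, authority and additional records, and (33) appears to be the size of the DNS message in bytes. Here’s a sketch checking that 33 against our encoding (12-byte header, encoded name, then 4 bytes for type_ and class_); encoded_name_length is a hypothetical helper.

```python
def encoded_name_length(domain_name):
    """Hypothetical helper: length of encode_dns_name(domain_name) in bytes."""
    parts = domain_name.encode("ascii").split(b".")
    # one length byte per label, plus the label bytes, plus the final 0 byte
    return sum(1 + len(part) for part in parts) + 1

HEADER_LENGTH = 12          # six 2-byte header fields
QUESTION_FIXED_LENGTH = 4   # type_ and class_, 2 bytes each

query_length = (HEADER_LENGTH
                + encoded_name_length("www.example.com")
                + QUESTION_FIXED_LENGTH)
print(query_length)  # 33
```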
Asking Google’s DNS resolver here is cheating, of course: our final goal is to write a DNS resolver that finds out where example.com is by itself, instead of asking 8.8.8.8 to do the work for us. But this is a nice easy way to check that our code for building a DNS query works.
some debugging tips
If you’re implementing this in a non-Python language and you’re struggling to encode the query correctly, here’s a hex-encoded version of a correct DNS query:
I’d recommend approaching debugging this way:
First, make sure your UDP code is working by decoding that hex string as bytes in your language, sending those exact bytes to 8.8.8.8 port 53, and using Wireshark or tcpdump to make sure that you get a DNS response.
Then, once your UDP code is working, hardcode the query ID to 0x8298 (the first 2 bytes of that string) and make sure that your build_query function is generating those exact bytes.
Then start randomizing the query ID and test your code with other domain names.
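For the “hardcode the query ID” step, it helps to be able to print your own query as a hex string and compare it character by character. Here’s a sketch in Python; build_query_with_id is a hypothetical variant of build_query that takes the ID as a parameter, and it assumes the query is an A record lookup for www.example.com (the domain used in the test above), so treat the domain as an assumption and swap in whatever name your reference query uses.

```python
import struct

def encode_dns_name(domain_name):
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        encoded += bytes([len(part)]) + part
    return encoded + b"\x00"

def build_query_with_id(query_id, domain_name, record_type):
    """Hypothetical variant of build_query with a fixed, caller-chosen ID."""
    RECURSION_DESIRED = 1 << 8
    CLASS_IN = 1
    # header: id, flags, num_questions=1, then three zero counts
    header = struct.pack("!HHHHHH", query_id, RECURSION_DESIRED, 1, 0, 0, 0)
    question = encode_dns_name(domain_name) + struct.pack("!HH", record_type, CLASS_IN)
    return header + question

print(build_query_with_id(0x8298, "www.example.com", 1).hex())
# 82980100000100000000000003777777076578616d706c6503636f6d0000010001
```

If your output differs, the position of the first mismatched character tells you which part (header, name, or type/class) to look at.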
In the next part, we’ll see how to parse the DNS response we just got back.