Part 2: Parse the response¶
We got a response to our DNS query for www.example.com. But what does it say? Let’s find out! Here’s the response we got:
response = b'`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00R\x9b\x00\x04]\xb8\xd8"'
Our goal is to write a parse_dns_packet function that parses this response into a friendly Python object we can explore.
We’ll need the code we wrote in Part 1, so let’s import it:
from part_1 import build_query, DNSQuestion, DNSHeader
2.1: define our DNSRecord class¶
The answer to our query is going to be in a DNS Record, so we need to define one more class.
from dataclasses import dataclass
@dataclass
class DNSRecord:
    name: bytes
    type_: int
    class_: int
    ttl: int
    data: bytes
The fields here are:
name: the domain name
type_: A, AAAA, MX, NS, TXT, etc. (encoded as an integer)
class_: always the same (1). We’ll ignore this.
ttl: how long to cache the record for. We’ll ignore this.
data: the record’s content, like the IP address.
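To make the fields concrete, here is a small sketch that builds a DNSRecord by hand. The values are copied from the response at the top of this part, and the class is repeated only so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass
class DNSRecord:
    name: bytes
    type_: int
    class_: int
    ttl: int
    data: bytes


# Hand-built record for illustration only
record = DNSRecord(
    name=b"www.example.com",
    type_=1,                         # 1 stands for an A record
    class_=1,
    ttl=21147,                       # in seconds
    data=bytes([93, 184, 216, 34]),  # the 4 raw bytes of an IPv4 address
)
print(record)
```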
2.2: parse the DNS header¶
First, we need to parse the DNS header. Here’s the code to do that:
import struct
def parse_header(reader):
    items = struct.unpack("!HHHHHH", reader.read(12))
    # see "a note on BytesIO" for an explanation of `reader` here
    return DNSHeader(*items)
This mirrors our code from header_to_bytes in Part 1.2: the format string (!HHHHHH) is exactly the same. Each of the 6 fields is a 2-byte integer, so there are 12 bytes in all to read.
Let’s try it out!
from io import BytesIO
reader = BytesIO(response)
parse_header(reader)
DNSHeader(id=24662, flags=33152, num_questions=1, num_answers=1, num_authorities=0, num_additionals=0)
We’re already getting somewhere! Our response has:
an ID of 24662
some flags (which we’re going to ignore)
1 question
1 answer
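If you want to see where those numbers come from without any of our classes, here is a minimal sketch that unpacks the first 12 bytes of the response directly with struct. The variable names header_bytes and fields are just for this illustration.

```python
import struct

# The first 12 bytes of the response shown at the top of this part
header_bytes = b"`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00"

# "!" means network (big-endian) byte order, "H" means unsigned 2-byte integer
fields = struct.unpack("!HHHHHH", header_bytes)
print(fields)

# Packing the same six integers gives back the original bytes
repacked = struct.pack("!HHHHHH", *fields)
print(repacked == header_bytes)
```

The six values come out in the same order as the DNSHeader fields: id, flags, and then the four counts.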
a note on BytesIO¶
The reader argument to parse_header is a BytesIO object.
BytesIO lets you keep a pointer to the current position in a byte stream and lets you read from it and advance the pointer.
This is super convenient and it’s going to let us write code like
reader = BytesIO(response)
header = parse_header(reader)
question = parse_question(reader)
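Here is a tiny sketch of that pointer behaviour on its own, using a few made-up bytes rather than a real DNS packet. It only uses read, tell, and seek, which are the three BytesIO methods this part relies on.

```python
from io import BytesIO

reader = BytesIO(b"\x03www\x07example")

length = reader.read(1)[0]   # read 1 byte; the pointer advances to position 1
label = reader.read(length)  # read the next `length` bytes
position = reader.tell()     # ask where the pointer is now

reader.seek(0)               # jump back to the start
again = reader.read(1)[0]    # the same first byte again

print(length, label, position, again)
```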
2.3: parse the domain name (wrong)¶
Next, we have to parse the question. Here’s the question section, and you can see that it starts with a domain name (www.example.com):
question = reader.read(21)
question
b'\x03www\x07example\x03com\x00\x00\x01\x00\x01'
So really our next task is to parse a domain name. First, here’s a simple version that doesn’t quite work:
def decode_name_simple(reader):
    parts = []
    while (length := reader.read(1)[0]) != 0:
        parts.append(reader.read(length))
    return b".".join(parts)
This:
reads a 1-byte length
reads that many bytes
repeats until the length is 0
concatenates all the parts together with a . between each one (['example', 'com'] => 'example.com')
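The same steps can be traced by hand on the raw bytes, using an index instead of a reader. This is only a sketch to show what each step consumes; name_bytes and pos are names made up for the illustration.

```python
# The encoded name from the question section
name_bytes = b"\x03www\x07example\x03com\x00"

parts = []
pos = 0
while (length := name_bytes[pos]) != 0:
    pos += 1                                     # step past the length byte
    parts.append(name_bytes[pos:pos + length])   # take `length` bytes
    pos += length                                # move to the next length byte

decoded = b".".join(parts)
print(parts, decoded, pos)
```

When the loop stops, pos points at the final 0 byte that marks the end of the name.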
Let’s use this function to parse the question section.
2.4: parse the question¶
def parse_question(reader):
    name = decode_name_simple(reader)
    data = reader.read(4)
    type_, class_ = struct.unpack("!HH", data)
    return DNSQuestion(name, type_, class_)
from io import BytesIO
reader = BytesIO(response)
parse_header(reader)
parse_question(reader)
DNSQuestion(name=b'www.example.com', type_=1, class_=1)
Here the type is 1 (which stands for “A”, IP Address), and the class is 1.
2.5: parse the record¶
Now we’re ready to try to parse the record. Here’s where our decode_name_simple function is going to break down, but we’ll try it anyway:
def parse_record(reader):
    name = decode_name_simple(reader)
    # the type, class, TTL, and data length together are 10 bytes (2 + 2 + 4 + 2 = 10)
    # so we read 10 bytes
    data = reader.read(10)
    # HHIH means 2-byte int, 2-byte int, 4-byte int, 2-byte int
    type_, class_, ttl, data_len = struct.unpack("!HHIH", data)
    data = reader.read(data_len)
    return DNSRecord(name, type_, class_, ttl, data)
The record format is defined in section 4.1.2 of RFC 1035.
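To check the 10-byte arithmetic, here is a small sketch that unpacks just the fixed-size fields of the record in our response. The bytes are copied from the response at the top of this part (they are the 10 bytes right after the record’s name); fixed_fields is a name made up for this example.

```python
import struct

# The 10 bytes that follow the record's name in our response
fixed_fields = b"\x00\x01\x00\x01\x00\x00R\x9b\x00\x04"

# H = 2-byte int, I = 4-byte int, so HHIH is 2 + 2 + 4 + 2 = 10 bytes
size = struct.calcsize("!HHIH")

type_, class_, ttl, data_len = struct.unpack("!HHIH", fixed_fields)
print(size, type_, class_, ttl, data_len)
```

The data_len of 4 says that 4 more bytes of record data follow, which is the right size for an IPv4 address.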
We can run our parse_record code like this, and see it fail:
reader = BytesIO(response)
parse_header(reader)
parse_question(reader)
parse_record(reader)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In [18], line 4
2 parse_header(reader)
3 parse_question(reader)
----> 4 parse_record(reader)
Cell In [17], line 2, in parse_record(reader)
1 def parse_record(reader):
----> 2 name = decode_name_simple(reader)
3 # the type, class, TTL, and data length together are 10 bytes (2 + 2 + 4 + 2 = 10)
4 # so we read 10 bytes
5 data = reader.read(10)
Cell In [14], line 3, in decode_name_simple(reader)
1 def decode_name_simple(reader):
2 parts = []
----> 3 while (length := reader.read(1)[0]) != 0:
4 parts.append(reader.read(length))
5 return b".".join(parts)
IndexError: index out of range
thwarted by DNS compression¶
Oops! It failed. If you modify decode_name_simple to print out each length, you’ll see that at some point it prints a length of 192.
But there’s no domain name segment here with a length of 192: the maximum length of each part is 63! The first 2 bits of the byte 192 (11000000 in binary) are 11, and any length that starts with the bits 11 is code for “this is compressed”.
This is happening because DNS responses often contain many copies of the same domain name, so DNS uses a simple form of compression to save space. This didn’t show up when parsing the question because the question comes first in the packet and holds the first, uncompressed copy of the domain name www.example.com.
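You can check this on the raw response without modifying any of the parsing functions. The sketch below looks at the single byte where the answer record starts; the offset is worked out from the sizes we already know (a 12-byte header and a 21-byte question), and first_byte and is_compressed are names made up for this example.

```python
response = b'`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00R\x9b\x00\x04]\xb8\xd8"'

# The header is 12 bytes and the question is 21 bytes,
# so the answer record starts at offset 33
first_byte = response[12 + 21]
as_binary = format(first_byte, "08b")
print(first_byte, as_binary)

# A normal label length is at most 63, so its top two bits are always 00
is_compressed = (first_byte & 0b1100_0000) == 0b1100_0000
print(is_compressed)
```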
So let’s look at the real version of this function, which handles compressed responses. You can find DNS compression in the specification here: RFC 1035, section 4.1.4.
2.6: implement DNS compression¶
Here’s what the real decode_name function looks like. It’s the most complicated thing in DNS parsing.
def decode_name(reader):
    parts = []
    while (length := reader.read(1)[0]) != 0:
        if length & 0b1100_0000:
            parts.append(decode_compressed_name(length, reader))
            break
        else:
            parts.append(reader.read(length))
    return b".".join(parts)

def decode_compressed_name(length, reader):
    pointer_bytes = bytes([length & 0b0011_1111]) + reader.read(1)
    pointer = struct.unpack("!H", pointer_bytes)[0]
    current_pos = reader.tell()
    reader.seek(pointer)
    result = decode_name(reader)
    reader.seek(current_pos)
    return result
What’s going on here is:
Every time we get a length, we check if the first 2 bits are 1s. (Like we said before, the maximum length of a component of a DNS name is 63 characters, so in a normal DNS name part the top 2 bits will never be set.)
If so, call decode_compressed_name, which:
  takes the bottom 6 bits of the length byte, plus the next byte, and converts that to an integer called pointer
  saves our current position in reader
  goes to the pointer position in the DNS packet and decodes a name
  restores the current position in reader
  returns the name
A compressed name is never followed by another label, so after decompressing the name we immediately return.
This code as implemented actually has a security vulnerability – see Exercise 3 for more about that.
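To see what the pointer works out to for our response, here is a small sketch that does the pointer arithmetic by hand on the two bytes at the start of the answer record (offsets 33 and 34, right after the 12-byte header and 21-byte question). It repeats the response bytes so that it runs on its own.

```python
import struct

response = b'`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00R\x9b\x00\x04]\xb8\xd8"'

length = response[33]        # 0xc0, the byte that looked like a length of 192
next_byte = response[34:35]  # the second byte of the pointer

# Clear the top 2 bits, then read the 2 bytes as one big-endian integer
pointer_bytes = bytes([length & 0b0011_1111]) + next_byte
pointer = struct.unpack("!H", pointer_bytes)[0]
print(pointer)

# Offset 12 is right after the header: the name in the question section
target = response[pointer:pointer + 17]
print(target)
```

So the name of the answer record is stored as just 2 bytes that mean “go read the name at offset 12”.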
2.7: finish our DNSRecord parsing¶
Here’s the final parse_record function. We’ve just replaced decode_name_simple in the version from part 2.5 with the new decode_name.
def parse_record(reader):
    name = decode_name(reader)
    data = reader.read(10)
    type_, class_, ttl, data_len = struct.unpack("!HHIH", data)
    data = reader.read(data_len)
    return DNSRecord(name, type_, class_, ttl, data)
Let’s test that it works:
reader = BytesIO(response)
parse_header(reader)
parse_question(reader)
parse_record(reader)
DNSRecord(name=b'www.example.com', type_=1, class_=1, ttl=21147, data=b']\xb8\xd8"')
Hooray!
2.8: parse our DNS packet¶
Now that we know how to parse each of the pieces, we can put it all together and parse our entire DNS packet.
Previously we were parsing 1 header, 1 question, and 1 record, but that’s actually not how DNS packets work in general: the header has a bunch of numbers (num_questions, num_answers, num_additionals, and num_authorities) that tell us how many records to expect in each section of the packet. So we should respect that.
Let’s make a class to hold all of the contents of our DNS packet (the header, the questions, and all the records):
from typing import List
@dataclass
class DNSPacket:
    header: DNSHeader
    questions: List[DNSQuestion]
    # don't worry about the exact meaning of these 3 record
    # sections for now: we'll use them in Part 3
    answers: List[DNSRecord]
    authorities: List[DNSRecord]
    additionals: List[DNSRecord]
And here’s the final parsing code:
def parse_dns_packet(data):
    reader = BytesIO(data)
    header = parse_header(reader)
    questions = [parse_question(reader) for _ in range(header.num_questions)]
    answers = [parse_record(reader) for _ in range(header.num_answers)]
    authorities = [parse_record(reader) for _ in range(header.num_authorities)]
    additionals = [parse_record(reader) for _ in range(header.num_additionals)]
    return DNSPacket(header, questions, answers, authorities, additionals)
packet = parse_dns_packet(response)
packet
DNSPacket(header=DNSHeader(id=24662, flags=33152, num_questions=1, num_answers=1, num_authorities=0, num_additionals=0), questions=[DNSQuestion(name=b'www.example.com', type_=1, class_=1)], answers=[DNSRecord(name=b'www.example.com', type_=1, class_=1, ttl=21147, data=b']\xb8\xd8"')], authorities=[], additionals=[])
Now, let’s try to look at the IP address in this response. What’s the IP for www.example.com?
ip = packet.answers[0].data
ip
b']\xb8\xd8"'
Hmm. Looks like we still have a little bit of work to do.
a note on printing binary data¶
The IP address in the previous record is being printed as b']\xb8\xd8"'. What are the ] and " doing there?
When Python prints out binary strings, by default it tries to decode their contents as ASCII text when possible. Sometimes this is useful, like this:
response[12:30]
b'\x03www\x07example\x03com\x00\x00'
There, you can read www, example, and com, which makes the binary data a little easier to read because those parts of the data actually are text.
But in the case of b']\xb8\xd8"', it’s not very helpful to know that the first character is a ] in ASCII because the ] byte doesn’t actually represent text. Here are a few other ways to approach printing it:
ip_address = b']\xb8\xd8"'
print(ip_address) # the default way
print(ip_address.hex()) # as hexadecimal
print([x for x in ip_address]) # as an array of 4 numbers in base 10
b']\xb8\xd8"'
5db8d822
[93, 184, 216, 34]
In this case the IP address is 93.184.216.34, so the last representation is actually the most readable. Let’s write some code to pretty print the IP address.
2.9: pretty print the IP address¶
When we get an IPv4 address in a DNS response, it’s not formatted as “1.2.3.4” – instead it’s 4 bytes (1, 2, 3, and 4). So to make it a string we need to pretty print it.
This is pretty simple to do: ip is a byte string of length 4:
ip[0], ip[1], ip[2], ip[3]
(93, 184, 216, 34)
and the IP address this translates to is 93.184.216.34. Here’s a function to translate the IP to a string:
def ip_to_string(ip):
    return ".".join([str(x) for x in ip])
ip_to_string(packet.answers[0].data)
'93.184.216.34'
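As a cross-check, Python’s standard library can do the same conversion: ipaddress.IPv4Address accepts 4 packed bytes. This is just an alternative sketch for comparison, not something the rest of this guide depends on; ip_to_string is repeated here so the snippet runs on its own.

```python
import ipaddress


def ip_to_string(ip):
    return ".".join([str(x) for x in ip])


ip = b']\xb8\xd8"'

ours = ip_to_string(ip)
# IPv4Address accepts a bytes object of length 4, most significant byte first
from_stdlib = str(ipaddress.IPv4Address(ip))
print(ours, from_stdlib)
```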
2.10: test out all our code¶
Let’s write a little function to look up any domain name using 8.8.8.8 and print out the IP address.
import socket
TYPE_A = 1
def lookup_domain(domain_name):
    query = build_query(domain_name, TYPE_A)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(query, ("8.8.8.8", 53))
    # get the response
    data, _ = sock.recvfrom(1024)
    response = parse_dns_packet(data)
    return ip_to_string(response.answers[0].data)
This builds the query, sends it to 8.8.8.8, parses the response, and pretty prints the IP address.
Let’s try it out on a few domain names!
lookup_domain("example.com")
'93.184.216.34'
lookup_domain("recurse.com")
'13.225.195.117'
lookup_domain("metafilter.com")
'54.203.56.158'
This parsing code is enough to get us to the next part: writing our DNS resolver!
This code is far from perfect – there are some pretty serious bugs, like this one:
lookup_domain("www.facebook.com")
'9.115.116.97.114.45.109.105.110.105.4.99.49.48.114.192.16'
or this one:
lookup_domain("www.metafilter.com")
'192.16'
But I’ll leave those as a puzzle for you to solve if you want (hint: look at the record type!)