Part 2: Parse the response
We got a response to our DNS query for www.example.com. But what does it say? Let’s find out! Here’s the response we got:
```python
response = b'`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00R\x9b\x00\x04]\xb8\xd8"'
```
Our goal is to write a parse_dns_packet function that parses this response into a friendly Python object we can explore.
We’ll need the code we wrote in Part 1, so let’s import it:
```python
from part_1 import build_query, DNSQuestion, DNSHeader
```
2.1: define our DNSRecord class
The answer to our query is going to be in a DNS record, so we need to define one more class.
```python
from dataclasses import dataclass

@dataclass
class DNSRecord:
    name: bytes
    type_: int
    class_: int
    ttl: int
    data: bytes
```
The fields here are:
name: the domain name
type_: the record type, like A, AAAA, MX, NS, or TXT (encoded as an integer)
class_: almost always 1 (IN, for “Internet”). We’ll ignore this.
ttl: how many seconds the record may be cached for. We’ll ignore this.
data: the record’s content, like the IP address.
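Since type_ is stored as a bare integer, it can help to keep a small lookup table around while exploring responses. Here’s a sketch covering only the types listed above; the numeric codes come from RFC 1035 (A, NS, MX, TXT) and RFC 3596 (AAAA). The TYPE_NAMES table and type_name helper are hypothetical names for illustration, not part of the code from Part 1.

```python
# Numeric record type codes for the types listed above.
# A, NS, MX and TXT are defined in RFC 1035; AAAA is defined in RFC 3596.
TYPE_NAMES = {
    1: "A",      # IPv4 address
    2: "NS",     # name server
    15: "MX",    # mail exchange
    16: "TXT",   # text strings
    28: "AAAA",  # IPv6 address
}

def type_name(type_):
    # Fall back to the raw number for types not in the table
    return TYPE_NAMES.get(type_, str(type_))

print(type_name(1))   # A
print(type_name(28))  # AAAA
print(type_name(99))  # 99
```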
2.2: parse the DNS header
First, we need to parse the DNS header. Here’s the code to do that:
```python
import struct

def parse_header(reader):
    # see "a note on BytesIO" below for an explanation of `reader` here
    items = struct.unpack("!HHHHHH", reader.read(12))
    return DNSHeader(*items)
```
This mirrors our code from header_to_bytes in Part 1.2: the format string (!HHHHHH) is exactly the same. Each of the 6 fields is a 2-byte integer, so there are 12 bytes to read in total.
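To check that byte count, here’s a standalone snippet using only Python’s struct module: struct.calcsize reports how many bytes a format string describes, and unpacking the first 12 bytes of our response gives the same six numbers parse_header produces. HEADER_FORMAT is just a local name for illustration.

```python
import struct

# "!" = network (big-endian) byte order, "H" = unsigned 2-byte integer
HEADER_FORMAT = "!HHHHHH"
print(struct.calcsize(HEADER_FORMAT))  # 12

# The first 12 bytes of the response from the top of this part
header_bytes = b"`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00"
print(struct.unpack(HEADER_FORMAT, header_bytes))  # (24662, 33152, 1, 1, 0, 0)
```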
Let’s try it out!
```python
from io import BytesIO

reader = BytesIO(response)
parse_header(reader)
```
DNSHeader(id=24662, flags=33152, num_questions=1, num_answers=1, num_authorities=0, num_additionals=0)
We’re already getting somewhere! Our response has:
an ID of 24662
some flags (which we’re going to ignore)
1 question and 1 answer
no authority or additional records
a note on BytesIO
The reader argument to parse_header is a BytesIO object. BytesIO keeps a pointer to the current position in a byte stream: each read returns the next bytes and advances the pointer past them.
This is super convenient, and it’s going to let us write code like this, where each function picks up where the previous one left off:
```python
reader = BytesIO(response)
header = parse_header(reader)
question = parse_question(reader)
```
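Here’s a small standalone sketch of the BytesIO behavior we rely on, using a buffer that holds just the encoded name www.example.com: read advances the position, tell reports it, and seek jumps somewhere else (we’ll need that for DNS compression later). It also shows that read(1) returns a bytes object, so getting the length as an int takes an index like [0].

```python
from io import BytesIO

# A buffer holding just the encoded name www.example.com (17 bytes)
reader = BytesIO(b"\x03www\x07example\x03com\x00")

print(reader.tell())      # 0: we start at the beginning
print(reader.read(1))     # b'\x03': read() returns a bytes object...
print(reader.tell())      # 1: ...and advances the position
print(reader.read(3))     # b'www'

reader.seek(0)            # jump back to the start
print(reader.read(1)[0])  # 3: indexing a bytes object gives an int
print(reader.read(100))   # asking for too much just returns what's left
print(reader.read(1))     # b'': at the end, read() returns empty bytes
```

That last line is also why the failing traceback in 2.5 ends in an IndexError: once the simple decoder has run past the end of the response, read(1) returns b'' and indexing it with [0] fails.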
2.3: parse the domain name (wrong)
Next, we have to parse the question. Here’s the question section of the response (the 21 bytes right after the header), and you can see that it starts with a domain name (www.example.com):
```python
question = reader.read(21)
question
```
b'\x03www\x07example\x03com\x00\x00\x01\x00\x01'
So really our next task is to parse a domain name. First, here’s a simple version that doesn’t quite work:
```python
def decode_name_simple(reader):
    parts = []
    while (length := reader.read(1)[0]) != 0:
        parts.append(reader.read(length))
    return b".".join(parts)
```
This function:
reads a 1-byte length (read(1) returns a bytes object, so [0] turns it into an integer)
reads that many bytes
repeats until the length is 0
concatenates all the parts together with a . between each one (so b"www", b"example", and b"com" become b"www.example.com")
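To see the loop in action, here’s a hypothetical helper, trace_labels, that runs the same loop as decode_name_simple but records each (length, label) pair instead of joining them. It’s only for illustration and assumes an uncompressed name, which is exactly what the question section contains.

```python
from io import BytesIO

def trace_labels(reader):
    # Hypothetical helper: same loop as decode_name_simple, but it
    # collects each (length, label) pair instead of joining the labels
    steps = []
    while (length := reader.read(1)[0]) != 0:
        steps.append((length, reader.read(length)))
    return steps

name_bytes = b"\x03www\x07example\x03com\x00"
for length, label in trace_labels(BytesIO(name_bytes)):
    print(length, label)
# 3 b'www'
# 7 b'example'
# 3 b'com'
```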
Let’s use this function to parse the question section.
2.4: parse the question
```python
def parse_question(reader):
    name = decode_name_simple(reader)
    data = reader.read(4)
    type_, class_ = struct.unpack("!HH", data)
    return DNSQuestion(name, type_, class_)
```
```python
from io import BytesIO

reader = BytesIO(response)
parse_header(reader)
parse_question(reader)
```
DNSQuestion(name=b'www.example.com', type_=1, class_=1)
Here the type is 1 (which stands for “A”, an IPv4 address), and the class is 1.
2.5: parse the record
Now we’re ready to try to parse the record. This is where our decode_name_simple function is going to break down, but we’ll try it anyway:
```python
def parse_record(reader):
    name = decode_name_simple(reader)
    # the type, class, TTL, and data length together are 10 bytes (2 + 2 + 4 + 2 = 10)
    # so we read 10 bytes
    data = reader.read(10)
    # HHIH means 2-byte int, 2-byte int, 4-byte int, 2-byte int
    type_, class_, ttl, data_len = struct.unpack("!HHIH", data)
    data = reader.read(data_len)
    return DNSRecord(name, type_, class_, ttl, data)
```
The record format is defined in section 4.1.2 of RFC 1035.
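As a quick sanity check on the 10-byte figure, here’s a standalone snippet that unpacks the fixed-size fields of the answer record straight out of the response bytes with struct.unpack_from. It hard-codes offset 35 (12-byte header, 21-byte question, then 2 bytes for the record’s name, which is shorter than you might expect for reasons we’re about to see), so it only works for this particular response.

```python
import struct

# Full response from the top of this part
response = b'`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00R\x9b\x00\x04]\xb8\xd8"'

FIXED_FIELDS = "!HHIH"  # type, class, TTL, data length
print(struct.calcsize(FIXED_FIELDS))  # 10

# Offset 35 = 12 (header) + 21 (question) + 2 (this record's name)
type_, class_, ttl, data_len = struct.unpack_from(FIXED_FIELDS, response, 35)
print(type_, class_, ttl, data_len)  # 1 1 21147 4

# The record data follows the 10 fixed bytes
print(response[45:45 + data_len])    # b']\xb8\xd8"'
```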
We can run our parse_record code like this, and see it fail:
```python
reader = BytesIO(response)
parse_header(reader)
parse_question(reader)
parse_record(reader)
```
```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In, line 4
      2 parse_header(reader)
      3 parse_question(reader)
----> 4 parse_record(reader)

Cell In, line 2, in parse_record(reader)
      1 def parse_record(reader):
----> 2     name = decode_name_simple(reader)
      3     # the type, class, TTL, and data length together are 10 bytes (2 + 2 + 4 + 2 = 10)
      4     # so we read 10 bytes
      5     data = reader.read(10)

Cell In, line 3, in decode_name_simple(reader)
      1 def decode_name_simple(reader):
      2     parts = []
----> 3     while (length := reader.read(1)[0]) != 0:
      4         parts.append(reader.read(length))
      5     return b".".join(parts)

IndexError: index out of range
```
thwarted by DNS compression
Oops! It failed. What’s happening? If you modify decode_name_simple to print out each length, you’ll see that at some point it prints a length of 192.
But there’s no domain name segment here with a length of 192: the maximum length of each part is 63! The first 2 bits of the byte 192 (11000000 in binary) are 11, and any length that starts with the bits 11 is code for “this is compressed”.
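Here’s a tiny sketch of that check on a few length bytes. is_compression_pointer is a hypothetical helper; it requires both top bits to be set, while the decode_name function in 2.6 treats a length byte as a pointer if either top bit is set. The two agree for every valid label length (at most 63) and for any byte starting with 11, like 192.

```python
# A length byte whose top 2 bits are 11 is a compression pointer, not a length
COMPRESSION_MASK = 0b1100_0000

def is_compression_pointer(length_byte):
    # Hypothetical helper: True only when both top bits are set
    return (length_byte & COMPRESSION_MASK) == COMPRESSION_MASK

for length_byte in [3, 7, 63, 192]:
    print(f"{length_byte:3} = {length_byte:08b} -> pointer? {is_compression_pointer(length_byte)}")
#   3 = 00000011 -> pointer? False
#   7 = 00000111 -> pointer? False
#  63 = 00111111 -> pointer? False
# 192 = 11000000 -> pointer? True
```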
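This is happening because our DNS response contains the same domain name more than once (here, once in the question and once in the answer), so DNS uses a simple form of compression to save space: instead of repeating www.example.com, the answer record points back to where the name first appeared. This didn’t show up when parsing the question, because the question is where www.example.com first appears in the packet, so it’s written out in full.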
So let’s look at the real version of this function, which handles compressed names. DNS compression is described in the specification in RFC 1035, section 4.1.4.
2.6: implement DNS compression
Here’s what the real decode_name function looks like. It’s the most complicated thing in DNS parsing.
```python
def decode_name(reader):
    parts = []
    while (length := reader.read(1)[0]) != 0:
        if length & 0b1100_0000:
            # top bits set: the rest of the name is a pointer elsewhere in the packet
            parts.append(decode_compressed_name(length, reader))
            break
        else:
            parts.append(reader.read(length))
    return b".".join(parts)


def decode_compressed_name(length, reader):
    # the pointer is the bottom 6 bits of `length` plus the next byte
    pointer_bytes = bytes([length & 0b0011_1111]) + reader.read(1)
    pointer = struct.unpack("!H", pointer_bytes)[0]
    # remember where we are, jump to the pointer, decode, then jump back
    current_pos = reader.tell()
    reader.seek(pointer)
    result = decode_name(reader)
    reader.seek(current_pos)
    return result
```
What’s going on here is:
Every time we get a length, we check whether its top 2 bits are set. (Like we said before, the maximum length of a component of a DNS name is 63 characters, so in a normal DNS name part the top 2 bits will never be set.)
If so, we call decode_compressed_name, which:
takes the bottom 6 bits of the length byte, plus the next byte, and converts that to an integer called pointer
saves our current position in reader
goes to the pointer position in the DNS packet and decodes a name
restores the current position in reader
returns the name
A compression pointer is never followed by another label, so once we’ve followed a pointer we break out of the loop and return immediately.
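To make the pointer arithmetic concrete, here’s a standalone sketch that decodes the 2-byte pointer at the start of our answer record by hand, following the same steps as decode_compressed_name. It hard-codes offset 33 (right after the 12-byte header and the 21-byte question), so it only applies to this particular response.

```python
import struct

# Full response from the top of this part
response = b'`V\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00R\x9b\x00\x04]\xb8\xd8"'

# The answer record's name starts right after the header and question
first, second = response[33], response[34]
print(hex(first), hex(second))  # 0xc0 0xc

# Keep the bottom 6 bits of the first byte, append the second byte,
# and read the pair as a big-endian 2-byte integer
pointer = struct.unpack("!H", bytes([first & 0b0011_1111, second]))[0]
print(pointer)  # 12

# Offset 12 is just past the 12-byte header: the question's full name
print(response[pointer:pointer + 17])  # b'\x03www\x07example\x03com\x00'
```

So the answer’s name is “whatever name starts at byte 12”, which decode_name then reads as www.example.com.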
This code as implemented actually has a security vulnerability – see Exercise 3 for more about that.
2.7: finish our DNSRecord parsing
Here’s the final parse_record function. We’ve just replaced decode_name_simple in the version from part 2.5 with the new decode_name.
```python
def parse_record(reader):
    name = decode_name(reader)
    # type, class, TTL, and data length: 2 + 2 + 4 + 2 = 10 bytes
    data = reader.read(10)
    type_, class_, ttl, data_len = struct.unpack("!HHIH", data)
    data = reader.read(data_len)
    return DNSRecord(name, type_, class_, ttl, data)
```
Let’s test that it works:
```python
reader = BytesIO(response)
parse_header(reader)
parse_question(reader)
parse_record(reader)
```
DNSRecord(name=b'www.example.com', type_=1, class_=1, ttl=21147, data=b']\xb8\xd8"')
2.8: parse our DNS packet
Now that we know how to parse each of the pieces, we can put it all together and parse our entire DNS packet.
Previously we were parsing 1 header, 1 question, and 1 record, but that’s actually not how DNS packets work in general: the header has a bunch of numbers (num_questions, num_answers, num_authorities, and num_additionals) that tell us how many entries to expect in each section of the packet. So we should respect that.
Let’s make a class to hold all of the contents of our DNS packet (the header, the questions, and all the records):
```python
from typing import List

@dataclass
class DNSPacket:
    header: DNSHeader
    questions: List[DNSQuestion]
    # don't worry about the exact meaning of these 3 record
    # sections for now: we'll use them in Part 3
    answers: List[DNSRecord]
    authorities: List[DNSRecord]
    additionals: List[DNSRecord]
```
And here’s the final parsing code:
```python
def parse_dns_packet(data):
    reader = BytesIO(data)
    header = parse_header(reader)
    # the counts in the header tell us how many entries are in each section
    questions = [parse_question(reader) for _ in range(header.num_questions)]
    answers = [parse_record(reader) for _ in range(header.num_answers)]
    authorities = [parse_record(reader) for _ in range(header.num_authorities)]
    additionals = [parse_record(reader) for _ in range(header.num_additionals)]

    return DNSPacket(header, questions, answers, authorities, additionals)
```
```python
packet = parse_dns_packet(response)
packet
```
DNSPacket(header=DNSHeader(id=24662, flags=33152, num_questions=1, num_answers=1, num_authorities=0, num_additionals=0), questions=[DNSQuestion(name=b'www.example.com', type_=1, class_=1)], answers=[DNSRecord(name=b'www.example.com', type_=1, class_=1, ttl=21147, data=b']\xb8\xd8"')], authorities=[], additionals=[])
Now, let’s try to look at the IP address in this response. What’s the IP for www.example.com?
```python
ip = packet.answers[0].data
ip
```
b']\xb8\xd8"'
Hmm. Looks like we still have a little bit of work to do.
a note on printing binary data
The IP address in the previous record is being printed as b']\xb8\xd8"'. What are the ] and the " doing there?
When Python prints a bytes object, it shows each byte that corresponds to a printable ASCII character as that character, and uses a \x escape for everything else. Sometimes this is useful, like with the domain name from our question: b'\x03www\x07example\x03com\x00'. There, you can read www, example, and com, which makes the binary data a little easier to read because those parts of the data actually are text.
But in the case of b']\xb8\xd8"', it’s not very helpful to know that the first byte is ] in ASCII, because that byte doesn’t actually represent text. Here are a few other ways to print it:
```python
ip_address = b']\xb8\xd8"'
print(ip_address)               # the default way
print(ip_address.hex())         # as hexadecimal
print([x for x in ip_address])  # as a list of 4 numbers in base 10
```
b']\xb8\xd8"'
5db8d822
[93, 184, 216, 34]
In this case the IP address is 93.184.216.34, so the last representation is actually the most readable. Let’s write some code to pretty print the IP address.
2.9: pretty print the IP address
When we get an IPv4 address in a DNS response, it’s not formatted as the string “1.2.3.4”; instead it’s 4 bytes (1, 2, 3, and 4). So to make it a readable string we need to pretty print it.
This is pretty simple to do. ip is a byte string of length 4, and indexing a byte string gives an integer:
```python
ip[0], ip[1], ip[2], ip[3]
```
(93, 184, 216, 34)
and the IP address this translates to is 93.184.216.34. Here’s a function to translate the IP to a string:
```python
def ip_to_string(ip):
    return ".".join([str(x) for x in ip])
```
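The standard library can also do this conversion, which may be handy for double-checking ip_to_string. Here’s a sketch using socket.inet_ntoa and ipaddress.IPv4Address, both of which accept a 4-byte packed address:

```python
import ipaddress
import socket

ip = b']\xb8\xd8"'  # the 4 bytes from our answer record

# Same approach as ip_to_string: join the 4 integers with dots
print(".".join(str(x) for x in ip))  # 93.184.216.34

# Standard library conversions from packed bytes
print(socket.inet_ntoa(ip))          # 93.184.216.34
print(ipaddress.IPv4Address(ip))     # 93.184.216.34
```

The ipaddress module also has IPv6Address, which accepts 16 packed bytes; that could come in handy for AAAA records, which ip_to_string doesn’t format correctly.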
2.10: test out all our code
Let’s write a little function to look up any domain name using 8.8.8.8 and return its IP address.
```python
import socket

TYPE_A = 1

def lookup_domain(domain_name):
    query = build_query(domain_name, TYPE_A)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(query, ("8.8.8.8", 53))
        # get the response
        data, _ = sock.recvfrom(1024)
    response = parse_dns_packet(data)
    return ip_to_string(response.answers[0].data)
```
This builds the query, sends it to 8.8.8.8, parses the response, and pretty prints the IP address.
Let’s try it out on a few domain names!
This parsing code is enough to get us to the next part: writing our DNS resolver!
This code is far from perfect – there are some pretty serious bugs, like this one:
or this one:
But I’ll leave those as a puzzle for you to solve if you want (hint: look at the record type!)