

Domain names are the simple, human-readable names for websites. The Internet understands only IP addresses, but since memorizing arbitrary numbers is not practical, domain names are used instead. These domain names are translated into IP addresses by the DNS infrastructure. When somebody tries to open a website in the browser, the browser tries to convert its domain name to an IP address. This process is called DNS resolution. Simple pseudocode depicting this process looks like this:

ip, err = getIPAddress(domainName)
if err:
    print("unknown host exception while trying to resolve: {}".format(domainName))

Now let’s try to understand what happens inside the getIPAddress function. The browser has a DNS cache of its own, where it checks whether a mapping from the domainName to an IP address is already available; if so, the browser uses that IP address. If no such mapping exists, the browser calls gethostbyname (a C library function rather than a true system call) to ask the operating system to find the IP address for the given domainName:

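def getIPAddress(domainName):
    # Check the browser's own DNS cache first
    resp, fail = lookupCache(domainName)
    if not fail:
        return resp, None
    # Cache miss: ask the operating system to resolve the name
    resp, err = gethostbyname(domainName)
    if err:
        return None, err
    return resp, None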
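
The cache-then-OS flow in the pseudocode can be sketched in Python. This is a minimal illustration, not how any particular browser implements its cache: the class name `CachingResolver`, the fixed 60-second lifetime, and the injectable `lookup` and `clock` parameters are all assumptions made for the example. By default the lookup is Python's `socket.gethostbyname`, which delegates to the operating system.

```python
import socket
import time


class CachingResolver:
    """Tiny in-process DNS cache in front of a lookup function (illustrative only)."""

    def __init__(self, lookup=socket.gethostbyname, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup      # by default, delegates to the OS resolver
        self._ttl = ttl_seconds    # assumed fixed lifetime for every cache entry
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._cache = {}           # domain name -> (ip, expiry time)

    def get_ip_address(self, domain_name):
        entry = self._cache.get(domain_name)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                      # cache hit
        ip = self._lookup(domain_name)           # cache miss; may raise OSError
        self._cache[domain_name] = (ip, self._clock() + self._ttl)
        return ip
```

Calling `CachingResolver().get_ip_address(name)` twice in a row would hit the OS only once; a failed lookup surfaces as an `OSError` subclass, which plays the role of the "unknown host" error in the pseudocode.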

Now let's understand what the operating system does when the gethostbyname function is called. On Linux, this lookup is handled by the C library rather than by the kernel itself, and it consults the file /etc/nsswitch.conf, which usually has a line

hosts:      files dns

This line means the OS first looks in the local hosts file (/etc/hosts) and then uses the DNS protocol to do the resolution if there is no match in /etc/hosts.
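
As a rough illustration of how that line is read, the following sketch extracts the lookup order from nsswitch.conf-style text. The function name `hosts_lookup_order` is hypothetical, and the handling of bracketed action items is a simplification.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources named on the 'hosts:' line, in the order they are tried."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()        # drop comments
        if line.startswith("hosts:"):
            sources = line[len("hosts:"):]
            sources = re.sub(r"\[.*?\]", " ", sources)  # ignore bracketed action items
            return sources.split()
    return []


print(hosts_lookup_order("hosts:      files dns"))  # ['files', 'dns']
```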

The file /etc/hosts has the format

IPAddress FQDN [FQDN].*

127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost

If a match exists for a domain in this file, that IP address is returned by the OS. Let's add a line to this file

And then do ping

ping -n
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from icmp_seq=2 ttl=64 time=0.036 ms
64 bytes from icmp_seq=3 ttl=64 time=0.037 ms
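
The /etc/hosts lookup described above can be sketched as a small parser. The entries in `sample` are hypothetical (they use the documentation address range and the reserved example.com domain), and the sketch assumes the first matching line wins.

```python
def parse_hosts(hosts_text):
    """Map every hostname in /etc/hosts-style text to the first IP address listed for it."""
    table = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()   # strip comments, split on whitespace
        if len(fields) < 2:
            continue                             # blank line or malformed entry
        ip, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), ip)   # assumption: first match wins
    return table


sample = """
# hypothetical entries, for illustration only
192.0.2.10  intranet.example.com intranet
::1         localhost.localdomain localhost
"""
table = parse_hosts(sample)
print(table.get("intranet"))              # 192.0.2.10
print(table.get("unknown.example.com"))   # None, so the OS would fall through to DNS
```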

As mentioned earlier, if no match exists in /etc/hosts, the OS tries to do a DNS resolution using the DNS protocol. The Linux system makes a DNS request to the first IP in /etc/resolv.conf. If there is no response, requests are sent to the subsequent servers in resolv.conf. These servers in resolv.conf are called DNS resolvers. The list of DNS resolvers is populated by DHCP or statically configured by an administrator. dig is a userspace DNS tool which creates and sends requests to DNS resolvers and prints the responses it receives to the console.
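
The try-the-next-resolver behaviour can be sketched as follows. The function names are hypothetical, and `query` stands in for whatever actually sends the DNS request: it is assumed to return an IP address or raise `TimeoutError` when a resolver does not answer.

```python
def parse_nameservers(resolv_conf_text):
    """Return the resolver IPs from resolv.conf-style text, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue                              # skip blank lines and comments
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def resolve_with_failover(domain_name, servers, query):
    """Ask each resolver in order and return the first answer received."""
    for server in servers:
        try:
            return query(server, domain_name)
        except TimeoutError:
            continue                              # no response: try the next resolver
    raise TimeoutError("no resolver answered for " + domain_name)
```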

#run this command in one shell to capture all DNS requests
sudo tcpdump -s 0 -A -i any port 53
#make a dig request from another shell
13:19:54.432507 IP > 527+ [1au] A? (41)
13:19:54.485131 IP > 527 1/0/1 A (57)


The packet capture shows that a request for the A record of the domain is made to the resolver listed in /etc/resolv.conf, and that a response carrying the IP address of the domain is received from that resolver.
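
To make the captured request more concrete, here is a minimal sketch of how an A query like the one in the capture is laid out on the wire. It is a simplification: it omits the EDNS additional record that the capture marks as `[1au]`, the function names are hypothetical, and example.com is a placeholder domain rather than the one queried above. The query ID 527 is borrowed from the capture.

```python
import socket
import struct


def build_dns_query(domain_name, query_id=527, qtype=1):
    """Build a DNS query packet (qtype 1 = A record, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain_name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)   # QTYPE, QCLASS (1 = IN)
    return header + question


def send_query(packet, resolver_ip, timeout=2.0):
    """Send the packet to port 53 of a resolver over UDP and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return response


packet = build_dns_query("example.com")
print(len(packet))   # 29
```

Passing the packet to `send_query` with one of the resolver IPs from /etc/resolv.conf should produce a request that tcpdump shows in the same `A?` form as above.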

Now let's try to understand how the DNS resolver finds the IP address of a domain. The DNS resolver first looks at its cache. Since many devices in the network can query for the same domain name, the name resolution result may already exist in the cache. If there is a cache miss, it starts the DNS resolution process. The resolver breaks the domain name down into “.”, “com.” and the full domain name, and starts DNS resolution from “.”. The “.” is called the root domain, and the IPs of its nameservers are known to the DNS resolver software. The DNS resolver queries the root nameservers to find the nameservers that can answer for "com.". The address of the authoritative nameserver of “com.” is returned. Now the resolver contacts the authoritative nameserver for “com.” to fetch the authoritative nameserver for the domain. Once the authoritative nameserver of the domain is known, the resolver contacts LinkedIn’s nameserver to provide the IP address of the domain. This whole process can be visualized by running

dig +trace       3600    IN  A

This DNS response has 5 fields, where the first field is the name that was requested and the last field is the response. The second field is the Time to Live (TTL), which says how long the DNS response is valid, in seconds. In this case the mapping is valid for 1 hour. This is how the resolvers and the application (browser) maintain their caches. Any request made more than 1 hour later is treated as a cache miss, because the mapping has exceeded its TTL and the whole process has to be redone. The fourth field gives the type of the DNS response/request. Some of the various DNS query types are A, AAAA, NS, TXT, PTR, MX and CNAME.

- An A record returns the IPv4 address of the domain name.
- An AAAA record returns the IPv6 address of the domain name.
- An NS record returns the authoritative nameserver for the domain name.
- CNAME records are aliases for domain names. Some domains point to other domain names, and resolving the latter gives an IP address that is used for the former as well.

For brevity we are not discussing the other DNS record types; each of them is specified in its own RFC.
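
The five fields and the TTL check can be illustrated with a short sketch. The record line below is hypothetical (example.com and 192.0.2.1 are placeholders), and the names `Record`, `parse_record` and `is_expired` are made up for the example.

```python
from collections import namedtuple

Record = namedtuple("Record", "name ttl rclass rtype value")


def parse_record(line):
    """Split a dig-style answer line into its five fields."""
    name, ttl, rclass, rtype, value = line.split(None, 4)
    return Record(name, int(ttl), rclass, rtype, value)


def is_expired(record, fetched_at, now):
    """A cached record is stale once its TTL (in seconds) has elapsed."""
    return now - fetched_at >= record.ttl


rec = parse_record("example.com.  3600  IN  A  192.0.2.1")
print(rec.rtype, rec.ttl)           # A 3600
print(is_expired(rec, 0, 1800))     # False: still within the hour
print(is_expired(rec, 0, 3601))     # True: cache miss, resolve again
```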

dig A +short

dig AAAA +short

dig NS +short

dig CNAME +short
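
The resolution walk (root, then "com.", then the domain's own nameserver) and the A, NS and CNAME record types can be combined into a toy model. Everything in `SERVERS` is invented for illustration: the server names, the example.com zone and the 192.0.2.1 address are placeholders, and a real resolver talks to these servers over the network instead of reading a dictionary.

```python
# Hypothetical nameservers: each one either delegates a suffix (like an NS
# record pointing further down the tree) or answers from its own records.
SERVERS = {
    "root-server": {"delegations": {"com.": "com-server"}},
    "com-server": {"delegations": {"example.com.": "example-ns"}},
    "example-ns": {
        "records": {
            "example.com.": ("A", "192.0.2.1"),
            "www.example.com.": ("CNAME", "example.com."),
        }
    },
}


def resolve(name, servers=SERVERS):
    """Walk from the root to an answer; return (ip, list of servers asked)."""
    if not name.endswith("."):
        name += "."
    server = "root-server"
    path = [server]
    while True:
        zone = servers[server]
        records = zone.get("records", {})
        if name in records:
            rtype, value = records[name]
            if rtype == "CNAME":
                # An alias: resolve the target name, starting again from the root
                ip, rest = resolve(value, servers)
                return ip, path + rest
            return value, path
        for suffix, next_server in zone.get("delegations", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server        # follow the delegation one level down
                path.append(server)
                break
        else:
            raise LookupError("cannot resolve " + name)


ip, path = resolve("example.com")
print(ip)     # 192.0.2.1
print(path)   # ['root-server', 'com-server', 'example-ns']
```

Resolving `www.example.com` in this model returns the same address as `example.com`, which is the CNAME behaviour described above.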

Armed with these fundamentals of DNS, let's see the use cases where DNS is used by SREs.

Applications in SRE role

This section covers some of the common solutions SREs can derive from DNS.

  1. Every company has to have its own internal DNS infrastructure for intranet sites and internal services such as databases and other internal applications like wikis. So there has to be a DNS infrastructure maintained for those domain names by the infrastructure team. This DNS infrastructure has to be optimized and scaled so that it doesn’t become a single point of failure. Failure of the internal DNS infrastructure can cause API calls between microservices to fail and trigger other cascading effects.
  2. DNS can also be used for discovering services. For example, a single internal hostname could list the instances that run service b inside the company. Cloud providers offer options to enable DNS-based service discovery.
  3. DNS is used by cloud providers and CDN providers to scale their services. In Azure/AWS, load balancers are given a CNAME instead of an IP address. The providers update the IP addresses of the load balancers as they scale by changing the IP addresses behind the alias domain names. This is one of the reasons why A records of such alias domains are short-lived, with TTLs of around 1 minute.
  4. DNS can also be used to give clients IP addresses closer to their location, so that their HTTP calls are answered faster when the company has a geographically distributed presence.
  5. SREs also have to understand that, since plain DNS does not verify responses, these responses can be spoofed. Other protocols such as HTTPS (covered later) guard against the consequences, and DNSSEC protects against forged or manipulated DNS responses.
  6. Stale DNS caches can be a problem. Some apps might still be using expired DNS records for their API calls. This is something SREs have to be wary of when doing maintenance.
  7. DNS load balancing and service discovery also have to take TTL into account: a server can be removed from the pool only after waiting for the TTL to elapse once the DNS records have been changed. If this is not done, a portion of the traffic will fail because the server is removed before the cached records expire.
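
The TTL rule in the last point can be sketched as a small check to run before decommissioning a server. The function names and the optional safety margin are assumptions for illustration; the TTL passed in should be the TTL the old record was served with, since that is what clients and resolvers may still have cached.

```python
from datetime import datetime, timedelta


def earliest_safe_removal(record_changed_at, ttl_seconds, safety_margin_seconds=0):
    """Earliest time at which no cache should still hold the old record."""
    return record_changed_at + timedelta(seconds=ttl_seconds + safety_margin_seconds)


def can_remove_server(now, record_changed_at, ttl_seconds, safety_margin_seconds=0):
    """True once the old record's TTL (plus any margin) has fully elapsed."""
    return now >= earliest_safe_removal(record_changed_at, ttl_seconds, safety_margin_seconds)


changed_at = datetime(2024, 1, 1, 12, 0, 0)   # hypothetical time of the DNS change
print(can_remove_server(changed_at + timedelta(seconds=30), changed_at, 60))   # False
print(can_remove_server(changed_at + timedelta(seconds=60), changed_at, 60))   # True
```

A margin is worth considering in practice because, as noted above, some applications keep using records beyond their TTL.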