Three Things That DNS Outages Teach Administrators

By Tony Perez | May 5, 2021

Rarely do you wake up thinking to yourself, “I wonder how my DNS is doing today?” But I can guarantee it has been the root cause of one or two sleepless nights: “Gah, DNS again… grrrrr.”

There is no better example than today’s outage at Register.com and Network Solutions, in which customers were told to expect outages ranging from 24 to 48 hours. Imagine for a moment going to your executive team and explaining: “Sorry, our website, our services, and everything in between will be down for a day or so.”

In the world of system administration, that’s not a fathomable concept.

What can we do to build our own resiliency against issues like this? We all know these things happen. We don’t have the luxury of pretending they don’t exist, and it’s often only a matter of time. So let’s take a look at three things we can do as administrators to improve that resiliency.

Building Resiliency with our Networks

1 – Functional Isolation of the Services. For years I spoke about this when working with organizations to identify and remediate security incidents. Managing infrastructure is no different, because availability is still a critical pillar of the security triad: Confidentiality, Integrity, and Availability.

With domains, separating the services means dividing the responsibility for two key components: Registrars and Authoritative DNS. No, they are not the same thing.

Most administrators don’t realize the two don’t have to be married together. A registrar might make it difficult, or work to persuade you to use its Authoritative DNS, but that is not a requirement. Splitting the two functions between separate providers removes a dependency and improves resiliency.

2 – Failover Authoritative DNS. Any registrar worth its salt will allow you, as the domain owner, to introduce a third-party Auth-DNS service to help in instances like this. A failover Auth-DNS allows you to respond gracefully in the event of an outage.

This outage demonstrates the impact of not having something like this in place.

In this specific instance, users are stuck until the entire platform is back online. With a failover Auth-DNS service, they could mitigate the outage by rerouting traffic through the failover service until the main service is back online.

You do this by adding a backup nameserver at your existing registrar. Your records are replicated to a third-party Auth-DNS that sits idle until it’s needed. You never think you need it until you do, and by the time you need it, it’s already too late to implement.
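As a rough illustration of the idea, here is a minimal Python sketch that checks whether a domain’s nameserver set spans more than one provider. The hostnames are placeholders, and the "provider key" (the last two DNS labels) is a naive assumption made for this example; it would misfire on suffixes such as co.uk, where a real check would consult the Public Suffix List.

```python
from collections import defaultdict


def group_by_provider(nameservers):
    """Group nameserver hostnames by a rough provider key.

    The key is the last two DNS labels (e.g. "backup-dns.example").
    This is a naive heuristic, assumed here for illustration only.
    """
    groups = defaultdict(list)
    for host in nameservers:
        labels = host.rstrip(".").lower().split(".")
        groups[".".join(labels[-2:])].append(host)
    return dict(groups)


def has_provider_diversity(nameservers):
    """Return True when the NS set spans at least two provider keys."""
    return len(group_by_provider(nameservers)) >= 2


# Hypothetical NS sets; these hostnames are placeholders, not real services.
single = ["ns1.registrar-dns.example", "ns2.registrar-dns.example"]
mixed = single + ["ns1.backup-dns.example"]

print(has_provider_diversity(single))  # False
print(has_provider_diversity(mixed))   # True
```

If the first result is what your own domain looks like, every nameserver you depend on lives with a single provider, which is the situation this outage exposed.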

3 – Automated Failover Detection and Response. The next piece of the puzzle to consider is availability as a whole. Outages occur across the entire stack, and very few administrators realize the real power of Authoritative DNS. Addressing the DNS outage specifically is only the first step. Now use this opportunity to think through the availability of your domain’s endpoints (e.g., servers).

Work with an Auth-DNS service provider whose technology lets you automatically detect and respond to endpoint outages. It should identify failures in the stack and programmatically adjust your DNS records to keep your services available without your intervention.
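To make the detect-and-respond loop concrete, here is a minimal Python sketch of the decision step. It is not how any particular provider implements failover: the function names, the TCP-connect health check, and the endpoint addresses are all assumptions for illustration, and the step that actually publishes the chosen address is left out because it depends on your provider’s API.

```python
import socket


def tcp_health_check(host, port=443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_active(endpoints, is_healthy=tcp_health_check):
    """Pick the first healthy endpoint in priority order.

    If nothing is healthy, keep the primary so the record is never left empty.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return endpoints[0]


# Hypothetical endpoints taken from the documentation address ranges.
endpoints = ["192.0.2.10", "198.51.100.10"]

# Simulate the primary being down by stubbing the health check.
down = {"192.0.2.10"}
active = choose_active(endpoints, is_healthy=lambda ip: ip not in down)
print(active)  # 198.51.100.10
```

Because the list is checked in priority order on every run, the same logic moves traffic back to the primary once it passes its health check again.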

NOC.org, an Authoritative DNS Platform

This is a shameless plug, but everything described above is what we built NOC.org to solve. We built a platform to complement your existing stack, not replace it.

As long as your registrar allows custom nameservers, you have the option to leverage NOC.org for DNS outage resiliency. You also have the option to use our Automated Failover Detection and Recovery feature to move your assets back once the outage is resolved.

Posted in Security