A Thorny DNS Problem: UDP is OK, but TCP is Not


It was frustrating.

I recently added OpenDNS to my list of tools, important mostly because I’ve now got boys who venture out into the wilds of the internet and are learning the ins and outs of adware, malware and images of a “questionable nature,” to put it mildly. (You can discuss the merits of this parenting technique amongst yourselves. This is the choice my wife and I have made.) I simply changed the DNS address being distributed with the DHCP leases to point at my Mac OS X Server (10.6.x, the last of the great server versions…), where I forwarded non-authoritative queries upstream to OpenDNS. And all was good.

Sometime between then and a few weeks ago, Dealnews.com suddenly stopped loading. About halfway through the page load, no matter the device (iOS or Mac OS), it would hang. Images wouldn’t load, whether I was using Safari or Google Reader or Reeder or nothin’. And thus began my quest.

First, I used Safari’s Develop>Start Timeline Recording to watch what was and wasn’t getting loaded. Neat. And it showed me that I was not getting any content from s5.dlnws.com (or its brethren, s1-s4, either). Easy enough. That content would be getting blocked by OpenDNS, certainly.

Except that it wasn’t. In fact, if I pointed my Mac at OpenDNS directly (without the Mac OS X Server BIND in between), the page loaded just fine, no problems, no hangups. Using the tools on OpenDNS, I checked to see that the dlnws.com domain wasn’t being blocked. It wasn’t. So clearly, whatever was in the way was happening in my Mac OS X Server’s BIND. But… what? It was time for some conf file fu.

The next step was to figure out how to log queries of my DNS server. In Mac OS X, it’s BIND, so editing things associated with named—the “name daemon”—is the route to take. I added this code into /etc/named.conf. (I can’t find where I found this code, but Google reveals several possibilities.)

logging {
        include "/etc/dns/loggingOptions.conf.apple";

/* added by WNE to allow query logging*/  
        channel query_logging {
                file "/var/log/query.log" versions 7 size 10m;
                severity debug 3;
                print-time yes;
                print-severity yes;
                print-category yes;
        };

        category queries {
                query_logging;
        };
/* end of stuff added by WNE */

};

And I restarted named with Server Admin. Nothing.

Googling told me to turn on query logging. Most instructions tell you to run rndc querylog, which won’t even work unless you use sudo rndc querylog, and that doesn’t work either because the connection is refused on port 953 (rndc’s default control port). Hiiiiiii-yah! More Google-fu was required to discover that Mac OS X Server’s default installation of BIND listens for rndc control connections on port 54. (See /etc/named.conf if you don’t believe me.) So sudo rndc -p 54 querylog solved that problem, and I began to watch the log in /var/log/query.log.

Sure enough, queries for s5.dlnws.com were showing up. But I still had no pageloads! Seriously, what’s going on here? I mean, when I try it on the iMac from my command line, I get the right answer, don’t I? I get:

the-imac:~ eccles$ dig @192.168.1.4 s5.dlnws.com
;; Truncated, retrying in TCP mode.
;; Connection to 192.168.1.4#53(192.168.1.4) for s5.dlnws.com failed: connection refused.

Ah… I don’t get the right answer! But I did get the right answer if I asked OpenDNS—and I got a huge list of responses. Apparently, s5.dlnws.com is attached to a content delivery network (CDN) with lots of IP addresses. Hmm.

Again, I applied some Google-D-40 to the problem (Do you like what I did there? I just invented it.) and found that A.P. Lawrence had a similar problem, never quite solved since it involved his router. Mine involved Mac OS X, but it still pointed me to the crux of the problem: most DNS queries work using UDP (User Datagram Protocol), but a DNS message over UDP cannot be longer than 512 bytes (unless the EDNS0 extension is in use). When an answer doesn’t fit, the server flags it as truncated and the client is expected to ask again over TCP. Clearly, this response was much longer than 512 bytes and dig was trying to do it with TCP (Transmission Control Protocol: fancier and more reliable, but with more overhead than UDP), but it was being rebuffed at the gates. BIND wanted nothing to do with answering a query on TCP.
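For the curious, here is a minimal Python sketch of the two pieces in play: the TC ("truncated") bit in the DNS header that tells a client to retry over TCP, and the 2-byte length prefix that DNS over TCP puts in front of every message. It is purely illustrative and is not what dig or BIND actually run; the helper names build_query, is_truncated and frame_for_tcp are made up for this example.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for `name` (qtype 1 means an A record)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is written as a length byte followed by the label text.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit is set in a DNS message header."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message
```

A client sends build_query(...) over UDP first; if is_truncated(...) is true for the reply, it reconnects over TCP and sends frame_for_tcp(...) instead. That second step is exactly where my server was slamming the door.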

I did all the Google-suggested things, such as netstat and lsof, only to discover that BIND was supposedly listening on port 53, both protocols. It just wasn’t paying attention.
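netstat and lsof only report what the daemon claims to have bound. A complementary check is to attempt a real TCP connection from a client and see whether it is accepted. The Python sketch below does just that; the function name tcp_accepts is hypothetical, and it says nothing about whether the server would then answer a DNS query properly.

```python
import socket


def tcp_accepts(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

In my case, tcp_accepts("192.168.1.4") would presumably have come back False, matching the "connection refused" that dig reported.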

The solution to the problem lay in one commented-out line in /etc/named.conf, which I uncommented:

        query-source address * port 53;

If you uncomment this line and restart DNS (using Server Admin or whatever your preferred method is), the problem is solved. Though BIND was listening to TCP, it only wanted to accept queries on UDP.
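If you want to confirm a fix like this without dig, a rough Python sketch along these lines sends an A query over TCP and reports how large the reply is and how many answers it carries. The names query_over_tcp and _read_exact are hypothetical, error handling is minimal, and the query ID is hard-coded, so treat it as a diagnostic toy rather than a resolver.

```python
import socket
import struct


def _read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    return buf


def query_over_tcp(server: str, name: str, port: int = 53, timeout: float = 3.0):
    """Send an A query over TCP; return (response_size, answer_count)."""
    # Header: fixed id, recursion desired, one question.
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    message = header + labels + b"\x00" + struct.pack("!HH", 1, 1)  # A, IN
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # DNS over TCP: every message is preceded by a 2-byte length.
        sock.sendall(struct.pack("!H", len(message)) + message)
        (size,) = struct.unpack("!H", _read_exact(sock, 2))
        response = _read_exact(sock, size)
    # The answer count lives in bytes 6-7 of the DNS header.
    (answer_count,) = struct.unpack("!H", response[6:8])
    return size, answer_count
```

Calling query_over_tcp("192.168.1.4", "s5.dlnws.com") against a server that refuses TCP raises ConnectionRefusedError; once the server accepts TCP, it should return a size and an answer count instead.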

Problem. Solved.

And now I can shop the bajeebers out of Black Friday. Whew!
