Webhostingtalk.com and Inet Interactive

For me, at least, webhostingtalk had been down all day, about ten hours in all, probably longer. I thought maybe it was a routing problem, but couldn’t get to them from any of several locations. Then I used my geek skills to see what was going on…

The answer is pretty embarrassing.

It seems both of their nameservers – those belonging to the site’s owners, Inet Interactive – were up and pingable, but not responding to DNS queries. Now, it could be because of the RedHat BIND problem that’s been causing a lot of trouble lately. I really couldn’t say. That would be bad enough, in and of itself. The “official” explanation on WHT is that a load-balancer failed, causing the “entire cluster” to be down. This, I have to say, is most likely untrue. What I can say is that they have exactly two nameservers, one IP apart on the same netblock, both of which were up and running during this time, just not responding. There’s a very good chance they’re really just one machine with two IPs.
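For what it’s worth, the “one IP apart on the same netblock” test is easy to sketch in Python. This is only a minimal sketch under stated assumptions: the addresses are reserved documentation IPs standing in for the real nameservers, the function names are made up for illustration, and treating a /24 as “the same netblock” is my assumption, not a rule.

```python
import ipaddress


def shared_netblocks(nameserver_ips, prefix_len=24):
    """Group nameserver IPs by the network they fall into.

    prefix_len=24 is an assumed definition of "same netblock".
    """
    groups = {}
    for ip in nameserver_ips:
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups.setdefault(net, []).append(ip)
    return groups


def looks_redundant(nameserver_ips, prefix_len=24):
    """True only if the nameservers span at least two distinct netblocks."""
    return len(shared_netblocks(nameserver_ips, prefix_len)) >= 2


# Hypothetical addresses: two nameservers one IP apart, as described above.
adjacent = ["192.0.2.10", "192.0.2.11"]
# Hypothetical addresses: two nameservers on unrelated networks.
spread_out = ["192.0.2.10", "198.51.100.10"]

print(looks_redundant(adjacent))    # False
print(looks_redundant(spread_out))  # True
```

Passing this check proves nothing about geography or about whether the two IPs are separate machines. Failing it, though, is a pretty strong hint that the “redundancy” is only on paper.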

And that, if you know anything about DNS, is just plain embarrassing. It’s also just plain cheap. Given the quantity of servers behind Inet’s various online properties – which include things like HotScripts.com – and the absurdly high advertising revenues they get, having what is effectively a single nameserver is ridiculous. I mean, they’ve made the same amateur mistake that hundreds if not thousands of moronic high-school and college students make. All of Inet’s sites, including their own – sites that earn thousands of dollars a day in revenue – are down, because they didn’t want to spend another thousand bucks a year to do their DNS the right way. At their size – really, for any online business – frugality is one thing; incompetence is quite another.

Redundant DNS, folks. It doesn’t exist because it’s sexy or easy to do – it exists because it works.

Update: I really believe the “load-balancer failure” explanation is disingenuous. After giving it some thought, my belief is that their nameserver or nameservers may have failed as much as a week ago. In the seven days before today, there were significantly fewer posts than usual, something I’d attributed to school starting. Is it possible, though, that regular visitors were still living on cached DNS results, lookups were failing for new visitors, and nobody noticed the DNS server wasn’t responding until traffic cut off completely when the TTL and retry times expired? Whichever way you look at it, I still maintain that any company running its own servers – especially one big enough to claim to have a load-balancing cluster – should have an RFC-compliant, geographically-diverse, redundant DNS setup… to say nothing of a fault-tolerant network setup without single points of failure… but what do I know?
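To make the caching theory concrete, here is a toy model in Python. It is a sketch only: the 48-hour TTL and all the timestamps are hypothetical numbers I picked for illustration, the function name is invented, and real resolvers have retry and negative-caching behaviour that this ignores.

```python
def can_resolve(now, outage_start, cached_at=None, ttl=0):
    """Return True if a visitor can still resolve the site's name at `now`.

    All times are in hours on the same clock. `cached_at` is when the
    visitor's resolver last got a good answer (None means never).
    """
    if now < outage_start:
        return True  # nameservers are still answering, everyone resolves
    # During the outage only a cached answer obtained beforehand helps,
    # and only until its TTL runs out.
    has_cached_answer = cached_at is not None and cached_at < outage_start
    return has_cached_answer and now < cached_at + ttl


OUTAGE_START = 0  # hypothetical: nameservers stop answering at hour 0
TTL_HOURS = 48    # hypothetical TTL, not Inet's real value

# Regular visitor whose resolver cached the record an hour before the outage.
print(can_resolve(10, OUTAGE_START, cached_at=-1, ttl=TTL_HOURS))  # True
# New visitor with nothing cached.
print(can_resolve(10, OUTAGE_START))                               # False
# The same regular visitor once the cached record has expired.
print(can_resolve(50, OUTAGE_START, cached_at=-1, ttl=TTL_HOURS))  # False
```

In this model the regulars keep arriving for a while as the newcomers drop off, which is the slow fade followed by a hard stop that I’m speculating about.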

Published in: 'D' for 'Dumb', Geekiness | on September 2nd, 2006
