Using Squid With NTL

My network is on the wrong side of NTL's "transparent" web proxy. I'm never sure whether to refer to this as a poxy cache, or proxy crash. Either seems quite appropriate as it is unreliable and slow. Anyway, rather than have my own local Squid talk to it thinking it's talking to the origin web servers, I decided to make it connect directly to the cache, so it at least knew what it was dealing with.

Here are the cache_peer lines I added:

cache_peer cmbg-cache-1.server.ntli.net parent 8080 7 no-query round-robin weight=10 no-digest
cache_peer cmbg-cache-2.server.ntli.net parent 8080 7 no-query round-robin weight=10 no-digest
cache_peer cmbg-cache-3.server.ntli.net parent 8080 7 no-query round-robin weight=10 no-digest
cache_peer cmbg-cache-4.server.ntli.net parent 8080 7 no-query round-robin weight=10 no-digest
cache_peer cmbg-cache-5.server.ntli.net parent 8080 7 no-query round-robin weight=10 no-digest

(Note that the hostnames and port numbers may well be different for other situations.)

That got basic browsing routed from my Squid to NTL's. I also have some internal web servers, though, and of course NTL can't see those.

I initially tried using always_direct with a dstdom_regex acl, but this completely failed to work. Perhaps this is a bug. It turns out that the solution is to specify not the names of the websites that are to be accessed directly but their IP addresses. This presumably creates a requirement for an extra DNS lookup for every external website, which seems a bit wasteful. Anyway the syntax I used was:

acl rfc1918 dst 192.168.0.0/16 172.16.0.0/12 10.0.0.0/8 
always_direct allow rfc1918

Now I could see all internal web servers again. I list the whole of RFC1918 space rather than just my own LAN because I have access to some other, similar LANs via secnet. Currently they all use RFC1918 addresses and this doesn't seem about to change.
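
To make the effect of the rfc1918 acl concrete, here is a small Python sketch of the same membership test. It is not part of Squid and the function name is_rfc1918 is made up for illustration. It works on an address that has already been resolved, which is why a dst acl implies a DNS lookup before it can be evaluated.

```python
import ipaddress

# The three private ranges from RFC1918, exactly as listed in the acl above.
RFC1918_NETS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr (a dotted-quad IPv4 string) lies in RFC1918 space."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETS)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address, used here as a stand-in for
    # "some external web server".
    for a in ("192.168.1.10", "172.31.255.1", "172.32.0.1", "192.0.2.1"):
        print(a, is_rfc1918(a))
```

Note that 172.16.0.0/12 only covers 172.16.x.x to 172.31.x.x, so an address such as 172.32.0.1 is outside the private ranges and would not match the acl.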

But that's not all. I noticed that some requests were still being sent directly to their origin servers (actually to the "transparent" proxy, of course). After a bit of experimentation I determined that it was HEAD requests that were being sent direct and GET requests that were being forwarded to the configured parent. I suppose there's a certain amount of sense to this, but I've not managed to find it documented. Anyway the syntax to stop this happening and send all requests to the parent is:

acl all src 0.0.0.0/0.0.0.0
never_direct allow all

(Actually "acl all" is defined much further up my configuration file.)
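
The combined effect of the two rules can be summarised with a simplified Python model. This is a sketch of the behaviour described on this page, assuming always_direct is consulted before never_direct; it is not Squid's actual selection code, and the function name route and its never_direct_all parameter are invented for illustration. With never_direct_all set to False it only reproduces the GET/HEAD behaviour I observed, nothing more.

```python
import ipaddress

# Same private ranges as the rfc1918 acl.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8")
]


def route(method: str, dst_ip: str, never_direct_all: bool = True) -> str:
    """Return "direct" or "parent" for a request (simplified model)."""
    ip = ipaddress.ip_address(dst_ip)
    # always_direct allow rfc1918: internal servers are fetched directly.
    if any(ip in net for net in PRIVATE_NETS):
        return "direct"
    # never_direct allow all: everything else goes to a parent,
    # whatever the request method.
    if never_direct_all:
        return "parent"
    # Without never_direct: only the behaviour observed in the text above.
    if method == "GET":
        return "parent"
    if method == "HEAD":
        return "direct"
    raise ValueError("behaviour for %s was not observed" % method)
```

For example, route("HEAD", "192.0.2.1") returns "parent" with the final configuration, whereas route("HEAD", "192.0.2.1", never_direct_all=False) returns "direct", matching what I saw before adding never_direct.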

Results

Was it worth the effort? It seems to be. Over the week or two before I made these changes, web browsing had become slow and unreliable, albeit more noticeably so for some sites than others. Immediately after making the changes above it seemed much better. Nearly a month later, things still seem to be fine in this configuration.

A month after that, web browsing started to become flaky again. At that point the configuration above listed only two parent caches. In fact Squid seemed to use only one of them, and always chose the same one. Swapping the two lines and restarting my Squid appeared to improve matters.

I've since discovered the round-robin option (I should have read the documentation properly the first time l-) which makes my Squid use the parent caches in turn rather than sticking to one. The documentation implies that if one of the parents stops working, Squid should nevertheless cope, and we'll see if this works in practice.
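
As an illustration of what "use each parent in turn, and cope if one is down" means, here is a minimal Python sketch. It is not Squid's implementation: the class name RoundRobinParents and the is_alive callback are hypothetical, and the sketch ignores the weight= option entirely.

```python
from typing import Callable, List, Optional


class RoundRobinParents:
    """Hand out parents in rotation, skipping any that are reported dead."""

    def __init__(self, parents: List[str]) -> None:
        if not parents:
            raise ValueError("at least one parent is required")
        self._parents = list(parents)
        self._next = 0  # index of the parent to try first next time

    def pick(self, is_alive: Callable[[str], bool]) -> Optional[str]:
        """Return the next live parent, or None if all of them are down."""
        n = len(self._parents)
        for offset in range(n):
            idx = (self._next + offset) % n
            candidate = self._parents[idx]
            if is_alive(candidate):
                # Resume the rotation just after the parent we hand out.
                self._next = (idx + 1) % n
                return candidate
        return None


if __name__ == "__main__":
    rr = RoundRobinParents([
        "cmbg-cache-1.server.ntli.net",
        "cmbg-cache-2.server.ntli.net",
    ])
    # Pretend every parent is up; a real check would need some health test.
    for _ in range(4):
        print(rr.pick(lambda host: True))
```

If is_alive reports one parent as down, pick simply moves on to the next one, so requests keep flowing as long as at least one parent is up.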

The no-digest option was also added more recently than much of the rest of this page. It makes some log messages go away; I'm not sure if it has a practical benefit. Whatever the nature of the feature it suppresses, the Inktomis don't support it.

Alternative Suggestions

I'm convinced, particularly based on my experience with NTL, that intercepting proxies of this kind are a bad idea; they may succeed in getting enough users to use a cache, but they seem to degrade the service enough that it becomes attractive to seek workarounds (such as the one described here). Money spent on a service that users try to work around is money wasted.

This may be partly a fact about NTL's implementation rather than about the technology in general, but only partly so; some of the problems are inherent to intercepting proxies.

Here are some alternative possibilities:

Raw enforcement: block outgoing connections to port 80, thus requiring all customers to use a traditional web proxy. This is simple to implement, but has poor failure modes (nobody can use the web if the proxies are down). These could be addressed by temporarily lifting the block while the proxies were down.
As above, but open port 80 for specific customers who want to pay a little extra.
Financial incentive: monitor outgoing connections to port 80 and charge a bit extra for those - but keep connections through proxy caches free. This would maintain the basic fixed-price nature of the service but creates a cost incentive to use the cache.
Technical incentive: just ask people to use the proxies without using any technical or financial measure to force them to do so, instead making sure they are sufficiently well provisioned that it's almost always faster to do so. I often get the existing service maxing out my connection, so I don't believe that this could be much more expensive than the existing system.

Links

http://www.thevalkyrie.com/members/tech/tech-ispcache.txt has some further notes from someone else behind NTL's proxies. I got the hostnames for the proxy servers from there.

Robin Walker's Cable Modem Troubleshooting Tips has lots of other hints regarding the NTL broadband service.

Inktomi are apparently the firm responsible for NTL's web proxies. Given their poor reliability I can't exactly recommend buying their products! (Update: it turns out that they can't do NNTP correctly either; their implementation of STAT always claims that the article exists, even when it doesn't.)

RFC2616 describes HTTP, if you're lost by the discussion about "HEAD" and "GET" requests above.

Improving performance using Linux traffic shaping - workarounds for an unrelated set of cable modem problems.
