Everyone:
Here, at last, is the long message I promised regarding the
problems I've observed with Squid, Windows Update, and other
services which fetch large files (in particular, software patches)
via HTTP (including Intuit's update mechanism for QuickBooks). I've
dashed it off in a hurry between installation appointments for my
wireless ISP, so please forgive any typos, grammatical mistakes, or
similar errors.
Why cache Windows Update?
Windows Update is now the standard method of securing Windows
machines against security flaws that have been acknowledged and
patched by Microsoft. As of XP Service Pack 2, it is configured by
default to download updates without the user's knowledge or
consent. The updates are often quite large and can consume a
considerable amount of a user's bandwidth.

The impact is especially severe in the case of businesses with
small to moderate-sized connections to the Internet. Unless the
office has installed a Microsoft-based server with Software Update
Services (SUS) or Windows Update Services (WUS), each machine
independently "phones home" to Microsoft and requests a separate
download of the same data. The result can amount to a denial of
service similar to that caused by spyware or other malware. I've
seen entire offices -- including medical and legal offices --
crippled as every desktop machine attempts to retrieve its own
copies of updates from Microsoft at the same time. The users, of
course, are most often unaware of what's going on and helpless to
stop it.

As their ISP, I often find out about this problem when a business
customer calls to complain that its service is slow and business
activity has effectively been halted. When I examine the company's
Internet connection using standard "packet sniffing" software, it
quickly becomes apparent that the fault isn't mine. The business
is getting the bandwidth that it is paying for, but it's all being
used by Windows Update! There's simply none left over for the
conduct of ordinary business. The problem can become especially
severe if the client is using its Internet connection for VoIP.
Since most businesses don't have the equipment required to
prioritize VoIP traffic, they can lose phone service as well.
As an ISP, I also aggregate traffic from many individual users --
most of whose computers, alas, run Microsoft Windows. The onslaught
of updates following one of Microsoft's "Black Tuesday" security
announcements clogs individual users' connections -- again
prompting support calls from users asking why service seems
slow. Worse still, it needlessly burdens my routers and backbone
connections. And, unlike a business, I have no ability to set up
SUS or WUS, because this would require me to have administrative
control of my users' machines.
Problems when caching Windows Update
It's possible to throttle Windows Update traffic to allow business
as usual to continue -- at least with less severe slowdowns. But
clearly, the best solution to the bandwidth hogging and massive
inefficiency of Windows Update is to cache the downloads --
transparently if necessary -- at as many levels as possible. (Since
the downloads are now digitally signed, the chance of forgery is
virtually nil.) Transparent caching is especially useful because it
avoids reconfiguration of client machines to accommodate the caching proxy.
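
For reference, here is a minimal sketch of the Squid 2.5-era
directives for transparent (interception) operation. It assumes
that the router or firewall is already redirecting outbound
port-80 traffic to the proxy; the port number is illustrative:

   http_port 3128
   httpd_accel_host virtual
   httpd_accel_port 80
   httpd_accel_with_proxy on
   httpd_accel_uses_host_header on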
Unfortunately, the Squid caching software is configured, by
default, in such a way as to fail to cache Windows updates.
Here's why. To ensure passage through firewalls, Windows Update
does not use FTP (which would have many advantages; it would, among
other things, allow downloads to be restarted and to be recognized
as file transfers and paced accordingly). In an attempt to deal
with transient connections to the Internet and to get around P2P
mitigation mechanisms (which slow or halt long downloads), Windows
Update does not download patch files all in one piece. Instead, it
makes multiple requests for segments of the file via HTTP "range"
requests. It then reassembles the file on disk prior to installing the patch.
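
Each segment is requested with an ordinary HTTP GET carrying a
"Range" header, and the server answers with a 206 response. The
URL and byte offsets below are illustrative only:

   GET /msdownload/update/example-patch.exe HTTP/1.1
   Host: download.windowsupdate.com
   Range: bytes=1048576-1572863

   HTTP/1.1 206 Partial Content
   Content-Range: bytes 1048576-1572863/52428800
   Content-Length: 524288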
Unfortunately, in Squid's default configuration, the parameter
"range_offset_limit" is set to 0 KB, which means that requests for
subranges of files (that is, from byte N to byte M) are not cached.
Thus, while one might expect Squid to help with this problem with
no special configuration, "out of the box" it does not cache
Windows Update files at all. It does nothing to reduce the waste
or relieve the bandwidth starvation that Windows Update causes.
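
The relevant line in the default squid.conf is simply:

   # Default: pass range requests upstream verbatim; don't prefetch
   range_offset_limit 0 KB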
Attempts to make things better can make them much worse
It is possible to change the Squid configuration to allow Squid to
cache range requests. However, the semantics of the tuning
parameters currently provided by Squid to do so are not well
designed to deal with the problem.
For example, if one sets the "range_offset_limit" parameter to a
number greater than zero, Squid will -- instead of merely passing
the range request on verbatim -- fetch all of the file up to and
including the requested range. However, once the client's request
has been satisfied and the client disconnects, Squid's "quick
abort" mechanism is likely to be triggered (unless the end of the
subrange just happens to be very close to the end of the file).
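
Concretely, the configuration that produces this behavior looks
something like the following (the limit shown is illustrative):

   # Expand range requests that start within the first 512 MB
   # into a fetch from the beginning of the object
   range_offset_limit 512 MB
   # quick_abort_min/max/pct left at their defaults, so the fetch
   # is abandoned almost as soon as the client disconnects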
Thus, setting "range_offset_limit" to a fairly large number (or -1,
which causes the entire file to be fetched) does not just fail to
improve efficiency; it actually makes things much worse. As Windows
Update tries to retrieve each segment of the file, the cache
re-fetches the whole file up to and including the most recently
requested range. It then drops the connection (unless the transfer
is near the very end) and discards all of the data that one might
have hoped it would cache. The next transfer proceeds from the
beginning of the file again.
The result: the total number of bytes transferred, instead of
being N (where N is the length of the file), grows quadratically:
each of the roughly N/s range requests (where s is the segment
size) re-fetches everything before it, for a total on the order
of N squared. (We've seen this "pessimization" effect in caches
where this one parameter has been altered in an attempt to cache updates.)
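
To make the arithmetic concrete, with illustrative numbers:
suppose a 100 MB patch is fetched in 1 MB segments.

   Request 1:   Squid fetches bytes 0-1 MB,  aborts, discards ->   1 MB
   Request 2:   Squid fetches bytes 0-2 MB,  aborts, discards ->   2 MB
   ...
   Request 100: Squid fetches bytes 0-100 MB                  -> 100 MB

   Total transferred: 1 + 2 + ... + 100 = 5,050 MB
   for a single 100 MB file.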
It is unclear to this author why the "range_offset_limit" parameter
is defined in the way it is, since I can think of no case in which
it's desirable to fetch the entire file up to and including a range
if the results are overwhelmingly likely to be discarded
immediately thereafter. (Perhaps it's meant to deal with servers
that don't implement range requests.) It's also unclear why the
parameters whose names begin with "quick_abort" are applied to such
fetches at all, since the whole point of prefetching the file is,
well, to prefetch it even after the client has gotten the range of
bytes it wants.
In any event, to prevent the cache from stopping its retrieval of
the file and then starting again from scratch, one can set the
"quick_abort_min" parameter to -1, which tells Squid never to
abort a cacheable fetch in progress. Combined with a nonzero
"range_offset_limit", this causes Squid to respond to a range
request by trying to retrieve and cache the entire file.
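
In squid.conf terms, the combination under discussion is:

   # Expand every range request into a fetch of the whole object...
   range_offset_limit -1
   # ...and never abort the fetch when the client disconnects
   quick_abort_min -1 KB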
Unfortunately, this causes yet another problem to surface. If the
file in question is larger than "maximum_object_size" (the maximum
object size defined for the cache), Squid retrieves the entire file
in response to each request but then fails to store it! So, again,
our attempts to make Squid behave more efficiently can make it
behave even less so. We have "pessimal" behavior: Squid fetches the
entire file (which is likely to be very large, because it exceeds
the maximum object size) on each subrange request, and then
discards the whole thing. Assuming a fixed range size, the amount
of data retrieved is now of the order of N squared, where N is the
size of the file.
The most obvious workaround for this problem is to increase Squid's
"maximum_object_size" parameter. But this is problematic for two
reasons. First, the size of Microsoft's bugs (and, hence, its
patches) seems to be increasing dramatically over time. A few
months ago, this author set an experimental cache to a maximum
object size of 384 MB, thinking that surely Microsoft would not
release a patch larger than that. But Microsoft quickly proved him
wrong. Just this week, a patch totaling more than 723 MB -- about
the capacity of a CD -- caused that server to revert to pessimal
behavior. While one might hope that Microsoft would limit its
patches to the size of a CD (or break larger patches into files
no bigger than one CD), we cannot count on this. The day may soon
come when computers are expected to have DVD-ROM drives, making
single files that are gigabytes in size seem perfectly normal.
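
For reference, the change itself is a single line in squid.conf
(the default is a mere 4 MB; the value below is illustrative):

   maximum_object_size 1 GB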
Second, it is probably not desirable to raise the maximum object
size of the entire cache -- and possibly cause the caching of other
large objects that one has no desire to cache -- just for the sake
of caching Windows Update. (After all, another use of range
requests is for random access -- to allow a client to fetch just
the portion of a large database that it actually needs. If the
database is frequently updated, and Squid tries to prefetch the
whole file, it could find that copy to be stale almost immediately.
This would, ironically, cause the same waste of bandwidth we now
see with Windows Update.) Unfortunately, Squid doesn't have the
ability to apply ACLs to the "maximum_object_size" parameter, as it
does to some others (such as "tcp_outgoing_address"), so it's all or nothing.
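
For comparison, here is how "tcp_outgoing_address" selects a
value per ACL -- exactly the pattern one would like for
"maximum_object_size" (addresses and ACL names are illustrative):

   acl good_service dst 10.0.0.0/255.0.0.0
   tcp_outgoing_address 10.1.0.2 good_service
   tcp_outgoing_address 10.1.0.1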
Because of the failure modes mentioned above -- all of which result
in network congestion and huge amounts of wasted bandwidth --
institutional users of Squid and ISPs have two choices other than
setting up SUS or WUS (which ISPs and companies without expensive
Windows servers can't do). Neither of these choices is attractive,
but for the moment they are the only two available workarounds.
The first is to leave "range_offset_limit" at zero and forgo any
caching of Windows Update. (Unfortunately, this option allows a
collection of Windows machines to effectively become a "zombie
army" that can cripple an Internet feed with no notice.) The second
is to create an extremely large cache, with an extremely high
"maximum_object_size" parameter, and set "range_offset_limit" to -1
and "quick_abort_min" to 0. Such a cache will cache many things
that one would not ordinarily desire to cache. (The author has seen
at least one case in which 7 hours of streaming media -- it looked
like it might have been MP3 delivered via SHOUTcast -- wound up in
the cache, consuming more than 400 megabytes. If anyone knows of
ACLs that can be used to prevent this from occurring, I'd like to
know about them.)
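
As a partial answer to my own question: something like the
following should at least keep extension-identifiable streams out
of the cache (untested, and the pattern is illustrative; it won't
catch SHOUTcast streams whose URLs carry no telltale extension):

   acl streaming urlpath_regex -i \.(mp3|m3u|pls)$
   no_cache deny streaming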
Proposed solutions
The following changes to Squid could greatly improve
administrators' ability to cache Windows Update (and other similar
updates) without the deleterious effects mentioned above.
1. Modify Squid so that before it attempts to prefetch an entire
object in response to a range request, it first checks the size of
the object against "maximum_object_size". It should also perform
all the other checks it normally performs to determine whether an
object is cacheable (including evaluation of the ACLs, if any, used
with the "no_cache" parameter). If the entire file is not cacheable
for any reason whatsoever, the range request should be passed
through verbatim. (This seems like common sense, but it is
apparently not done now.)
2. Allow the "maximum_object_size" parameter to be selected, for
each transfer, via an ACL. (Fortuitously, the standard syntax of
the Squid configuration file both allows this and provides backward
compatibility in the case where no ACL is specified. See the
"tcp_outgoing_address" parameter, which acquired the ability to use
ACLs only recently and is backward compatible with configuration
files that don't exploit this new capability.) With this
modification, an administrator could allow the caching of large
Windows Update files but not large files from other sources. (A
sketch of possible syntax appears after this list.)
3. If a range transfer is to be expanded into a transfer of the
entire file, exempt the transfer from the "quick_abort" mechanism.
(After all, it's normal for the client to disconnect after it
receives the data it has requested.)
4. Encourage Microsoft to modify Windows Update so that it can
"discover" a server on which updates are preloaded or cached.
Currently, SUS and WUS require modification to a client machine's
registry; this is practical for organizations with IT staffs but
not for ISPs. An ISP should be able to run a Web cache, FTP server,
or Web server to which Windows updates are downloaded once and then
distributed downstream. Microsoft has a financial incentive to do
this, because its updates are currently distributed through Akamai
(which undoubtedly charges it by the bit for downloads). Alas, we
shouldn't hold our breath waiting for Microsoft to do such a thing.
Therefore, the modifications to Squid mentioned above are essential
to providing an efficient solution -- not only to Windows Update
issues but also to issues with similar updating systems from Intuit
and other software vendors.
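
To illustrate item 2: the syntax might mirror that of
"tcp_outgoing_address". To be clear, this is proposed,
hypothetical syntax -- no current version of Squid accepts ACLs
here -- and the domains and sizes are illustrative:

   acl windows_update dstdomain .windowsupdate.com .microsoft.com
   # Hypothetical: a large limit for update servers only...
   maximum_object_size 1 GB windows_update
   # ...and a modest limit for everything else
   maximum_object_size 16 MB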
The first three of these items should be implemented as soon as
possible, so that administrators of Squid caches can safely cache
Microsoft's updates. Now that the largest of these have grown to
more than 700 megabytes, the need is urgent.
--Brett Glass