Pessimal behavior with Windows Update (Long)

Everyone:

Here, at last, is the long message I promised regarding the problems I've observed with Squid, Windows Update, and other services that fetch large files (in particular, software patches) via HTTP, including Intuit's update mechanism for QuickBooks. I've dashed it off in a hurry between installation appointments for my wireless ISP, so please forgive any typos, grammatical mistakes, or similar errors.

Why cache Windows Update?

Windows Update is now the standard method of securing Windows machines against security flaws that have been acknowledged and patched by Microsoft. As of XP Service Pack 2, it is configured by default to download updates without the user's knowledge or consent. The updates are often quite large and can consume a considerable amount of a user's bandwidth.

The impact is especially severe in the case of businesses with small to moderate-sized connections to the Internet. Unless the office has installed a Microsoft-based server with Software Update Services (SUS) or Windows Update Services (WUS), each machine independently "phones home" to Microsoft and requests a separate download of the same data. The result can amount to a denial of service similar to those caused by spyware or other malware. I've seen entire offices -- including medical and legal offices -- crippled as every desktop machine attempts to retrieve its own copy of the updates from Microsoft at the same time. The users, of course, are most often unaware of what's going on and helpless to stop it.

As their ISP, I often find out about this problem when a business customer calls to complain that its service is slow and business activity has effectively been halted. When I examine the company's Internet connection using standard "packet sniffing" software, it quickly becomes apparent that the fault isn't mine. The business is getting the bandwidth it is paying for, but it's all being used by Windows Update! There's simply none left over for the conduct of ordinary business. The problem can become especially severe if the client is using its Internet connection for VoIP. Since most businesses don't have the equipment required to prioritize VoIP traffic, they can lose phone service as well.

As an ISP, I also aggregate traffic from many individual users -- most of whose computers, alas, run Microsoft Windows. The onslaught of updates following one of Microsoft's "Black Tuesday" security announcements clogs individual users' connections -- again prompting support calls from users asking why service seems slow. Worse still, it needlessly clogs my routers and backbone connections. And, unlike a business, I have no ability to set up SUS or WUS, because this would require me to have administrative control of my users' machines.

Problems when caching Windows Update

It's possible to throttle Windows Update traffic to allow business as usual to continue -- at least with less severe slowdowns. But clearly, the best solution to the bandwidth hogging and massive inefficiency of Windows Update is to cache the downloads -- transparently if necessary -- at as many levels as possible. (Since the downloads are now digitally signed, the chance of forgery is virtually nil.) Transparent caching is especially useful because it avoids reconfiguration of client machines to accommodate the caching proxy.
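
(For reference, on the Squid 2.5 series transparent interception is typically configured with something like the following, plus a firewall rule that redirects outbound port-80 traffic to the proxy. This is only a minimal sketch; details vary by Squid version and platform.)

    # squid.conf -- minimal transparent-proxy setup (Squid 2.5 era)
    http_port 3128
    httpd_accel_host virtual
    httpd_accel_port 80
    httpd_accel_with_proxy on
    httpd_accel_uses_host_header on

    # On a Linux router, redirect web traffic to the proxy:
    #   iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
    #            -j REDIRECT --to-ports 3128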

Unfortunately, the Squid caching software is configured, by default, in such a way as to fail to cache Windows updates.

Here's why. To ensure passage through firewalls, Windows Update does not use FTP (which would have many advantages; it would, among other things, allow downloads to be restarted and to be recognized as file transfers and paced accordingly). In an attempt to deal with transient connections to the Internet and to get around P2P mitigation mechanisms (which slow or halt long downloads), Windows Update does not download patch files all in one piece. Instead, it makes multiple requests for segments of the file via HTTP "range" requests. It then reassembles the file on disk prior to installing the patch.
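
Each segment is fetched with an exchange along these lines (the URL and byte counts here are invented for illustration, but the pattern is typical):

    GET /msdownload/update/v5/somepatch.exe HTTP/1.1
    Host: download.windowsupdate.com
    Range: bytes=1048576-1572863

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 1048576-1572863/758120448
    Content-Length: 524288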

Unfortunately, in Squid's default configuration, the parameter "range_offset_limit" is set to 0 KB, which means that requests for subranges of files (that is, for bytes N through M) are passed through verbatim and their replies are not cached. Thus, while one might expect Squid to help with this problem with no special configuration, Squid "out of the box" does not cache Windows Update files at all. It does nothing to reduce the waste or solve the bandwidth starvation problems caused by Windows Update.
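
The relevant line in squid.conf, as shipped:

    # Default: never fetch more than the client asked for. Range replies
    # (HTTP 206) are relayed to the client and not stored in the cache.
    range_offset_limit 0 KB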

Attempts to make things better can make them much worse

It is possible to change the Squid configuration to allow Squid to cache range requests. However, the semantics of the tuning parameters currently provided by Squid to do so are not well designed to deal with the problem.

For example, if one sets the "range_offset_limit" parameter to a number greater than zero, Squid will -- instead of merely passing the range request on verbatim -- fetch all of the file up to and including the requested range. However, once the client's request has been satisfied and the client disconnects, Squid's "quick abort" mechanism is likely to be triggered (unless the end of the subrange just happens to be very close to the end of the file). Thus, setting "range_offset_limit" to a fairly large number (or -1, which causes the entire file to be fetched) does not just fail to improve efficiency; it actually makes things much worse. As Windows Update tries to retrieve each segment of the file, the cache re-fetches the whole file up to and including the most recently requested range. It then drops the connection (unless the transfer is near the very end) and discards all of the data that one might have hoped it would cache. The next transfer proceeds from the beginning of the file again.
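
In squid.conf terms, the problematic combination looks like this (the quick_abort_* values shown are Squid's defaults):

    # Fetch from the beginning of the object whenever a range is requested:
    range_offset_limit -1
    # ...but abandon the background fetch once the client disconnects,
    # unless fewer than 16 KB remain (these are the default settings):
    quick_abort_min 16 KB
    quick_abort_max 16 KB
    quick_abort_pct 95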

The result: the total number of bytes transferred, instead of being N (where N is the length of the file), grows as the sum of an arithmetic series -- roughly N squared divided by twice the segment size, since each request re-fetches everything before it. (We've seen this "pessimization" effect in caches where this one parameter has been altered in an attempt to cache updates.)
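
A worked example makes the growth concrete. Assume, for illustration, a 700 MB patch fetched in 512 KB segments (actual segment sizes vary):

    segments      = 700 MB / 512 KB             = 1,400 requests
    bytes fetched = 512 KB * (1 + 2 + ... + 1400)
                  = 512 KB * (1400 * 1401 / 2)
                  ~ 479 GB  -- about 700 times the size of the patch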

It is unclear to me why the "range_offset_limit" parameter is defined the way it is, since I can think of no case in which it's desirable to fetch the entire file up to and including a range if the results are overwhelmingly likely to be discarded immediately thereafter. (Perhaps it's meant to deal with servers that don't implement range requests.) It's also unclear why the parameters whose names begin with "quick_abort" are applied to such fetches at all, since the whole point of prefetching the file is, well, to prefetch it even after the client has gotten the range of bytes it wants.

In any event, to prevent the cache from stopping its retrieval of the file and then starting again from scratch, one can set the "quick_abort_min" parameter to -1. This causes Squid to respond to a range request by trying to retrieve and cache the entire file.
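
That is, combined with the earlier setting:

    range_offset_limit -1    # fetch the whole object on any range request
    quick_abort_min -1 KB    # never abort a fetch in progress; finish it
                             # even after the client disconnects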

Unfortunately, this causes yet another problem to surface. If the file in question is larger than "maximum_object_size" (the maximum object size defined for the cache), Squid retrieves the entire file in response to each request but then fails to store it! So, again, our attempts to make Squid behave more efficiently can make it behave even less so. We have "pessimal" behavior: Squid fetches the entire file (which is likely to be very large, because it exceeds the maximum object size) on each subrange request, and then discards the whole thing. Assuming a fixed range size, the amount of data retrieved is now of the order of N squared, where N is the size of the file.

The most obvious workaround for this problem is to increase Squid's "maximum_object_size" parameter. But this is problematic for two reasons. First, the size of Microsoft's bugs (and, hence, its patches) seems to be increasing dramatically over time. A few months ago, this author set an experimental cache to a maximum object size of 384 MB, thinking that surely Microsoft would not release a patch larger than that. But Microsoft quickly proved him wrong. Just this week, a patch totaling more than 723 MB -- about the capacity of a CD -- caused that server to revert to pessimal behavior. While one might hope that Microsoft would limit its patches to the size of a CD (or break them into files no larger than one CD), we cannot count on this. The day may soon come when computers are expected to come with DVD-ROM drives, allowing single files that are gigabytes in size.
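
For the record, Squid's shipped default is a mere 4 MB, so anyone attempting this must raise it by orders of magnitude. The value below is only a guess at what "big enough" means this month:

    # Default is "maximum_object_size 4096 KB" -- far too small for updates.
    # 384 MB proved insufficient this week; even 1 GB may not last long.
    maximum_object_size 1024 MB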

Second, it is probably not desirable to raise the maximum object size of the entire cache -- and possibly cause the caching of other large objects that one has no desire to cache -- just for the sake of caching Windows Update. (After all, another use of range requests is random access: allowing a client to fetch just the portion of a large database that it actually needs. If the database is frequently updated, and Squid tries to prefetch the whole file, it could find that copy to be stale almost immediately. This would, ironically, cause the same waste of bandwidth we now see with Windows Update.) Unfortunately, Squid doesn't have the ability to apply ACLs to the "maximum_object_size" parameter, as it does to some others (such as "tcp_outgoing_address"), so it's all or nothing.

Because of the failure modes mentioned above -- all of which result in network congestion and huge amounts of wasted bandwidth -- institutional users of Squid and ISPs have two choices other than setting up SUS or WUS (which ISPs and companies without expensive Windows servers can't do). Neither of these choices is attractive, but for the moment they are the only two available workarounds.

The first is to leave "range_offset_limit" at zero and forgo any caching of Windows Update. (Unfortunately, this option allows a collection of Windows machines to effectively become a "zombie army" that can cripple an Internet feed with no notice.) The second is to create an extremely large cache, with an extremely high "maximum_object_size" parameter, and set "range_offset_limit" to -1 and "quick_abort_min" to -1, as described above. Such a cache will cache many things that one would not ordinarily desire to cache. (The author has seen at least one case in which 7 hours of streaming media -- it looked like it might have been MP3 audio delivered via SHOUTcast -- wound up in the cache, consuming more than 400 megabytes. If anyone knows of ACLs that can be used to prevent this from occurring, I'd like to know about them.)
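
Concretely, the second workaround amounts to something like this (the size is illustrative):

    # Workaround 2: cache (nearly) everything.
    maximum_object_size 1024 MB   # larger than the biggest expected update
    range_offset_limit -1         # always fetch from the start of the object
    quick_abort_min -1 KB         # always finish the fetch so it can be cached

(The cache_dir must, of course, be large enough to hold all of this.)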

Proposed solutions

The following improvements to Squid could greatly improve administrators' ability to cache Windows Update (and other similar updates) without the deleterious effects mentioned above.

1. Modify Squid so that before it attempts to prefetch an entire object in response to a range request, it first checks the size of the object against "maximum_object_size". It should also perform all the other checks it normally performs to determine whether an object is cacheable (including evaluation of the ACLs, if any, used with the "no_cache" parameter). If the entire file is not cacheable for any reason whatsoever, the range request should be passed through verbatim. (This seems like common sense, but it is apparently not done now.)

2. Allow the "maximum_object_size" parameter to be selected, for each transfer, via an ACL. (Fortunately, the standard syntax of the Squid configuration file both allows this and provides backward compatibility in the case where no ACL is specified. See the "tcp_outgoing_address" parameter, which acquired the ability to use ACLs only recently and remains backward compatible with configuration files that don't exploit this new capability.) With this modification, an administrator could allow the caching of large Windows Update files but not large files from other sources; a hypothetical example appears after this list.

3. If a range transfer is to be expanded into a transfer of the entire file, exempt the transfer from the "quick_abort" mechanism. (After all, it's normal for the client to disconnect after it receives the data it has requested.)

4. Encourage Microsoft to modify Windows Update so that it can "discover" a server on which updates are preloaded or cached. Currently, SUS and WUS require modifications to a client machine's registry; this is practical for organizations with IT staffs but not for ISPs. An ISP should be able to run a Web cache, FTP server, or Web server to which Windows updates are downloaded once and then distributed downstream. Microsoft has a financial incentive to do this, because its updates are currently distributed through Akamai (which undoubtedly charges it by the bit for downloads). Alas, we shouldn't hold our breath waiting for Microsoft to do such a thing. Therefore, the modifications to Squid mentioned above are essential to providing an efficient solution -- not only to Windows Update issues but also to issues with similar updating systems from Intuit and other software vendors.
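
To illustrate item 2: if "maximum_object_size" followed the same convention as "tcp_outgoing_address", the configuration might look something like the following. This syntax is purely hypothetical -- it does not work in any current Squid release -- and the domain list is merely illustrative:

    # HYPOTHETICAL syntax -- not implemented in Squid today.
    acl windowsupdate dstdomain .windowsupdate.com .update.microsoft.com
    maximum_object_size 1024 MB windowsupdate   # generous limit, updates only
    maximum_object_size 4096 KB all             # normal limit for all else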

The first three of these items should be implemented as soon as possible, so that administrators of Squid caches can safely cache Microsoft's updates. Now that the largest of these updates has grown to more than 700 megabytes, the need is urgent.

--Brett Glass


