Pessimal behavior with Windows Update (Long)

Everyone:

Here, at last, is the long message I promised regarding the problems I've observed with Squid, Windows Update, and other services that fetch large files (in particular, software patches) via HTTP, including Intuit's update mechanism for QuickBooks. I've dashed it off in a hurry between installation appointments for my wireless ISP, so please forgive any typos, grammatical mistakes, or similar errors.

Why cache Windows Update?

Windows Update is now the standard method of securing Windows machines against security flaws that have been acknowledged and patched by Microsoft. As of XP Service Pack 2, it is configured by default to download updates without the user's knowledge or consent. The updates are often quite large and can consume a considerable amount of a user's bandwidth.

The impact is especially severe in the case of businesses with small to moderate-sized connections to the Internet. Unless the office has installed a Microsoft-based server with Software Update Services (SUS) or Windows Update Services (WUS), each machine independently "phones home" to Microsoft and requests a separate download of the same data. The result can amount to a denial of service similar to those caused by spyware or other malware. I've seen entire offices -- including medical and legal offices -- crippled as every desktop machine attempts to retrieve its own copy of the updates from Microsoft at the same time. The users, of course, are most often unaware of what's going on and helpless to stop it.

As their ISP, I often find out about this problem when a business customer calls to complain that its service is slow and business activity has effectively been halted. When I examine the company's Internet connection using standard "packet sniffing" software, it quickly becomes apparent that the fault isn't mine. The business is getting the bandwidth it is paying for, but it's all being used by Windows Update! There's simply none left over for the conduct of ordinary business. The problem can become especially severe if the client is using its Internet connection for VoIP. Since most businesses don't have the equipment required to prioritize VoIP traffic, they can lose phone service as well.

As an ISP, I also aggregate traffic from many individual users -- most of whose computers, alas, run Microsoft Windows. The onslaught of updates following one of Microsoft's "Black Tuesday" security announcements clogs individual users' connections -- again prompting support calls from users asking why service seems slow. Worse still, it needlessly clogs my routers and backbone connections. And, unlike a business, I have no ability to set up SUS or WUS, because this would require me to have administrative control of my users' machines.

Problems when caching Windows Update

It's possible to throttle Windows Update traffic to allow business as usual to continue -- at least with less severe slowdowns. But clearly, the best solution to the bandwidth hogging and massive inefficiency of Windows Update is to cache the downloads -- transparently if necessary -- at as many levels as possible. (Since the downloads are now digitally signed, the chance of forgery is virtually nil.) Transparent caching is especially useful because it avoids reconfiguration of client machines to accommodate the caching proxy.
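
(For reference, on the Squid 2.5 series transparent interception is typically configured with something like the following, plus a firewall rule that redirects outbound port-80 traffic to the proxy. This is only a minimal sketch; details vary by Squid version and platform.)

    # squid.conf -- minimal transparent-proxy setup (Squid 2.5 era)
    http_port 3128
    httpd_accel_host virtual
    httpd_accel_port 80
    httpd_accel_with_proxy on
    httpd_accel_uses_host_header on

    # On a Linux router, redirect web traffic to the proxy:
    #   iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
    #            -j REDIRECT --to-ports 3128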

Unfortunately, the Squid caching software is configured, by default, in such a way as to fail to cache Windows updates.

Here's why. To ensure passage through firewalls, Windows Update does not use FTP (which would have many advantages; it would, among other things, allow downloads to be restarted and to be recognized as file transfers and paced accordingly). In an attempt to deal with transient connections to the Internet and to get around P2P mitigation mechanisms (which slow or halt long downloads), Windows Update does not download patch files all in one piece. Instead, it makes multiple requests for segments of the file via HTTP "range" requests. It then reassembles the file on disk prior to installing the patch.
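
Each segment is fetched with an exchange along these lines (the URL and byte counts here are invented for illustration, but the pattern is typical):

    GET /msdownload/update/v5/somepatch.exe HTTP/1.1
    Host: download.windowsupdate.com
    Range: bytes=1048576-1572863

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 1048576-1572863/758120448
    Content-Length: 524288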

Unfortunately, in Squid's default configuration, the parameter "range_offset_limit" is set to 0 KB, which means that requests for subranges of files (that is, for bytes N through M) are passed through verbatim and their replies are not cached. Thus, while one might expect Squid to help with this problem with no special configuration, Squid "out of the box" does not cache Windows Update files at all. It does nothing to reduce the waste or solve the bandwidth starvation problems caused by Windows Update.
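
The relevant line in squid.conf, as shipped:

    # Default: never fetch more than the client asked for. Range replies
    # (HTTP 206) are relayed to the client and not stored in the cache.
    range_offset_limit 0 KB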

Attempts to make things better can make them much worse

It is possible to change the Squid configuration to allow Squid to cache range requests. However, the semantics of the tuning parameters currently provided by Squid to do so are not well designed to deal with the problem.

For example, if one sets the "range_offset_limit" parameter to a number greater than zero, Squid will -- instead of merely passing the range request on verbatim -- fetch all of the file up to and including the requested range. However, once the client's request has been satisfied and the client disconnects, Squid's "quick abort" mechanism is likely to be triggered (unless the end of the subrange just happens to be very close to the end of the file). Thus, setting "range_offset_limit" to a fairly large number (or -1, which causes the entire file to be fetched) does not just fail to improve efficiency; it actually makes things much worse. As Windows Update tries to retrieve each segment of the file, the cache re-fetches the whole file up to and including the most recently requested range. It then drops the connection (unless the transfer is near the very end) and discards all of the data that one might have hoped it would cache. The next transfer proceeds from the beginning of the file again.
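
In squid.conf terms, the problematic combination looks like this (the quick_abort_* values shown are Squid's defaults):

    # Fetch from the beginning of the object whenever a range is requested:
    range_offset_limit -1
    # ...but abandon the background fetch once the client disconnects,
    # unless fewer than 16 KB remain (these are the default settings):
    quick_abort_min 16 KB
    quick_abort_max 16 KB
    quick_abort_pct 95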

The result: the total number of bytes transferred, instead of being N (where N is the length of the file), grows as the sum of an arithmetic series -- roughly N squared divided by twice the segment size, since each request re-fetches everything before it. (We've seen this "pessimization" effect in caches where this one parameter has been altered in an attempt to cache updates.)
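
A worked example makes the growth concrete. Assume, for illustration, a 700 MB patch fetched in 512 KB segments (actual segment sizes vary):

    segments      = 700 MB / 512 KB             = 1,400 requests
    bytes fetched = 512 KB * (1 + 2 + ... + 1400)
                  = 512 KB * (1400 * 1401 / 2)
                  ~ 479 GB  -- about 700 times the size of the patch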

It is unclear to me why the "range_offset_limit" parameter is defined the way it is, since I can think of no case in which it's desirable to fetch the entire file up to and including a range if the results are overwhelmingly likely to be discarded immediately thereafter. (Perhaps it's meant to deal with servers that don't implement range requests.) It's also unclear why the parameters whose names begin with "quick_abort" are applied to such fetches at all, since the whole point of prefetching the file is, well, to prefetch it even after the client has gotten the range of bytes it wants.

In any event, to prevent the cache from stopping its retrieval of the file and then starting again from scratch, one can set the "quick_abort_min" parameter to -1. This causes Squid to respond to a range request by trying to retrieve and cache the entire file.
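
That is, combined with the earlier setting:

    range_offset_limit -1    # fetch the whole object on any range request
    quick_abort_min -1 KB    # never abort a fetch in progress; finish it
                             # even after the client disconnects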

Unfortunately, this causes yet another problem to surface. If the file in question is larger than "maximum_object_size" (the maximum object size defined for the cache), Squid retrieves the entire file in response to each request but then fails to store it! So, again, our attempts to make Squid behave more efficiently can make it behave even less so. We have "pessimal" behavior: Squid fetches the entire file (which is likely to be very large, because it exceeds the maximum object size) on each subrange request, and then discards the whole thing. Assuming a fixed range size, the amount of data retrieved is now of the order of N squared, where N is the size of the file.

The most obvious workaround for this problem is to increase Squid's "maximum_object_size" parameter. But this is problematic for two reasons. First, the size of Microsoft's bugs (and, hence, its patches) seems to be increasing dramatically over time. A few months ago, this author set an experimental cache to a maximum object size of 384 MB, thinking that surely Microsoft would not release a patch larger than that. But Microsoft quickly proved him wrong. Just this week, a patch totaling more than 723 MB -- about the capacity of a CD -- caused that server to revert to pessimal behavior. While one might hope that Microsoft would limit its patches to the size of a CD (or break them into files no larger than one CD), we cannot count on this. The day may soon come when computers are expected to come with DVD-ROM drives, allowing single files that are gigabytes in size.
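
For the record, Squid's shipped default is a mere 4 MB, so anyone attempting this must raise it by orders of magnitude. The value below is only a guess at what "big enough" means this month:

    # Default is "maximum_object_size 4096 KB" -- far too small for updates.
    # 384 MB proved insufficient this week; even 1 GB may not last long.
    maximum_object_size 1024 MB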

Second, it is probably not desirable to raise the maximum object size of the entire cache -- and possibly cause the caching of other large objects that one has no desire to cache -- just for the sake of caching Windows Update. (After all, another use of range requests is random access: allowing a client to fetch just the portion of a large database that it actually needs. If the database is frequently updated, and Squid tries to prefetch the whole file, it could find that copy to be stale almost immediately. This would, ironically, cause the same waste of bandwidth we now see with Windows Update.) Unfortunately, Squid doesn't have the ability to apply ACLs to the "maximum_object_size" parameter, as it does to some others (such as "tcp_outgoing_address"), so it's all or nothing.

Because of the failure modes mentioned above -- all of which result in network congestion and huge amounts of wasted bandwidth -- institutional users of Squid and ISPs have two choices other than setting up SUS or WUS (which ISPs and companies without expensive Windows servers can't do). Neither of these choices is attractive, but for the moment they are the only two available workarounds.

The first is to leave "range_offset_limit" at zero and forgo any caching of Windows Update. (Unfortunately, this option allows a collection of Windows machines to effectively become a "zombie army" that can cripple an Internet feed with no notice.) The second is to create an extremely large cache, with an extremely high "maximum_object_size" parameter, and set "range_offset_limit" to -1 and "quick_abort_min" to -1, as described above. Such a cache will cache many things that one would not ordinarily desire to cache. (The author has seen at least one case in which 7 hours of streaming media -- it looked like it might have been MP3 audio delivered via SHOUTcast -- wound up in the cache, consuming more than 400 megabytes. If anyone knows of ACLs that can be used to prevent this from occurring, I'd like to know about them.)
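
Concretely, the second workaround amounts to something like this (the size is illustrative):

    # Workaround 2: cache (nearly) everything.
    maximum_object_size 1024 MB   # larger than the biggest expected update
    range_offset_limit -1         # always fetch from the start of the object
    quick_abort_min -1 KB         # always finish the fetch so it can be cached

(The cache_dir must, of course, be large enough to hold all of this.)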

Proposed solutions

The following improvements to Squid could greatly improve administrators' ability to cache Windows Update (and other similar updates) without the deleterious effects mentioned above.

1. Modify Squid so that before it attempts to prefetch an entire object in response to a range request, it first checks the size of the object against "maximum_object_size". It should also perform all the other checks it normally performs to determine whether an object is cacheable (including evaluation of the ACLs, if any, used with the "no_cache" parameter). If the entire file is not cacheable for any reason whatsoever, the range request should be passed through verbatim. (This seems like common sense, but it is apparently not done now.)

2. Allow the "maximum_object_size" parameter to be selected, for each transfer, via an ACL. (Fortunately, the standard syntax of the Squid configuration file both allows this and provides backward compatibility in the case where no ACL is specified. See the "tcp_outgoing_address" parameter, which acquired the ability to use ACLs only recently and remains backward compatible with configuration files that don't exploit this new capability.) With this modification, an administrator could allow the caching of large Windows Update files but not large files from other sources; a hypothetical example appears after this list.

3. If a range transfer is to be expanded into a transfer of the entire file, exempt the transfer from the "quick_abort" mechanism. (After all, it's normal for the client to disconnect after it receives the data it has requested.)

4. Encourage Microsoft to modify Windows Update so that it can "discover" a server on which updates are preloaded or cached. Currently, SUS and WUS require modifications to a client machine's registry; this is practical for organizations with IT staffs but not for ISPs. An ISP should be able to run a Web cache, FTP server, or Web server to which Windows updates are downloaded once and then distributed downstream. Microsoft has a financial incentive to do this, because its updates are currently distributed through Akamai (which undoubtedly charges it by the bit for downloads). Alas, we shouldn't hold our breath waiting for Microsoft to do such a thing. Therefore, the modifications to Squid mentioned above are essential to providing an efficient solution -- not only to Windows Update issues but also to issues with similar updating systems from Intuit and other software vendors.
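
To illustrate item 2: if "maximum_object_size" followed the same convention as "tcp_outgoing_address", the configuration might look something like the following. This syntax is purely hypothetical -- it does not work in any current Squid release -- and the domain list is merely illustrative:

    # HYPOTHETICAL syntax -- not implemented in Squid today.
    acl windowsupdate dstdomain .windowsupdate.com .update.microsoft.com
    maximum_object_size 1024 MB windowsupdate   # generous limit, updates only
    maximum_object_size 4096 KB all             # normal limit for all else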

The first three of these items should be implemented as soon as possible, so that administrators of Squid caches can safely cache Microsoft's updates. Now that the largest of these updates has grown to more than 700 megabytes, the need is urgent.

--Brett Glass


