On Wed, Mar 17, 2010 at 03:53:01AM +0000, Amos Jeffries wrote:

> So HAVP is designed specifically to send the client scanned parts of the
> file before the entire thing is checked?

Right. Of course all this is configurable.

> That explains something that I was wondering about...
>
> Consider this scenario, which I have seen in the wild:
>
> Background: Clients visit website A and fetch a large PDF document.
> Unknown to the website author, the server has been infected, and PDF is
> one of the file types which gets a macro virus appended. The CRM system
> records the upload and change times in a database for efficient
> if-modified responses. The server admin is alerted and runs a virus
> scan, cleaning the files some time later. The CRM database gets omitted
> from the update.
>
> Imagining that HAVP was in use in a proxy between this site and a user...
>
> During the infected period, imaginary-HAVP scans the documents and sends
> a large "clean" prefix to all visitors.
> BUT... it aborts when the appended infection is detected. The browser is
> lucky enough to notice the file is incomplete and retries later with a
> range request for the missing bit.
>
> a) During the infected period, the fetched ranges will never succeed.

HAVP doesn't allow Range requests by default anyway, always forcing a full
download.

> b) After the infection is cleaned up, the file will pass through
> imaginary-HAVP and the client will get a truncated version, with a
> complete file being indicated.
>
> This is where the problem comes in. Being a macro infection, one of the
> changes to the file was that the virus inserted some undetectable jump
> code at the beginning to go with the virus at the end.
>
> We are left with the situation where intermediary proxies are holding
> corrupted files (the first part being the original infected with the
> jump, followed by the terminal bytes of the file), while the server is
> left with a pristine and working file. New visitors loading it will be
> fine, and so will analysts coming along later.
> However, for clients visiting through one of the proxies which cached the
> file meanwhile... one of two things will happen, depending on the file
> viewer used:
>
> 1) A dumb viewer will try to run the random part of the file (now text!)
> where the virus inserted itself as binary code, and crash.
>
> 2) A smart viewer will notice the missing/corrupt macro (it's past the
> end of the file, maybe) and display the file without running it. However,
> even then there is a discrepancy in the file prefix, and some of the
> content appears corrupted.
>
> This type of traffic is the #1 reason for buffering until fully
> processed. I do like the idea of incremental scanning as it arrives,
> though. That will at least reduce the delays to very little more than
> the total receiving time.

The recommended configuration is not to cache between client and HAVP. It
comes at the small penalty of scanning files every time, but it also has
the bonus of detecting viruses that are already in the cache but had no
signatures at the time they were fetched:

http://havp.hege.li/forum/viewtopic.php?f=2&t=11

So I'm not sure your worries apply... please clarify if I didn't understand.
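For reference, the recommended layout (client -> HAVP -> caching proxy ->
origin) might look roughly like this. This is only a sketch: the port
numbers are illustrative, and the exact option names should be checked
against the havp.config shipped with your HAVP version and against your
Squid documentation.

```
# havp.config -- client-facing scanner; nothing caches in front of it
PORT 8080
# Hand scanned traffic to the caching proxy sitting BEHIND the scanner,
# so every object served from cache is still re-scanned on the way out
PARENTPROXY 127.0.0.1
PARENTPORT 3128

# squid.conf -- the cache sits behind HAVP, never between HAVP and clients
http_port 3128
cache_dir ufs /var/spool/squid 1024 16 256
```

With this ordering a truncated or stale object in the Squid cache still
passes through the scanner on every request, which is what makes the
"viruses already in cache" detection described above possible.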