On Wed, Mar 17, 2010 at 03:53:01AM +0000, Amos Jeffries wrote:

> So HAVP is designed specifically to send the client scanned parts of the
> file before the entire thing is checked?

Right. Of course all this is configurable.

> That explains something that I was wondering about...
>
> Consider this scenario, which I have seen in the wild:
>
> Background: Clients visit website A and fetch a large PDF document.
> Unknown to the website author, the server has been infected, and PDF is
> one of the file types which gets a macro virus appended. The CRM system
> records the upload and change times in a database for efficient
> if-modified responses. The server admin is alerted and runs a virus
> scan, cleaning the files some time later. The CRM database gets omitted
> from the update.
>
> Imagining that HAVP was in use in a proxy between this site and a user...
>
> During the infected period, imaginary-HAVP scans the documents and sends
> a large "clean" prefix to all visitors.
> BUT... it aborts when the appended infection is detected. The browser is
> lucky enough to notice the file is incomplete and retries later with a
> range request for the missing bit.
>
> a) During the infected period, the fetched ranges will never succeed.

HAVP doesn't allow Range requests by default anyway, always forcing a full
download.

> b) After the infection is cleaned up, the file will pass through
> imaginary-HAVP and the client will get a truncated version, with a
> complete file being indicated.
>
> This is where the problem comes in. Being a macro infection, one of the
> changes to the file was that the virus inserted some undetectable jump
> code at the beginning to go with the virus at the end.
>
> We are left with the situation where intermediary proxies are holding
> corrupted files (the first part being the original infected with the
> jump, followed by the terminal bytes of the file), while the server is
> left with a pristine and working file. New visitors loading it will be
> fine, and so will analysts coming along later.
> However, for clients visiting through one of the proxies which cached the
> file meanwhile... one of two things will happen, depending on the file
> viewer used:
>
> 1) A dumb viewer will try to run the random part of the file (now text!)
> where the virus inserted itself as binary code, and crash.
>
> 2) A smart viewer will notice the missing/corrupt macro (it's past the
> end of the file, maybe) and display the file without running it. However,
> even then there is a discrepancy in the file prefix, and some of the
> content appears corrupted.
>
> This type of traffic is the #1 reason for buffering until fully
> processed. I do like the idea of incremental scanning as it arrives,
> though. That will at least reduce the delays to very little more than
> the total receiving time.

The recommended configuration is not to cache between client and HAVP. It
comes at the small penalty of scanning files every time, but it also has
the bonus of detecting viruses that are already in the cache but had no
signatures at the time they were fetched:

http://havp.hege.li/forum/viewtopic.php?f=2&t=11

So I'm not sure your worries apply... please clarify if I didn't understand.
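For reference, the recommended layout (client -> HAVP -> caching proxy ->
origin) might look roughly like this. This is only a sketch: the port
numbers are illustrative, and the exact option names should be checked
against the havp.config shipped with your HAVP version and against your
Squid documentation.

```
# havp.config -- client-facing scanner; nothing caches in front of it
PORT 8080
# Hand scanned traffic to the caching proxy sitting BEHIND the scanner,
# so every object served from cache is still re-scanned on the way out
PARENTPROXY 127.0.0.1
PARENTPORT 3128

# squid.conf -- the cache sits behind HAVP, never between HAVP and clients
http_port 3128
cache_dir ufs /var/spool/squid 1024 16 256
```

With this ordering a truncated or stale object in the Squid cache still
passes through the scanner on every request, which is what makes the
"viruses already in cache" detection described above possible.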