Re: Re: mod_cache intermittently corrupting PDF's or storing incomplete version of file.

Me again :)

Some progress, hopefully this might be of use to anyone else who runs
into this problem.


I have managed to replicate the issue 100% of the time by using the
'--limit-rate' parameter in wget.

This produced some very interesting results.

When limiting the request connection rate to 20 kbps and requesting
the PDF, the download would fail at around 22%, and the Apache error
log (with debug logging on) showed the following.

snip
[Tue Jul 17 11:18:50 2007] [debug] mod_disk_cache.c(1043): disk_cache:
Body for URL http://sitename/mypdf.pdf? cached.
[Tue Jul 17 11:18:50 2007] [debug] mod_proxy_http.c(1537): proxy: end body send
[Tue Jul 17 11:18:50 2007] [debug] proxy_util.c(1816): proxy: HTTP:
has released connection for (*)
<<snip

The failed request would then remain in the cache.
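
For anyone without wget to hand, the slow-client condition can be
approximated in a few lines of Python. This is only a sketch of the
idea (read in small chunks and sleep between them), not what wget does
internally; the function name read_throttled and its parameters are
made up for illustration, and the in-memory buffer stands in for a
real HTTP response.

```python
import io
import time


def read_throttled(stream, rate_bytes_per_sec, chunk_size=1024):
    """Read a file-like object to the end, pausing after each chunk so
    the average read rate stays near rate_bytes_per_sec."""
    received = bytearray()
    delay = chunk_size / rate_bytes_per_sec
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received.extend(chunk)
        time.sleep(delay)
    return bytes(received)


# Stand-in for a response body. Against a real server one would pass
# the object returned by urllib.request.urlopen(url) instead.
fake_body = io.BytesIO(b"x" * 4096)
data = read_throttled(fake_body, rate_bytes_per_sec=1_000_000)
print(len(data))  # 4096
```

Comparing the number of bytes received with the expected file size
shows whether the transfer was cut short.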

When I forced a cache refresh by updating the Last-Modified time and
did not limit the connection, the entire file was downloaded but was
not held in cache, due to:

snip
7 11:30:10 2007] [debug] mod_disk_cache.c(1007): cache_disk: URL
http://sitename/mypdf.pdf?  failed the size check (1000872 > 1000000)
<<snip

This was quickly resolved by setting CacheMaxFileSize to 10MB.
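
The arithmetic behind that log line, as a rough sketch. The two
numbers come straight from the debug message; treating "10MB" as
10 * 1024 * 1024 bytes is my assumption, and fits_in_cache is a
made-up helper, not mod_disk_cache's actual code.

```python
OLD_LIMIT = 1_000_000         # limit shown in the debug line
PDF_SIZE = 1_000_872          # body size shown in the debug line
NEW_LIMIT = 10 * 1024 * 1024  # assumption: "10MB" read as 10485760 bytes


def fits_in_cache(size_bytes, max_file_size):
    """True when the body is no larger than the configured maximum."""
    return size_bytes <= max_file_size


print(fits_in_cache(PDF_SIZE, OLD_LIMIT))  # False
print(fits_in_cache(PDF_SIZE, NEW_LIMIT))  # True
print(PDF_SIZE - OLD_LIMIT)                # 872
```

So the PDF was only 872 bytes over the old limit, which is why it was
served correctly but never stored.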

After this, Apache was happy to serve the entire PDF at 20 kbps if the
file was present in cache, but it suffers from the same problem if the
file is not in cache and the request has to go to Tomcat.

What is interesting is that, when checking the Tomcat access log,
there is a delay between my request and the log entry. It appears that
Tomcat is deciding how much of the file to send depending on my
connection speed: when I changed my connection speed, Tomcat changed
the request size in the log entry, and the connection would fail on
reaching the number of bytes displayed in the access log.

All I need to do now is stop Tomcat from attempting byte-range
serving, and I think the issue will be resolved.



Regards,

Mark.

On 17/07/07, Mark Stevens <mark.stevens99@xxxxxxxxxxxxxx> wrote:
Hi Jacqui,

Thanks for the response,

Initially I suspected the issue could have been related to client
type, however I was able to create a broken item in the cache by
running the following command from a remote server.

  - wget -S --no-cache 'http://sitename/mypdf.pdf?<random number>'

and then kept changing the random number until I received HTTP 416
(Requested range not satisfiable)

On getting the 416 response, the item would remain in the cache at
smaller than the expected size. When attempting to view it via a
browser, the response is 'the file is damaged and could not be
repaired'.
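
A quick way to tell a short copy from a good one, as a sketch only.
It relies on two things: a complete PDF ends with a %%EOF marker, and
the body should be at least as long as the expected size (for example
the size of the file at the origin). The helper name looks_truncated
and the sample bytes are made up for illustration.

```python
from typing import Optional


def looks_truncated(body: bytes, expected_length: Optional[int] = None) -> bool:
    """Heuristic check for an incomplete PDF body."""
    if expected_length is not None and len(body) < expected_length:
        return True
    # The end-of-file marker should appear near the end of the file.
    return b"%%EOF" not in body[-1024:]


# Tiny synthetic stand-in for a PDF, not a real document.
full = b"%PDF-1.4\n" + b"x" * 100 + b"\n%%EOF\n"
partial = full[:50]

print(looks_truncated(full, len(full)))     # False
print(looks_truncated(partial, len(full)))  # True
print(looks_truncated(partial))             # True
```

Running something like this over the file fetched through the cache
and the file fetched directly from the origin shows which copy is
short.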

As I was testing against the live site, it is possible someone had
sent a byte-range request during my testing with wget, and that Apache
then stored the byte range as the entire item.
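
To test that theory by hand, a byte-range request can be built
explicitly. The sketch below only constructs the request and explains
the status codes one would expect back; it does not contact the
server. The helper names and the 0-1023 range are arbitrary choices
for illustration.

```python
import urllib.request

STATUS_MEANING = {
    200: "full body returned, Range header was ignored",
    206: "partial content, only the requested range was returned",
    416: "requested range not satisfiable for the stored entity",
}


def build_range_request(url, first_byte, last_byte):
    """Build a GET request asking for an inclusive byte range."""
    return urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (first_byte, last_byte)}
    )


def describe_status(code):
    """Explain what a status code means for a range request."""
    return STATUS_MEANING.get(code, "not specific to range requests")


req = build_range_request("http://sitename/mypdf.pdf", 0, 1023)
print(req.get_header("Range"))  # bytes=0-1023
print(describe_status(416))
```

Sending it with urllib.request.urlopen(req) against a test copy of the
site, then requesting the whole file, would show whether the partial
response ends up cached as the complete item. Note that urlopen raises
HTTPError for a 416 rather than returning a response.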

I'll continue testing and let you know how I get on. If I don't get a
resolution soon, I will try rolling back to Apache 1.3 to rule out
mod_cache as the culprit.

Thanks again,

Mark.

On 17/07/07, Jacqui caren <jacqui.caren@xxxxxxxxxxxx> wrote:
> Mark Stevens wrote:
> > Anyone?
>
> It is likely that PDF viewers will ask for byteranges.
>
> If the cache is storing what is requested rather than the entire
> file, then this makes sense. IIRC mod_proxy does the correct thing
> (requests the byterange it does not have, puts the chunks together,
> then serves the requested range).
>
> PDFs were designed so that the TOC is at the head of the document.
> If you find that you are only storing the first NNNN bytes
> and then only sporadic contiguous chunks I would assume
> byterange requests are the problem and hand code a number
> of test requests to confirm it.
>
> HTH
>
> > On 16/07/07, Mark Stevens <mark.stevens99@xxxxxxxxxxxxxx> wrote:
> >
> >> Has anyone had problems in the past with Apache mod_cache storing
> >> incomplete versions of files such as PDF's, and if so did you manage
> >> to resolve it?
> >>
> >> The problem is intermittent, and I can confirm PDF's from the origin
> >> source are OK.
> >>
> >> I would be interested in any combination of setup and version of
> >> Apache you may have seen this with.
> >>
> >> I posted something related to this issue regarding removal of
> >> individual files from cache, sorry if this is seen as double posting,
> >> but felt I ought to have been more direct.
> >>
> >>
> >> Many thanks in advance.
> >>
> >> Mark.
> >>
>
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP Server Project.
> See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
>    "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
> For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
>
>

