Re: I would like to use Squid for caching but it is imperative that all files be cached.

Thanks for your reply, Amos.

The tests are a suite of mostly accessibility tests, with some usability tests, for web pages and other documents. Some are based on open source software, some on published algorithms, and others (the problematic ones) are compiled executables. The tests were generally designed to test a single web page. I am, however, attempting to test entire large websites, e.g. government websites or the websites of large organisations. Data is collated from all tests on all web pages and other resources tested, and that data is used to generate a report about the whole website, not just individual pages.

The tests are largely automatic, with some manual configuration of cookie and form data etc. They run on a virtual server that is terminated after one job; only the report itself is kept. No runtime data, including any cache, is retained after that single job.

A website, e.g. that of a news organisation, can change within the time it takes to run the suite of tests. I want one static snapshot of each web page, one per URL, to use as a reference, so that different tests do not report on different content for the same URL. I keep a copy of the web pages for reference within the report. (It would not be appropriate to keep multiple pages with the same URL in the report.) Some of the tests fetch documents linked from the page being tested, so it is not possible to say which test will fetch a given file first.

Originally I thought of downloading the files once, writing them to disk, and processing them from the local copy. I even thought of using HTTrack ( http://www.httrack.com/ ) to create a static copy of the websites. The problem with both approaches is that I lose the HTTP header information. The headers matter because I would like to keep the test suite generic enough to handle different character encodings and content languages, and to make sense of response codes. Also, some tests complain if the header information is missing or incorrect.
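
To make that concrete, the local-copy approach I had in mind looked roughly like the sketch below (untested Python using the requests library; the .headers sidecar file is purely my own convention). It also shows why the approach falls short: the headers end up in a sidecar file that every test would have to be taught to read, instead of arriving as part of a real HTTP response.

    import hashlib
    import json
    import os

    import requests  # assumption: using the requests library for the fetch

    def fetch_and_store(url, outdir):
        # Download the URL once and keep the body on disk.
        resp = requests.get(url, timeout=30)
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        os.makedirs(outdir, exist_ok=True)
        with open(os.path.join(outdir, name), "wb") as body_file:
            body_file.write(resp.content)
        # Keep the status and headers in a JSON sidecar (my own layout,
        # not anything the tests understand natively).
        with open(os.path.join(outdir, name + ".headers"), "w") as hdr_file:
            json.dump({"url": url,
                       "status": resp.status_code,
                       "headers": dict(resp.headers)}, hdr_file, indent=2)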

So what I really want is a static snapshot of a dynamic website, with correct HTTP header information. I know this is not what Squid was designed for, but I was hoping it would be possible with Squid: I could use Squid to cache a static snapshot of the (dynamic) websites so that all the tests run against the same content.
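
For what it is worth, the sort of configuration I imagined is along these lines (an untested sketch; I am not certain these directives behave the way I hope, and I believe options such as ignore-no-cache and ignore-must-revalidate were removed in newer Squid releases):

    # squid.conf sketch: try to cache everything for the lifetime of one job.
    http_port 3128
    cache_dir ufs /var/spool/squid 2048 16 256
    maximum_object_size 100 MB

    # Treat every response as fresh for a week (far longer than a test run)
    # and ignore the origin server's attempts to forbid or expire caching.
    refresh_pattern . 10080 100% 10080 override-expire override-lastmod ignore-reload ignore-no-store ignore-private

I could then watch access.log for TCP_HIT versus TCP_MISS entries to confirm that repeat requests for the same URL are being served from the cache rather than refetched.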

Of secondary importance: the test suite is cloud based, and the cloud service provider charges for bandwidth. If I can avoid repeat requests for the same file, I can keep my costs down.


