Use the filesystem: store each PDF on disk, and use a cron job to delete files older than X. (A quick sketch of this follows below the quoted message.)

Is each PDF substantially different, or do just some pertinent details change (customer name, address, etc.)? Could you template those so that you are generating a minimal number of PDFs?

How often does a customer come in and request a new PDF?

On Tue, Mar 4, 2014 at 1:09 PM, George Wilson <rmwscbt@xxxxxxxxx> wrote:
> Greetings all,
> I hope this is not tiptoeing off topic, but I am working on solving a
> problem at work right now and was hoping someone here might have some
> experience/insight.
>
> My company has a new proprietary server which generates PDF chemical
> safety files via a REST API and returns them to the user. My project
> manager wants a layer of separation between the website (and hence the
> user) and the document server, so I wrote an intermediary script which
> accepts a request from the website and attempts to grab a PDF from the
> document server via PHP's cURL functions. That appears to be working well.
>
> Here is the issue I am trying to solve:
>
> We must assume a total of 1.4 million possible documents which may be
> generated by this system, each initiated directly from our website. Each
> document is estimated to be about a megabyte in size. Generating each
> one takes at least a few seconds.
>
> We are interested in setting up some kind of document caching system
> (either a home-brewed PHP-based system or a system that generates the
> files, saves them, and periodically deletes them). My project manager is
> concerned about web crawlers kicking off the generation of these files,
> and so we are considering strategies to avoid blowing out our server
> resources.
>
> Does anyone have any suggestions, or have you dealt with this problem
> before?
>
> Thank you in advance

--
Bastien

Cat, the other other white meat
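To make the first suggestion concrete, here is a minimal sketch in PHP, since that is what your intermediary script already uses. The cache directory, the 30-day freshness window, and the document-server URL are all assumptions; substitute whatever your script has now.

<?php
// Minimal sketch of the cache-then-cron idea. CACHE_DIR, CACHE_TTL, and
// DOC_SERVER are illustrative assumptions, not your real values.

define('CACHE_DIR', '/var/cache/pdf');                    // assumed location
define('CACHE_TTL', 2592000);                             // assumed 30 days, in seconds
define('DOC_SERVER', 'http://docserver.internal/render'); // hypothetical endpoint

function fetch_pdf($docId)
{
    // Key the cache on the document id so repeat requests hit the disk copy.
    $path = CACHE_DIR . '/' . sha1($docId) . '.pdf';

    // Serve the cached file if it exists and is still fresh.
    if (is_file($path) && (time() - filemtime($path)) < CACHE_TTL) {
        return $path;
    }

    // Otherwise have the document server generate it, via cURL as you do now.
    $ch = curl_init(DOC_SERVER . '?id=' . urlencode($docId));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    $pdf = curl_exec($ch);
    if ($pdf === false) {
        $err = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Document server error: ' . $err);
    }
    curl_close($ch);

    // Write to a temp file and rename so a half-written PDF is never served.
    $tmp = $path . '.tmp';
    file_put_contents($tmp, $pdf);
    rename($tmp, $path);

    return $path;
}

Expiry can then stay out of PHP entirely; a nightly cron entry such as

  0 3 * * * find /var/cache/pdf -name '*.pdf' -mtime +30 -delete

keeps the directory bounded. Two things worth noting: a full cache of 1.4 million documents at roughly a megabyte each is on the order of 1.4 TB, which is why the templating question above matters; and since each document is generated at most once per freshness window, a crawler hitting the endpoint repeatedly mostly gets cheap disk reads rather than kicking off new generation jobs.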