On Tue, Mar 4, 2014 at 11:01 AM, Bastien Koert <phpster@xxxxxxxxx> wrote:

> Use the filesystem and store each PDF on the file system.
>
> You can use a cron to delete the files older than x.
>

Thanks for the suggestion- sounds like that might be the simplest approach.

> Is each PDF substantially different or just some pertinent details like
> customer name, address etc? Could you template that so that you are
> generating a minimal number of PDFs?
>

The generation is handled by a third-party black-box application. Each
document is substantially different from the others- they are chemical
safety datasheets and hence are specific to the particular chemicals they
represent. (If you are curious/interested, this page from Dow Chemical has
a brief explanation: http://www.dow.com/productsafety/safety/sds.htm)

> How often does a customer come in and request a new PDF?
>

This is a new system, so it is hard to say. For some chemicals we might
anticipate requests several times a week (perhaps several times a day);
for others, very rarely- as little as once ever. One thing we had
considered is creating an SDS hit tracker that could scale the relative
importance of a particular file- then, when the cron job comes around, it
could take that into consideration. I am not sure we would see substantial
benefit over a simple find-based cron job, though. Customers are not the
PM's concern; it is the web spiders.

> On Tue, Mar 4, 2014 at 1:09 PM, George Wilson <rmwscbt@xxxxxxxxx> wrote:
>
>> Greetings all,
>> I hope this is not tiptoeing off topic, but I am working on solving a
>> problem at work right now and was hoping someone here might have some
>> experience/insight.
>>
>> My company has a new proprietary server which generates PDF chemical
>> safety files via a REST API and returns them to the user. My project
>> manager wants a layer of separation between the website (and hence the
>> user) and the document server, so I wrote an intermediary script which
>> accepts a request from the website and attempts to grab a PDF from the
>> document server via the PHP cURL extension. That appears to be working
>> well.
>>
>> Here is the issue I am trying to solve:
>>
>> We must assume a total of 1.4 million possible documents which may be
>> generated by this system- each initiated directly from our website.
>> Each document is estimated to be about a megabyte in size. Generating
>> each one takes at least a few seconds.
>>
>> We are interested in setting up some kind of document caching system
>> (either a home-brewed PHP-based system or one that generates the files,
>> saves them, and periodically deletes them). My project manager is
>> concerned about web crawlers kicking off the generation of these files,
>> so we are considering strategies to avoid blowing out our server
>> resources.
>>
>> Does anyone have any suggestions, or have you dealt with this problem
>> before?
>>
>> Thank you in advance
>>
>
> --
>
> Bastien
>
> Cat, the other other white meat
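
To make the filesystem-cache idea concrete, here is a rough sketch of how
the intermediary script could check the cache before asking the document
server to generate anything. The cache path, the document-server URL, the
"id" parameter, and the 30-day freshness window are all placeholders for
the example, not details of the actual system:

<?php
// sds.php - serve a cached SDS PDF, fetching it on a cache miss.
$cacheDir = '/var/cache/sds';   // placeholder cache location
$docId = isset($_GET['id']) ? basename($_GET['id']) : ''; // strip any path parts
if ($docId === '') {
    header('HTTP/1.1 400 Bad Request');
    exit;
}
$cacheFile = $cacheDir . '/' . $docId . '.pdf';

// Cache hit: serve the stored copy if it is less than 30 days old.
if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < 30 * 86400) {
    header('Content-Type: application/pdf');
    readfile($cacheFile);
    exit;
}

// Cache miss: fetch from the document server (URL is made up here).
$ch = curl_init('http://docserver.internal/generate?id=' . urlencode($docId));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);  // generation takes a few seconds
$pdf = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($pdf === false || $status !== 200) {
    header('HTTP/1.1 502 Bad Gateway');
    exit('Document server error');
}

// Write to a temp file and rename, so a concurrent request never
// reads a half-written PDF (rename is atomic on the same filesystem).
$tmp = tempnam($cacheDir, 'sds');
file_put_contents($tmp, $pdf);
rename($tmp, $cacheFile);

header('Content-Type: application/pdf');
readfile($cacheFile);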
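
The matching find-based cron job is a one-liner; the path and the 30-day
retention are again placeholders:

# Nightly at 03:15, delete cached PDFs not modified in 30+ days.
15 3 * * * find /var/cache/sds -name '*.pdf' -mtime +30 -delete

One cheap refinement toward the hit-tracker idea: if the script above also
calls touch($cacheFile) on every cache hit, the file's mtime tracks how
recently it was requested, and the same find job then evicts the
least-recently-requested documents rather than the oldest-generated ones.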
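
On the web-spider concern: a robots.txt rule keeps well-behaved crawlers
from ever hitting the generation endpoint, though it does nothing against
misbehaved ones (the cache above, plus rate limiting, has to absorb those).
The path is a placeholder for whatever URL prefix serves the PDFs:

User-agent: *
Disallow: /sds/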