On 10/5/06, Alexander Staubo <alex@xxxxxxxxxxxxxxx> wrote:
On Oct 5, 2006, at 16:18 , Merlin Moncure wrote: > i'm wondering if anybody has ever attempted to manage large > collections of binary objects inside the database and has advice here. We have a production system containing 10,000 images (JPEG and PNG of various sizes) totaling roughly 4GBs. We have Lighttpd running against a couple of Rails processes which crop, scale and convert images on the fly using ImageMagick; converted images are cached in the file system and subsequently served directly by Lighttpd. Functionally I have absolutely no quibbles with this system; PostgreSQL stores the data smoothly and everything works as designed. Performance-wise, I'm not sure; the amount of data seems to put a certain load on the database server, though it's impossible to tell how much. Backups are hell, taking hours and hours to do just a
i admit, backups could be a problem. maybe pitr is the answer. (dump style backups are a problem for any big database)
single dump of the database. Rails' PostgreSQL adapter uses SQL for inserts and quotes every byte as an octal escape sequence; storing a
ouch...the only way to do this quickly imo is to send in raw binary data directly to the database using parameterized...this eliminates both the escaping and the unescaping step. likewise the data should be pulled out binary (this will liekly be several times faster).
single image can take several seconds. Single-image retrieval is similarly slow, but since the adapter uses bindings that talk directly to libpq4, I believe it's caused by the overall load on the database. Because of this, we see no recourse but to move the images into the file system. Since our cluster consists of three separate machines all running the same Rails application, with no dedicated box handling the image storage, such a solution requires the use of NFS or other type of shared storage for centralized image storage; we're not sure yet about what we will end up with.
cant fault you for that decision, web applications are usually pretty aggressive on caching. they also ususally fit pretty well in the various replication technlogies as well...something to consider. merlin