On 10/4/06, Guy Rouillier <guyr@xxxxxxxxxxx> wrote:
TIJod wrote:
> I need to store a large number of images in a
> PostgreSQL database. In my application, this
> represents a few hundreds of thousands of images. The
> size of each image is about 100-200 KB. There is a
> large turnover in my database, i.e. each image stays
> about 1 week in the database, then it is deleted.

I see little value in storing the images in the database. For me that's a general statement (I'm sure others will disagree), but it applies especially in your case, where you have a high volume and only want to keep each image for about a week. Why incur all the overhead of putting them in the DB? You can't search on them or sort on them. I would just store them in the file system and put a reference in the DB.
no, you can't search or sort on the images themselves, but you can put metadata in fields and search on that, and you can do things like use RI (referential integrity) to delete images that are associated with other things, etc. this would probably fit the OP's methodology quite nicely.
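a rough sketch of what that kind of schema could look like (the table and column names here are invented just to illustrate):

-- hypothetical layout: each image row carries searchable metadata and is
-- tied to its owning record, so deleting the parent row removes the image
-- via referential integrity (ON DELETE CASCADE).
CREATE TABLE documents (
    doc_id      serial PRIMARY KEY,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE images (
    image_id     serial  PRIMARY KEY,
    doc_id       integer NOT NULL REFERENCES documents ON DELETE CASCADE,
    filename     text    NOT NULL,
    content_type text    NOT NULL DEFAULT 'image/jpeg',
    uploaded_at  timestamptz NOT NULL DEFAULT now(),
    data         bytea   NOT NULL     -- the image itself, handled by TOAST
);

-- metadata queries work as usual, e.g. finding rows past the one-week turnover:
SELECT image_id, filename
  FROM images
 WHERE uploaded_at < now() - interval '1 week';

-- and removing the parent record removes its images automatically:
DELETE FROM documents WHERE doc_id = 42;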
> but this would require a more tricky implementation, and ACID-ity
> would be difficult to ensure -- after all, a database
> should abstract the internal storage of data, be it
> images).

I can't get excited about this. First, given the amount of overhead you'll be avoiding, checking the return code from storing the image in the file system seems relatively trivial. Store the image first, and if you get a failure code, don't store the rest of the data in the DB; you've just implemented data consistency. That assumes, of course, that the image is the only meaningful data you have, which in most situations is not the case. Meaning you'd want to store the rest of the data anyway, with a message saying "image not available."
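The reference doesn't need to be anything fancy; something along these lines would do (names are made up for illustration; the application writes the file first and only runs the INSERT if that write succeeds):

-- "file on disk, reference in the DB" layout: the row carries the path and
-- whatever metadata you want to query on, not the image bytes themselves.
CREATE TABLE image_files (
    image_id    serial PRIMARY KEY,
    path        text   NOT NULL,               -- e.g. '/var/images/12345.jpg'
    uploaded_at timestamptz NOT NULL DEFAULT now(),
    available   boolean NOT NULL DEFAULT true  -- false => show "image not available"
);

-- run only after the file has been written to disk successfully:
INSERT INTO image_files (path) VALUES ('/var/images/12345.jpg');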
i think this topic is interesting and deserves better treatment than assumptions. postgresql will TOAST all images over a certain size, which is actually pretty efficient, although it can be a problem if your images are really big. on the downside you have more vacuuming overhead, and postgresql can't match filesystem speed for raw writing. also, you can pretty much forget decent performance if the code that does the actual insertion isn't in c/c++ and doesn't use the parameterized api.

on the flip side, you have a central interface (albeit a single point of failure), and you don't have to deal with thousands or millions of image files, which can become its own problem (although a solvable one). also, you don't have to write plumbing code to get something like atomicity.

postgresql is getting more and more efficient at moving large streams in and out of the database, and the answer here is not as cut and dried as you might think (historically, it was insane to even attempt it). i'm wondering if anybody has ever attempted to manage large collections of binary objects inside the database and has advice here.

merlin
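p.s. for what it's worth, a minimal sketch of what a parameterized insert looks like at the SQL level, reusing the hypothetical images table from earlier in the thread; from c you'd typically get the same effect by binding the bytea parameter through libpq's PQexecParams (ideally in binary format) instead of splicing an escaped string into the query text:

-- prepared statement with a bytea parameter; the image bytes are passed as a
-- value and never have to be escaped into the SQL string.
PREPARE add_image (integer, text, bytea) AS
    INSERT INTO images (doc_id, filename, data)
    VALUES ($1, $2, $3);

-- the hex string below is just a stand-in for real image data; anything much
-- bigger than a couple of kilobytes gets compressed and/or moved out of line
-- by TOAST automatically.
EXECUTE add_image(42, 'photo_12345.jpg', decode('ffd8ffe000104a46494600', 'hex'));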