On 10/4/06, Guy Rouillier <guyr@xxxxxxxxxxx> wrote:
TIJod wrote:
> I need to store a large number of images in a
> PostgreSQL database. In my application, this
> represents a few hundreds of thousands of images. The
> size of each image is about 100-200 KB. There is a
> large turnover in my database, i.e. each image stays
> about 1 week in the database, then it is deleted.

I see little value in storing the images in the database. For me that's a general statement (I'm sure others will disagree), but it applies especially in your case, where you have a high volume and only want to keep each image for about a week. Why incur all the overhead of putting them in the DB? You can't search on them or sort on them. I would just store them in the file system and put a reference in the DB.
no, you can't search or sort on the images themselves, but you can put metadata in fields and search on that, and you can do things like use RI (referential integrity) to delete images that are associated with other things, etc. this would probably fit the OP's methodology quite nicely.
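a rough sketch of what that kind of schema could look like (the table and column names here are invented just to illustrate):

-- hypothetical layout: each image row carries searchable metadata and is
-- tied to its owning record, so deleting the parent row removes the image
-- via referential integrity (ON DELETE CASCADE).
CREATE TABLE documents (
    doc_id      serial PRIMARY KEY,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE images (
    image_id     serial  PRIMARY KEY,
    doc_id       integer NOT NULL REFERENCES documents ON DELETE CASCADE,
    filename     text    NOT NULL,
    content_type text    NOT NULL DEFAULT 'image/jpeg',
    uploaded_at  timestamptz NOT NULL DEFAULT now(),
    data         bytea   NOT NULL     -- the image itself, handled by TOAST
);

-- metadata queries work as usual, e.g. finding rows past the one-week turnover:
SELECT image_id, filename
  FROM images
 WHERE uploaded_at < now() - interval '1 week';

-- and removing the parent record removes its images automatically:
DELETE FROM documents WHERE doc_id = 42;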
> but this would require a more tricky implementation, and ACID-ity
> would be difficult to ensure -- after all, a database
> should abstract the internal storage of data, be it
> images).

I can't get excited about this. First, given the amount of overhead you'll be avoiding, checking the return code from storing the image in the file system seems relatively trivial. Store the image first, and if you get a failure code, don't store the rest of the data in the DB; you've just implemented data consistency. That assumes, of course, that the image is the only meaningful data you have, which in most situations is not the case. Meaning you'd want to store the rest of the data anyway, with a message saying "image not available."
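The reference doesn't need to be anything fancy; something along these lines would do (names are made up for illustration; the application writes the file first and only runs the INSERT if that write succeeds):

-- "file on disk, reference in the DB" layout: the row carries the path and
-- whatever metadata you want to query on, not the image bytes themselves.
CREATE TABLE image_files (
    image_id    serial PRIMARY KEY,
    path        text   NOT NULL,               -- e.g. '/var/images/12345.jpg'
    uploaded_at timestamptz NOT NULL DEFAULT now(),
    available   boolean NOT NULL DEFAULT true  -- false => show "image not available"
);

-- run only after the file has been written to disk successfully:
INSERT INTO image_files (path) VALUES ('/var/images/12345.jpg');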
i think this topic is interesting and deserves better treatment than assumptions. postgresql will TOAST all images over a certain size, which is actually pretty efficient, although it can be a problem if your images are really big. on the downside you have more vacuuming overhead, and postgresql can't match filesystem speed for raw writing. also, you can pretty much forget decent performance if the code that does the actual insertion isn't in c/c++ and doesn't use the parameterized api.

on the flip side, you have a central interface (albeit a single point of failure), and you don't have to deal with thousands or millions of image files, which can become its own problem (although a solvable one). also, you don't have to write plumbing code to get something like atomicity.

postgresql is getting more and more efficient at moving large streams in and out of the database, and the answer here is not as cut and dried as you might think (historically, it was insane to even attempt it). i'm wondering if anybody has ever attempted to manage large collections of binary objects inside the database and has advice here.

merlin
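p.s. for what it's worth, a minimal sketch of what a parameterized insert looks like at the SQL level, reusing the hypothetical images table from earlier in the thread; from c you'd typically get the same effect by binding the bytea parameter through libpq's PQexecParams (ideally in binary format) instead of splicing an escaped string into the query text:

-- prepared statement with a bytea parameter; the image bytes are passed as a
-- value and never have to be escaped into the SQL string.
PREPARE add_image (integer, text, bytea) AS
    INSERT INTO images (doc_id, filename, data)
    VALUES ($1, $2, $3);

-- the hex string below is just a stand-in for real image data; anything much
-- bigger than a couple of kilobytes gets compressed and/or moved out of line
-- by TOAST automatically.
EXECUTE add_image(42, 'photo_12345.jpg', decode('ffd8ffe000104a46494600', 'hex'));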