On 1/5/07, Jorge Godoy <jgodoy@xxxxxxxxx> wrote:
> Andrew Chernow <pg-job@xxxxxxxxx> writes:
>
>>> And how do you guarantee that after a failure? You're restoring two
>>> different sets of data here:
>>>
>>> How do you link them together on that specific operation? Or even on a
>>> daily basis, if you get corrupted data...
>>
>> I answered that already.
>
> I'm sorry. It must be the flu, the pain or something else, but I really
> don't remember reading your message about how you can be 100% sure that
> all references to the filesystem have their corresponding files present
> and also all present files have their respective database entry.
By designing the system so that you can be sure. For instance, delegate
removing data from the filesystem to a dedicated queue table within the
database, and carefully check that code. Let no other software delete the
files. If you need PITR, you can stop the remover utility during backups
(think of it as a VACUUM for the filesystem ;)).
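Roughly, such a queue could look like the sketch below (the images table,
the column names and the paths are only placeholders for the example, not
a recommendation):

  -- a minimal deletion queue: the application never unlinks files itself,
  -- it only enqueues them in the same transaction that drops the row
  CREATE TABLE file_remove_queue (
      id         bigserial   PRIMARY KEY,
      file_path  text        NOT NULL,
      queued_at  timestamptz NOT NULL DEFAULT now()
  );

  BEGIN;
  DELETE FROM images WHERE id = 12345;
  INSERT INTO file_remove_queue (file_path)
      VALUES ('/var/app/images/12345.jpg');
  COMMIT;

A single trusted remover process then reads the queue, unlinks the listed
files and deletes the processed rows; stopping that process is what makes
a filesystem snapshot consistent with a database backup.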
> I've seen HA measures (I don't imagine anyone sacrificing their customers
> by copying 3 billion files and a few hundred terabytes while still
> maintaining an adequate service rate with part of their infrastructure
> down, just to use the example from your answer...), ideas about requiring
> an answer from the filesystem before considering the transaction done
> DB-wise (who guarantees that the image really went to the disk and is not
> in cache when the machine has a power failure and shuts down abruptly?)...
And who guarantees that the WAL file really went to the disk and is not in
cache when the machine has a power failure and shuts down abruptly? Use
trustworthy hardware. You would have to do that anyway if you wanted to go
with the "all-in-DB" approach.
>> Some people have seen this as a disadvantage on this thread; I personally
>> don't see it that way.
>
> I am questioning two points that show two situations where it is bad.
> Especially if those images are important to the records (e.g. product
> failure images, prize-winning images, product specs, prototype images,
> blueprints -- after all, we don't need to restrict our files to images,
> right? -- agreements, spreadsheets with the last years of company account
> movements, documents received from lawyers, etc.).
I don't think anyone is saying that storing images in the DB isn't better
from a data integrity point of view. But it has drawbacks, which in real
life sometimes make people store their images elsewhere. Of course, if one
had an infinite budget... But otherwise, if you are storing lots of data
and you can accept a trade-off between data safety (that you can lose some
data, or that your data can be bloated by dead data) and costs, then you
have to consider storing the data some other way.
>> I guess it depends on access needs, how many files and how much data you
>> have. What if you had 3 billion files across a few hundred terabytes? Can
>> you say with experience how the database would hold up in this situation?
>
> I'd use partitioning if I had a case like that. Part of those files would
> be delegated to one machine, part to another, and so on. Even if that
> solution -- partitioning -- makes the overall MTBF lower...
And how do you handle data integrity between many machines? The answer is
2PC, I guess. But still, managing integrity across many machines is hard --
remember, you don't have RI constraints for remote data.
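For what it's worth, a bare-bones sketch of what two-phase commit between
two nodes looks like in PostgreSQL (8.1 or later, with
max_prepared_transactions > 0 on both nodes); the tables and the
transaction id are invented for the example:

  -- on node A
  BEGIN;
  INSERT INTO images_meta (id, stored_on) VALUES (42, 'node_b');
  PREPARE TRANSACTION 'img_42';

  -- on node B
  BEGIN;
  INSERT INTO images_data (id, file_path) VALUES (42, '/data/42.jpg');
  PREPARE TRANSACTION 'img_42';

  -- only once both PREPARE TRANSACTION steps have succeeded, on each node:
  COMMIT PREPARED 'img_42';
  -- if either prepare failed, roll back whatever was prepared instead:
  -- ROLLBACK PREPARED 'img_42';

The coordination itself -- deciding that everybody has prepared, and
retrying COMMIT PREPARED after a crash -- still has to live in the
application, which is exactly the hard part.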
> And I still can't imagine how you guarantee that all 3 billion files have
> their corresponding entries in the database. Counting them is not enough,
> since I can have one file with the wrong "name" present on the filesystem
> or some duplicate record in the DB...
It depends on what you mean by integrity. For my needs it is sufficient
that I can guarantee that every object mentioned in the database is present
on the filesystem. Orphaned files are not a problem and can be removed (if
there really is a need to) by a monthly cron job ("find old files not
mentioned in the DB", "find old files that haven't been accessed and check
whether they are in the DB", etc.).

More important still is that I can make a snapshot of the data. With the
filesystem it would be just:

  - stop the remover utility
  - pg_dump & tar (or whatever) the files
  - start the remover utility

With everything in the DB, one would have to start pg_dump and wait until
it finishes -- and watch the tables bloat, because dead tuples have to be
kept around for the duration of the dump's transaction, and then watch
VACUUM struggle to clean the tables up afterwards. The more data you have,
and the more frequently it changes, the bigger the problem. A filesystem is
simply more efficient at storing data (the lack of transactions and the
limited metadata being the trade-off).

I don't say "all in DB" is fundamentally wrong. It's just that its niches
are "low-bandwidth services", like intranets or rarely visited data; some
forms of "very important data store" services, where one must be absolutely
sure about safety and integrity; and finally, let's call it
"academic/hobbyist research" ;-)

Files outside the DB are for when one must squeeze performance out of the
hardware, and that comes at a price. If you can afford the price, you get
the prize. ;)

Regards,
Dawid
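PS. In case it helps, this is roughly what such a monthly cross-check could
look like in psql (the images table, its file_path column and the listing
path are all made up for the example): dump a filesystem listing into a
temporary table, then anti-join it against the metadata.

  -- listing produced beforehand with something like:
  --   find /var/app/images -type f > /tmp/fs_listing.txt
  CREATE TEMP TABLE fs_listing (file_path text);
  \copy fs_listing FROM '/tmp/fs_listing.txt'

  -- files on disk that no row references (orphans, removal candidates):
  SELECT f.file_path
    FROM fs_listing f
    LEFT JOIN images i ON i.file_path = f.file_path
   WHERE i.file_path IS NULL;

  -- rows whose file is missing (should never happen; investigate if it does):
  SELECT i.file_path
    FROM images i
    LEFT JOIN fs_listing f ON f.file_path = i.file_path
   WHERE f.file_path IS NULL;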