Re: Storing many big files in database- should I do it?

Rod <cckramer@xxxxxxxxx> · Tue, 27 Apr 2010 18:58:30 +1000

No, I'm not storing RDBMS in S3. I didn't write that in my post.
S3 is used as CDN, only for downloading files.

On Tue, Apr 27, 2010 at 6:54 PM, John R Pierce <pierce@xxxxxxxxxxxx> wrote:
> Rod wrote:
>>
>> Hello,
>>
>> I have a web application where users upload/share files.
>> After file is uploaded it is copied to S3 and all subsequent downloads
>> are done from there.
>> So in a file's lifetime it's accessed only twice- when created and
>> when copied to S3.
>>
>> Files are documents, of different size from few kilobytes to 200
>> Megabytes. Number of files: thousands to hundreds of thousands.
>>
>> My dilemma is - Should I store files in PGSQL database or store in
>> filesystem and keep only metadata in database?
>>
>> I see the possible cons of using PGSQL as storage:
>> - more network bandwidth required comparing to access NFS-mounted
>> filesystem ?
>> - if database becomes corrupt you can't recover individual files
>> - you can't backup live database unless you install complicated
>> replication add-ons
>> - more CPU required to store/retrieve files (comparing to filesystem
>> access)
>> - size overhead, e.g. storing 1000 bytes will take 1000 bytes in
>> database + 100 bytes for db metadata, index, etc. with lot of files
>> this will be a lot of overhead.
>>
>> Are these concerns valid?
>> Anyone had this kind of design problem and how did you solve it?
>>
>
> S3 storage is not suitable for running a RDBMS.
> An RDBMS wants fast low latency storage using 8k block random reads and
> writes.  S3 is high latency and oriented towards streaming
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general