Search Postgresql Archives

Re: PDF files: to store in database or not

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/06/2016 01:30 PM, Rich Shepard wrote:
With no experience of storing binary data in a bytea column I don't know
when its use is appropriate. I suspect that for an application I'm
developing it would be better to store row-related documents outside the
database, and want to learn if that is the appropriate approach.

  Consider an application that manages a fleet of vehicles. There's a
Vehicles table with information on each one (perhaps make, model, VIN, year of purchase) and a Services table. There are many PDF documents associated with each row in the tables: purchase contract, insurance form, service and
maintenance records, etc.

My thinking is to not store these documents in the database, but to store
them in subdirectories outside the database.

  Your thoughts?

Rich
I'd also be interested in answers to this that are give specific pros and cons, and not in terms of "its better to do this than that." What's "better" depends on how much you value the various pros and the cons.

One of the pros of keeping them in the database is ease of protecting adds and updates to the files and their related data with a transaction and being able to have system where it iss pretty much impossible for the documents to ever be out of sync with the related data.

I maintain some systems that do keep the documents outside of the database, and the application code maintains the transactional integrity of the files and data, and for the most part we don't have integrity problems. In the worst of an add or update operation being interrupted by a system crash or unexpected error, we have a new document saved but the data about this document has not been written to the database and it is as if that operation never happened. The file may really be there but the system does not "know about it." This works even for updates because our system versions documents and the old version is not written over, there is simply a new version that the system never "knows" about. Without versioning this would be more of a problem, and you would probably need to protect yourself with code that does something like temporarily keeping the last version of a file during an update and switching over the metadata to reference the new document only at the very last operation in the transaction.

We also have the potential of the database not matching the file store when a system is migrated or "cloned." We are very careful about this, but we've at least once had a case where a client's IT depart screwed it up, and got a mismatched system to which they started writing new data. Luckily this was a test or staging system and no production data was lost.

I've often wondered if we'd have been better off storing the files in the database. This design decision was made some years ago, and our concerns around this had to do with performance, but I don't know that we had any real data that this should have been a concern, and I suspect you could ameliorate if not eliminate this as an issue by careful design. I'd loved to hear this idea confirmed or debunked by someone who has more expertise (and ideally, done actual testing).

Cheers,

Eric


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux