On Fri, May 27, 2011 at 9:15 AM, Emi Lu <emilu@xxxxxxxxxxxxxxxxx> wrote:
Solution:
====================
(1) Save pdfs to file system, only point file name in psql8.3
(2) Save oids of pdfs into table
(3) Save pdf files as bytea column in psql8.3
Pros and cons for (1), (2), (3), which is the most efficient way?
You asked about the "most efficient" but you didn't tell us what you are doing with the pdf files. It really doesn't matter how you store it if you are simply inserting a 500KB object 30 times per year - then it becomes all about convenience. I'd probably rule out solution #1 in that case purely from a maintenance perspective. Storing the PDFs in the db eliminates the maintenance work of ensuring that the files are archived along with the db backups, that file paths stay correct whenever a new db host is provisioned, and storing the files in the db provides transaction-safe file storage. If, on the other hand, you are selecting/updating the rows in question several hundred times per second or more, then you may well want to put some thought into efficiency and, assuming some kind of web access, providing your webserver/caching layer with direct access to the files on the filesystem for service efficiency. But if you are only occasionally accessing the files in question, there's little reason not to put them in the db.
If mybatis can't load the object that references the file without also loading the entire file from the db - and you are using that object for other things that will require frequent loading/storing - then you probably don't want the files in the db. If it can load the object without loading the binary data, by lazily loading that column only when requested, then it doesn't matter. Using BLOBs guarantees that you can access the entire row without loading the binary data if mybatis exposes the blob separately from the rest of the object, but mybatis may be capable of issuing a select without that column and then grabbing that column as needed in order to simulate that in the case of a bytea column, anyway. Internally, a large bytea column is treated similarly to a blob, with any data over a certain size not stored in-line with the rest of the row for efficiency reasons.