On Sun, Jun 12, 2011 at 11:00 AM, Hanno Schlichting <hanno@xxxxxxxxxxx> wrote: > Hi. > > I'm a new Postgres user. If I happen to ask stupid questions please > feel free to point me to any documentation I should read or guidelines > for asking questions. > > I'm looking into storing binary data in Postgres and trying to > understand how data is actually stored in the database. The dataset > I'm looking at is images, photos, pdf documents which should commonly > be at a minimum 100kb, on average 10mb and can scale up to 100mb for > each document. I want to store this data in the database, as I need > transactional integrity and want to avoid the extra complexity of > managing shared filesystems between a number of frontend application > servers and database backends. > > The binary data will only be accessed as a whole. So either a complete > new file is written to the DB or a complete file will be read and > cached on a frontend server. I don't need streaming access or be able > to stream partial data. The use-case seems to be well supported by > Oracle 11g with the introduction of "secure files" (pdf: [1]). > > But from what I read of Postgres, my best bet is to store data as > large objects [2]. Going all the way down this means storing the > binary data as 2kb chunks and adding table row overhead for each of > those chunks. Using the bytea type and the toast backend [3] it seems > to come down to the same: data is actually stored in 2kb chunks for a > page size of 8kb. > > I'm assuming I'll be able to get ~8kb jumbo frame packets over a > gigabit network connection and would be able to use a ext4 volume with > a block size of either 32kb or 64kb for the volume housing the binary > data, but a smaller block size for the one housing the relational > data. > > Given those other constraints, it seems silly to split data up into > 2kb chunks on the database level. Is there any way the chunk size for > binary data can be increased here independent of the one for "normal > relational" data? I would not even consider tweaking the internal block sizes until you've determined there is a problem you expect you might solve by doing so. The single most important factor affecting blob performance in postgres is how you send and receive the data -- you absolutely want to use the binary protocol mode (especially for postgres versions that don't support hex mode). The next thing to look at is using bytea/large object -- large objects are a bit faster and have a higher theoretical limit on size but byea is a standard type and this offers a lot of conveniences -- I'd say stick with bytea unless you've determined there is a reason not to. That said, if you are not writing C some client side drivers might only allow binary transmission through the lo interface so that's something to think about. merlin merlin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general