Re: Tweaking bytea / large object block sizes?

Hanno Schlichting <hanno@xxxxxxxxxxx> · Tue, 14 Jun 2011 13:48:28 +0200

On Mon, Jun 13, 2011 at 3:27 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> I would not even consider tweaking the internal block sizes until
> you've determined there is a problem you expect you might solve by
> doing so.

It's not a problem as such, but managing data chunks of 2000 bytes +
the hundreds of rows per object in the large_object table for 10mb
objects seems like a lot of wasted overhead, especially if the
underlying filesystem manages 32kb or 64kb blocks. My impression of
those values was that they are a bit antiquated or are tuned for
storing small variable character objects, but not anything I'd call
"binary large objects" these days.

> The single most important factor affecting blob performance
> in postgres is how you send and receive the data -- you absolutely
> want to use the binary protocol mode (especially for postgres versions
> that don't support hex mode). ÂThe next thing to look at is using
> bytea/large object -- large objects are a bit faster and have a higher
> theoretical limit on size but byea is a standard type and this offers
> a lot of conveniences -- I'd say stick with bytea unless you've
> determined there is a reason not to. ÂThat said, if you are not
> writing C some client side drivers might only allow binary
> transmission through the lo interface so that's something to think
> about.

Thanks, I got as much from the docs and the blogosphere.

We are going to use the large object interface. That seems to be the
least we can do - especially to avoid some encoding overhead. We are
storing bytes after all and not ascii characters so there should be no
encoding at all. We aren't using SQL as the query interface as such
but the Python bindings (http://www.initd.org/psycopg/) so we can take
full advantage of the underlying large object API's and do pretty
direct lo_import / lo_export calls. We are targeting at least Postgres
9.0, potentially going for 9.1 soon after it hits final.

Once we get further in the project, we'll of course do some intensive
benchmarking for the various options with our specific data and
configuration. I'm just trying to understand what bits and bytes are
actually stored and transferred behind all those API's.

Hanno

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general