Craig,

* Craig Ringer (craig@xxxxxxxxxxxxxxxxxxxxx) wrote:
> I've been doing some testing for the Bacula project, which uses
> PostgreSQL as one of the databases in which it stores backup catalogs.

We also use Bacula with a PostgreSQL backend.

> I've been evaluating a schema change for Bacula that takes a field
> that's currently stored as a gruesome-to-work-with base64-encoded
> representation of a binary blob, and expands it into a set of integer
> fields that can be searched, indexed, etc.

This would be extremely nice.

> The table size of the expanded form is marginally smaller than for the
> base64-encoded string version. However, INSERT times are *CONSIDERABLY*
> greater for the version with more fields. It takes 1011 seconds to
> insert the base64 version, vs 1290 seconds for the expanded-fields
> version. That's a difference of 279 seconds, or 27%.
>
> Despite that, the final table sizes are the same.
>
> If I use tab-separated input and COPY, the original-format file is
> 1300MB and the expanded-structure format is 1618MB. The performance hit
> on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
> quite significant.
>
> Any ideas about what I might be able to do to improve the efficiency of
> inserting records with many integer fields?

Bacula should be using COPY for the batch data loads, so hopefully it
won't suffer too much from having the fields split out. I think it would
be interesting to try doing PQexecPrepared with binary-format data
instead of using COPY, though. I'd be happy to help you implement a test
setup for doing that, if you'd like.

	Thanks,

		Stephen
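For reference, a test of the PQexecPrepared idea might look roughly like
the sketch below. This is only an illustration, not Bacula's actual
code: the table name "file_expanded" and its columns are hypothetical
stand-ins for the expanded schema, and the connection info is taken
from the usual PG* environment variables. The main point is that binary
int4 parameters have to be sent as 4 bytes in network byte order, with
the corresponding paramFormats entry set to 1.

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl() */
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");   /* uses PGHOST, PGDATABASE, etc. */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Prepare the INSERT once; with paramTypes NULL the server infers
     * the parameter types from the target columns. */
    PGresult *res = PQprepare(conn, "ins_file",
        "INSERT INTO file_expanded (fileindex, jobid) VALUES ($1, $2)",
        2, NULL);
    PQclear(res);

    /* Binary int4 parameters: 4 bytes each, network byte order. */
    uint32_t fileindex = htonl(42);
    uint32_t jobid     = htonl(7);

    const char *values[2]  = { (const char *) &fileindex,
                               (const char *) &jobid };
    int         lengths[2] = { sizeof(fileindex), sizeof(jobid) };
    int         formats[2] = { 1, 1 };   /* 1 = binary format */

    res = PQexecPrepared(conn, "ins_file", 2,
                         values, lengths, formats,
                         0 /* text-format result */);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
    PQclear(res);

    PQfinish(conn);
    return 0;
}
```

For a fair comparison against COPY, the repeated PQexecPrepared calls
would of course need to be wrapped in a single transaction, the way
Bacula's batch mode does its loads.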