Aaron Turner <synfinatic@xxxxxxxxx> writes: > Well before I go about re-architecting things, it would be good to > have a strong understanding of just what is going on. Obviously, the > unique index on the char(48) is the killer. What I don't know is: You have another unique index on the integer primary key, so it's not the mere fact of a unique index that's hurting you. > 1) Is this because the column is so long? Possibly. Allowing for 12 bytes index-entry overhead, the char keys would be 60 bytes vs 16 for the integer column, so this index is physically almost 4x larger than the other. You might say "but that should only cause 4x more I/O" but it's not necessarily so. What's hard to tell is whether you are running out of RAM disk cache space, resulting in re-reads of pages that could have stayed in memory when dealing with one-fifth as much index data. You did not show us the iostat numbers for the two cases, but it'd be interesting to look at the proportion of writes to reads on the data drive in both cases. > 2) Is this because PG is not optimized for char(48) (maybe it wants > powers of 2? or doesn't like even numbers... I don't know, just > throwing it out there) Are the key values really all 48 chars long? If not, you made a bad datatype choice: varchar(n) (or even text) would be a lot smarter. char(n) wastes space on blank-padding. Another thing to think about is whether this is C locale or not. String comparisons in non-C locales can be horrendously expensive ... though I'd expect that to cost CPU not I/O. (Hmm ... is it possible your libc is hitting locale config files constantly? Might be worth strace'ing to confirm exactly where the I/O is going.) > 4) Does decoding the data (currently base64) and storing the binary > data improve the distribution of the index, thereby masking it more > efficent? No, but it'd reduce the size of the index, which you certainly want. Storing as bytea would also eliminate any questions about wasteful locale-dependent comparisons. The only one of these effects that looks to me like it could result in worse-than-linear degradation of I/O demand is maxing out the available RAM for disk cache. So while improving the datatype choice would probably be worth your while, you should first see if fooling with shared_buffers helps, and if not it's time to buy RAM not disk. regards, tom lane