Well, I feel like a fool, because I failed to notice that the total runtime shown in that profile wasn't anywhere close to the actual wall clock time. gprof is indeed simply not counting the time spent in dynamically-linked code. With tsearch2 statically linked into the backend, a more believable picture emerges: % cumulative self self total time seconds seconds calls Ks/call Ks/call name 98.96 1495.93 1495.93 33035195 0.00 0.00 hemdistsign 0.27 1500.01 4.08 10030581 0.00 0.00 makesign 0.11 1501.74 1.73 588976 0.00 0.00 gistchoose 0.10 1503.32 1.58 683471 0.00 0.00 XLogInsert 0.05 1504.15 0.83 246579 0.00 0.00 sizebitvec 0.05 1504.93 0.78 446399 0.00 0.00 gtsvector_union 0.03 1505.45 0.52 3576475 0.00 0.00 LWLockRelease 0.03 1505.92 0.47 1649 0.00 0.00 gtsvector_picksplit 0.03 1506.38 0.47 3572572 0.00 0.00 LWLockAcquire 0.02 1506.74 0.36 444817 0.00 0.00 gtsvector_same 0.02 1507.09 0.35 4077089 0.00 0.00 AllocSetAlloc 0.02 1507.37 0.28 236984 0.00 0.00 gistdoinsert 0.02 1507.63 0.26 874195 0.00 0.00 hash_search 0.02 1507.89 0.26 9762101 0.00 0.00 gtsvector_penalty 0.01 1508.08 0.19 236984 0.00 0.00 gistmakedeal 0.01 1508.27 0.19 841754 0.00 0.00 UnpinBuffer 0.01 1508.45 0.18 22985469 0.00 0.00 hemdistcache 0.01 1508.63 0.18 3998572 0.00 0.00 LockBuffer 0.01 1508.81 0.18 686681 0.00 0.00 gtsvector_compress 0.01 1508.98 0.17 11514275 0.00 0.00 gistdentryinit So we gotta fix hemdistsign ... regards, tom lane