On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote: > Richard Huxton <dev@xxxxxxxxxxxx> writes: > > Does external_id_map really have 15 million rows? If not, try a VACUUM > > FULL on it. Be prepared to give it some time to complete. > > Please don't, actually, until we understand what's going on. Ack, I was the middle of the vacuum full already when I got this. I still have the strace and lsof output from before the vacuum full. It's definitely reading Postgres files: bash-2.05b$ grep '^read' subsourcestrace | cut -d, -f1 | sort | uniq -c 100453 read(44 48218 read(47 bash-2.05b$ grep 'seek' subsourcestrace | cut -d, -f1 | sort | uniq -c 1 _llseek(40 1 _llseek(43 35421 _llseek(44 1 _llseek(45 1 _llseek(46 39787 _llseek(47 1 _llseek(48 File handles: 44 - external_id_map 47 - external_id_map_primary_key 40 - subject 43 - subject_pkey 45 - external_id_map_source 46 - external_id_map_source_target_id 48 - external_id_map_source_source_id_unique As far as the seek offsets go, R doesn't want to do a histogram for me without using up more RAM than I have. I put up some files at: http://arctur.us/pgsql/ They are: subsourcestrace - the strace output from "select * from subject_source where source='SCH'" subsourcestrace-nocond - the strace output from "select * from subject_source" subsourcelsof - the lsof output (for mapping from file handles to file names) relfilenode.html - for mapping from file names to table/index names (I think I've gotten all the relevant file handle-table name mappings above, though) seekoff-44 - just the beginning seek offsets for the 44 file handle (external_id_map) seekoff-47 - just the beginning seek offsets for the 47 file handle (external_id_map_primary_key) The vacuum full is still going; I'll let you know if it changes things. > The thing is that the given plan will fetch every row indicated by the > index in both cases, in order to check the row's visibility. I don't > see how an additional test on a non-indexed column would cause any > additional I/O. If the value were large enough to be toasted > out-of-line then it could cause toast table accesses ... but we're > speaking of a char(3). Pardon my ignorance, but do the visibility check and the check of the condition happen at different stages of execution? Would it end up checking the condition for all 15M rows, but only checking visibility for the 1200 rows that come back from the join? I guess I'm confused about what "every row indicated by the index" means in the context of the join. Thanks for taking an interest, Mitch ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq