[ cc'ing to pgsql-performance because of performance issue for hash indexes ]

"Balazs Nagy" <bnagy@xxxxxxxxxxxxxx> writes:
> Database table size: ~60 million rows
> Field to index: varchar 127
> CREATE INDEX ... USING hash ...
> fails with a file not found error (psql in verbose mode):
> ERROR:  58P01: could not open segment 3 of relation 1663/16439/16509 (target
> block 528283): No such file or directory
> LOCATION:  _mdfd_getseg, md.c:954

Wow, you're trying to build an 8GB hash index?  Considering that hash
indexes still don't have WAL support, it hardly seems like a good idea
to be using one that large.

The immediate problem here seems to be that the hash code is trying to
touch a page in segment 4 when it hasn't yet touched anything in
segment 3.  The low-level md.c code assumes (not unreasonably) that
this probably represents a bug in the calling code, and errors out
instead of allowing the segment to be created.

We ought to think about rejiggering the smgr.c interface to support
hash's behavior more reasonably.  There's already one really bad kluge
in mdread() for hash support :-(

One thought that comes to mind is to require hash to do an smgrextend()
addressing the last block it intends to use whenever it allocates a new
batch of blocks, whereupon md.c could adopt a saner API: allow
smgrextend but not other calls to address blocks beyond the current
EOF.

I had once wanted to require hash to explicitly fill all the blocks in
sequence, but that's probably too radical compared to what it does now
--- especially seeing that it seems the extension has to be done while
holding the page-zero lock (see _hash_expandtable).

Writing just the logically last block in a batch would have the effect
that hash indexes could contain holes (unallocated extents) on
filesystems that support that.
Normally I would think that probably a Bad Thing, but since hash
indexes are never scanned sequentially, it might not matter whether
they end up badly fragmented because of after-the-fact filling in of a
hole.

Thoughts?

			regards, tom lane