On Mon, Sep 24, 2018 at 03:28:15PM -0700, Vladimir Ryabtsev wrote:
> > it is not unusual to have 1GB cache or more... and do not forget to drop
> > the cache between tests + do a sync
> I also reviewed import scripts and found the import was done in DESCENDING
> order of IDs.

This seems significant.. it means the heap was probably written in backwards
order relative to the IDs, and the OS readahead is ineffective when index
scanning across a range of IDs.  From memory, linux since (at least) 2.6.32
can detect and optimize reads in backwards order.  You mentioned you're
using 4.4.

Does your LVM have readahead ramped up?  Try lvchange -r 65536 data/postgres
(or similar); there's a sketch for checking it in the P.S. below.

Also.. these might be impractical solutions for several reasons, but did you
try either 1) forcing a bitmap scan (of only one index), to force the heap
reads to be ordered, if not sequential?  SET enable_indexscan=off (and maybe
SET enable_seqscan=off and others as needed).  Or, 2) using a brin index
(scanning of which always results in a bitmap heap scan)?  Sketches of both
are in the P.S.

> > - how big is the table?
> pg_table_size('articles') = 427 GB
> pg_table_size('pg_toast.pg_toast_221558') = 359 GB

Ouch .. if it were me, I would definitely want to make that a partitioned
table.. or perhaps two tables unioned together with a view, one each for the
sparse and dense ranges?  (Sketch in the P.S.)

> > You can make use of pg_buffercache in order to see what is actually
> > cached.
> It seems that there is no such a view in my DB, could it be that the module
> is not installed?

Right, it's in the postgresql-contrib package, and you have to
"CREATE EXTENSION pg_buffercache".  (Example query in the P.S.)

> > - As Laurenz suggested (VACUUM FULL), you might want to move data around.
> > You can try also a dump + restore to narrow the problem to data or disk
> I launched VACUUM FULL, but it ran very slowly, according to my calculation
> it might take 17 hours. I will try to copy data into another table with
> the same structure or spin up another server, and let you know.

I *suspect* VACUUM FULL won't help, since (AIUI) it copies all "visible"
tuples from the source table into a new table (and rebuilds indexes as
necessary).  It can resolve bloat due to historic DELETEs, but since I think
your table was written in reverse order of pkey, I think it'll also copy it
in reverse order.  CLUSTER will fix that, and you can use pg_repack to do
the same online (sketch in the P.S.).. but it's going to be painful for a
table+toast nearly 1TB in size: it'll take all day, and also require an
additional ~1TB while running (same as VACUUM FULL).

Justin
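
P.S. Some untested sketches of the above.  Names, paths, and boundary values
(articles.article_id, the LV device path, etc.) are guesses; adjust to your
schema.

To check readahead before and after the lvchange (blockdev reports it in
512-byte sectors; the device path is a guess based on data/postgres):

    # current readahead on the LV, in 512-byte sectors
    blockdev --getra /dev/mapper/data-postgres
    # raise it, then re-check
    lvchange -r 65536 data/postgres
    blockdev --getra /dev/mapper/data-postgres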
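
For (1), forcing the heap reads to be ordered via a bitmap scan (column name
and range are hypothetical):

    SET enable_indexscan = off;
    SET enable_seqscan = off;   -- only if it then falls back to a seq scan
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM articles WHERE article_id BETWEEN 1000000 AND 1100000;

The plan should show a Bitmap Heap Scan, and BUFFERS shows how much was
actually read vs. hit in cache.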
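
For (2), a brin index (9.5+) is tiny and cheap to build; it works when heap
order correlates with the IDs, which yours should (reverse correlation is
fine for brin, since it only stores min/max per block range):

    CREATE INDEX CONCURRENTLY articles_id_brin
        ON articles USING brin (article_id);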
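
The partitioning idea, on PG10+ (columns and the range boundaries are made
up; you'd pick bounds matching your sparse/dense split):

    CREATE TABLE articles_p (
        article_id bigint NOT NULL,
        content    text
    ) PARTITION BY RANGE (article_id);

    CREATE TABLE articles_dense  PARTITION OF articles_p
        FOR VALUES FROM (1) TO (100000000);
    CREATE TABLE articles_sparse PARTITION OF articles_p
        FOR VALUES FROM (100000000) TO (2000000000);

On older versions, roughly the same effect with two tables and a view:

    CREATE VIEW articles_all AS
    SELECT * FROM articles_dense
    UNION ALL
    SELECT * FROM articles_sparse;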
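
To see what's cached once the extension is in place, roughly the standard
query from the pg_buffercache docs:

    CREATE EXTENSION pg_buffercache;

    -- which relations occupy the most shared_buffers
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
        AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                                  WHERE datname = current_database()))
    GROUP BY c.relname
    ORDER BY 2 DESC
    LIMIT 10;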
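
And to rewrite the table in pkey order (index name is a guess):

    CLUSTER articles USING articles_pkey;

or online, using pg_repack's --order-by, which does a CLUSTER-like rewrite:

    pg_repack --table=articles --order-by=article_id mydb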