Re: SELECT * FROM huge_table LIMIT 10; Why does it take more than 10 min to complete, with cold caches

On Tue, Jan 31, 2023 at 8:28 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Dimitrios Apostolou <jimis@xxxxxxx> writes:
> The question is why this simple query is taking so long to complete.

Do you get the same 10 rows when you repeat the command?

On the basis of the limited info you provided, I'm going to guess that

(1) there are huge subranges of the table containing no live rows,
so that a seqscan might have to pass over many blocks before it finds
some to return;

It feels like there is room for improvement here: table statistics and the visibility map could be used to significantly reduce the number of pages retrieved that turn out to be all-dead.  Of course, a LIMIT without an ORDER BY is not a query worth optimizing for...
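
For what it's worth, a couple of quick diagnostics can show whether this is what's happening (a sketch, assuming the table is literally named huge_table and that the pg_visibility contrib extension can be installed):

-- Dead vs. live tuple estimates from the statistics collector:
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'huge_table';

-- How much of the table the visibility map marks all-visible:
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT count(*) AS total_pages,
       count(*) FILTER (WHERE all_visible) AS all_visible_pages
FROM pg_visibility_map('huge_table');

A large n_dead_tup, or a big gap between total_pages and all_visible_pages, would be consistent with the seqscan wading through long runs of dead rows.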


(2) once you do reach an area having live rows, the next SELECT picks
up scanning in that same area due to the effects of
"synchronize_seqscans", so you get immediate answers until you reach
the next desert of dead tuples.

Interesting; I had assumed the behavior on repeated calls was due more to intelligent buffer pool usage.  So when doing a sequential scan (I suppose the key word being "sequential"...), the system decides which pages to fetch before checking whether they are already in shared buffers, as opposed to returning the ones already in shared buffers first and then fetching more if needed?
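
One could try to separate the two effects experimentally (a sketch, again assuming a table named huge_table and that the pg_buffercache contrib extension is available):

-- Rule out scan-position sharing for this session, then rerun the query:
SET synchronize_seqscans = off;
SELECT * FROM huge_table LIMIT 10;

-- Check how many of the table's pages are actually in shared buffers:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT count(*) AS cached_pages
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('huge_table')
  AND reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database());

If repeated runs stay fast with synchronize_seqscans off, the buffer cache explanation would look more plausible; if they slow back down to a cold start each time, the synchronized-scan effect was doing the work.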

David J.

