Re: Query is slower with a large proportion of NULLs in several columns

OK, here are the results after I did:
set max_parallel_workers_per_gather = 0;
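
For anyone reproducing this, the setup was along these lines (the exact statements aren't quoted in this message, so this is only a sketch based on the plans' Group Key and the table names):

-- Sketch of the assumed session (query shape reconstructed from the plans below).
SET max_parallel_workers_per_gather = 0;
SET track_io_timing = on;   -- needed for the "I/O Timings" lines

-- One way to get the table sizes quoted below:
SELECT pg_size_pretty(pg_total_relation_size('royalty_no_null'));
SELECT pg_size_pretty(pg_total_relation_size('royalty_with_null'));

-- The plans group on roys_creation_user, so the query was presumably of this shape:
EXPLAIN (ANALYZE, BUFFERS)
SELECT roys_creation_user, count(*)
FROM royalty_no_null
GROUP BY roys_creation_user;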

no-nulls table is 11.462 GB:
QUERY PLAN
HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual time=19908.343..19908.345 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=532369.898
  ->  Seq Scan on royalty_no_null  (cost=0.00..1583887.42 rows=18508942 width=7) (actual time=0.013..16705.734 rows=18508470 loops=1)
        I/O Timings: read=532369.898
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', max_parallel_workers_per_gather = '0', search_path = 'public, public, "$user"'
Planning Time: 0.056 ms
Execution Time: 19908.383 ms

with-nulls table is 9.120 GB:
QUERY PLAN
HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual time=30369.758..30369.761 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=6440851.540
  ->  Seq Scan on royalty_with_null  (cost=0.00..1298048.80 rows=18506380 width=7) (actual time=0.015..25525.104 rows=18508470 loops=1)
        I/O Timings: read=6440851.540
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', max_parallel_workers_per_gather = '0', search_path = 'public, public, "$user"'
Planning Time: 0.060 ms
Execution Time: 30369.796 ms

It's still taking roughly 10X more I/O time to read the physically smaller table. Very odd.
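
One way to cross-check those I/O numbers (just a sketch, using the standard statistics view; it assumes the counters are reset or compared before and after the scans):

-- Compare physical block reads vs. cache hits for the two tables.
SELECT relname, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables
WHERE relname IN ('royalty_no_null', 'royalty_with_null');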

Regarding the earlier comment from David Johnston: if I put null values in just one of the columns for all rows, that should force a nulls bitmap to be created for every row, requiring the same amount of bitmap checking per row. However, the query still runs faster than against the mostly-null table, even though this table is larger (a hypothetical reconstruction of this table follows the plan below):
with-nulls table is 11.604 GB when all columns are filled except one that is mostly nulls. The extra 0.14 GB (11.604 GB - 11.462 GB) is probably space consumed by the null bitmaps:
QUERY PLAN
HashAggregate  (cost=1693765.03..1693765.06 rows=3 width=15) (actual time=26452.653..26452.655 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=2706123.209
  ->  Seq Scan on royalty_with_null_cols_filled  (cost=0.00..1601218.02 rows=18509402 width=7) (actual time=0.014..22655.366 rows=18508470 loops=1)
        I/O Timings: read=2706123.209
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', max_parallel_workers_per_gather = '0', search_path = 'public, public, "$user"'
Planning Time: 0.068 ms
Execution Time: 26452.691 ms
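
For illustration, here's a hypothetical toy reconstruction of that third table; every column name other than roys_creation_user is invented, since the real column list isn't shown here:

-- Hypothetical sketch: all columns filled except one that is NULL in every
-- row, so every tuple carries a nulls bitmap. The 5 distinct users and
-- 18,508,470 rows match the row counts in the plans above.
CREATE TABLE royalty_with_null_cols_filled AS
SELECT g                  AS roys_id,
       'user_' || (g % 5) AS roys_creation_user,
       now()              AS roys_creation_ts,
       NULL::text         AS roys_comment      -- the one always-NULL column
FROM generate_series(1, 18508470) AS g;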

It seems to be the actual presence of null values that slows things down, even when the same-sized null bitmap exists for each row.
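
If it's useful, the presence of the nulls bitmap itself can be checked with the pageinspect extension (a sketch, assuming the extension can be installed; t_bits is the per-tuple nulls bitmap and is NULL for tuples without any NULL attributes):

-- Inspect the first heap page: t_hoff is the tuple header size, t_bits the
-- nulls bitmap (NULL when the tuple has no NULL attributes).
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT t_hoff, t_bits
FROM heap_page_items(get_raw_page('royalty_with_null', 0))
LIMIT 5;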

On Mon, Dec 20, 2021 at 5:51 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Lars Bergeson <larsavatar@xxxxxxxxx> writes:
> What is it about null values in the table that slows down the full table
> scan?

If a row has any nulls, then it contains a "nulls bitmap" [1] that says
which columns are null, and that bitmap has to be consulted while
walking through the row contents.  So the most obvious theory here
is that that adds overhead that's significant in your case.  But there
are some holes in that theory, mainly that the I/O timings you are
showing don't seem very consistent:

no nulls:
>   I/O Timings: read=1884365.335
> Execution Time: 11135.368 ms

with nulls:
>   I/O Timings: read=17141420.771
> Execution Time: 25407.318 ms

Regardless of CPU time required, it should not take 10X less I/O
time to read a physically larger table.  So there's something
fairly bogus going on there.  One thing you might try is disabling
parallelism (set max_parallel_workers_per_gather = 0) to see if
that's confusing the numbers somehow.

                        regards, tom lane

[1] https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT
