Query is slower with a large proportion of NULLs in several columns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



First of all, here is the version of PostgreSQL I'm using:
PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit

I'm new to PostgreSQL, and I'm deciding if I should make columns in my database nullable or not.

I have no need to distinguish between blank/zero and null. But I have noticed that using NULL for unused values does save disk space, as opposed to using zero/blank default values.

In my table I have 142 columns and 18,508,470 rows. Using NULLs instead of zero/blank reduces the table storage space from 11.462 GB down to 9.120 GB. That's a 20% reduction in size, and from what I know about the data it's about right that 20% of the values in the database are unused.

I would think that any full table scans would run faster against the table that has the null values since there are less data pages to read. But, actually, the opposite is true, and by quite a large margin (about 50% slower!).

In the "Slow Query Questions" guide, it says to mention if the table
  • has a large proportion of NULLs in several columns
Yes, it does, I would estimate that about 20% of the time a column's value is null. Why does this matter? Is this a known thing about PostgreSQL performance? If so, where do I read about it?

The table does not contain large objects, has been freshly loaded (so not a lot of UPDATE/DELETEs), is not growing, only has the 1 primary index, and does not use triggers.

Anyway, below are the query results. The field being selected (creation_user) is not in any index, which forces a full table scan:

--> 18.844 sec to execute when all columns defined NOT NULL WITH DEFAULT, table size is 11.462 GB
select creation_user, count(*)
   from eu.royalty_no_null
 group by creation_user;

creation_user|count   |
-------------+--------+
[BLANK]      |   84546|
BACOND       |      10|
BALUN        |    2787|
FOGGOL       |     109|
TRBATCH      |18421018|
QUERY PLAN
Finalize GroupAggregate  (cost=1515478.96..1515479.72 rows=3 width=15) (actual time=11133.324..11135.311 rows=5 loops=1)
  Group Key: creation_user
  I/O Timings: read=1884365.335
  ->  Gather Merge  (cost=1515478.96..1515479.66 rows=6 width=15) (actual time=11133.315..11135.300 rows=13 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        I/O Timings: read=1884365.335
        ->  Sort  (cost=1514478.94..1514478.95 rows=3 width=15) (actual time=11127.396..11127.398 rows=4 loops=3)
              Sort Key: creation_user
              Sort Method: quicksort  Memory: 25kB
              I/O Timings: read=1884365.335
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Partial HashAggregate  (cost=1514478.89..1514478.92 rows=3 width=15) (actual time=11127.370..11127.372 rows=4 loops=3)
                    Group Key: creation_user
                    Batches: 1  Memory Usage: 24kB
                    I/O Timings: read=1884365.335
                    Worker 0:  Batches: 1  Memory Usage: 40kB
                    Worker 1:  Batches: 1  Memory Usage: 40kB
                    ->  Parallel Seq Scan on royalty_no_null  (cost=0.00..1475918.59 rows=7712059 width=7) (actual time=0.006..9339.296 rows=6169490 loops=3)
                          I/O Timings: read=1884365.335
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', search_path = 'public, public, "$user"'
Planning Time: 0.098 ms
Execution Time: 11135.368 ms


--> 30.57 sec to execute when all columns are nullable instead of defaulting to zero/blank, table size is 9.120 GB:
select creation_user, count(*)
   from eu.royalty_with_null
 group by creation_user;

creation_user|count   |
-------------+--------+
BACOND       |      10|
BALUN        |    2787|
FOGGOL       |     109|
TRBATCH      |18421018|
[NULL]       |   84546|
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------+
Finalize GroupAggregate  (cost=1229649.93..1229650.44 rows=2 width=15) (actual time=25404.925..25407.262 rows=5 loops=1)
  Group Key: creation_user
  I/O Timings: read=17141420.771
  ->  Gather Merge  (cost=1229649.93..1229650.40 rows=4 width=15) (actual time=25404.917..25407.249 rows=12 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        I/O Timings: read=17141420.771
        ->  Sort  (cost=1228649.91..1228649.91 rows=2 width=15) (actual time=25398.004..25398.006 rows=4 loops=3)
              Sort Key: creation_user
              Sort Method: quicksort  Memory: 25kB
              I/O Timings: read=17141420.771
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Partial HashAggregate  (cost=1228649.88..1228649.90 rows=2 width=15) (actual time=25397.918..25397.920 rows=4 loops=3)
                    Group Key: creation_user
                    Batches: 1  Memory Usage: 24kB
                    I/O Timings: read=17141420.771
                    Worker 0:  Batches: 1  Memory Usage: 40kB
                    Worker 1:  Batches: 1  Memory Usage: 40kB
                    ->  Parallel Seq Scan on royalty_with_null  (cost=0.00..1190094.92 rows=7710992 width=7) (actual time=1.063..21481.517 rows=6169490 loops=3)
                          I/O Timings: read=17141420.771
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency = '1', search_path = 'public, public, "$user"'
Planning Time: 0.112 ms
Execution Time: 25407.318 ms

The query runs about 50% longer even though I would think there are 20% less disk pages to read!

In this particular column creation_user very few values are unused, but in other columns in the table many more of the rows have an unused value (blank/zero or NULL).

It seems to make no difference if the column selected is near the beginning of the row or the end, results are about the same.

What is it about null values in the table that slows down the full table scan?

If I populate blank/zero for all of the unused values in columns that are NULLable, the query is fast again. So just defining the columns as NULLable isn't what slows it down -- it's actually the NULL values in the rows that seems to degrade performance.

Do other operations (besides full table scan) get slowed down by null values as well?

Here is the table definition with no nulls, the other table is the same except that all columns are NULLable.
CREATE TABLE eu.royalty_no_null (
isn int4 NOT NULL,
contract_key numeric(19) NOT NULL DEFAULT 0,
co_code numeric(7) NOT NULL DEFAULT 0,
rec_status numeric(1) NOT NULL DEFAULT 0,
record_type numeric(1) NOT NULL DEFAULT 0,
contract_no numeric(6) NOT NULL DEFAULT 0,
subcon_no numeric(4) NOT NULL DEFAULT 0,
sub_division varchar(6) NOT NULL DEFAULT '',
top_price_perc numeric(3) NOT NULL DEFAULT 0,
bar_code_ind varchar(1) NOT NULL DEFAULT '',
process_step_no varchar(1) NOT NULL DEFAULT '',
main_code_group varchar(1) NOT NULL DEFAULT '',
condition varchar(1) NOT NULL DEFAULT '',
neg_sales_ind varchar(1) NOT NULL DEFAULT '',
con_type_ind varchar(1) NOT NULL DEFAULT '',
exchg_tape_ind varchar(1) NOT NULL DEFAULT '',
bagatelle_ind varchar(1) NOT NULL DEFAULT '',
restrict_terr_ind varchar(1) NOT NULL DEFAULT '',
sys_esc_ind varchar(1) NOT NULL DEFAULT '',
equiv_prc_ind varchar(1) NOT NULL DEFAULT '',
scal_fact_ind varchar(1) NOT NULL DEFAULT '',
sell_off_ind varchar(1) NOT NULL DEFAULT '',
cut_rate_ind varchar(1) NOT NULL DEFAULT '',
gross_nett_ind varchar(1) NOT NULL DEFAULT '',
sleeve_ind varchar(1) NOT NULL DEFAULT '',
rate_ind varchar(1) NOT NULL DEFAULT '',
price_basis_calc varchar(1) NOT NULL DEFAULT '',
source_price_basis varchar(1) NOT NULL DEFAULT '',
esc_ind varchar(1) NOT NULL DEFAULT '',
payment_period varchar(1) NOT NULL DEFAULT '',
subcon_reserve_ind varchar(1) NOT NULL DEFAULT '',
release_reason_code varchar(1) NOT NULL DEFAULT '',
bagatelle_qty_val_ind varchar(1) NOT NULL DEFAULT '',
payable_ind varchar(1) NOT NULL DEFAULT '',
record_sequence numeric(1) NOT NULL DEFAULT 0,
sales_type numeric(1) NOT NULL DEFAULT 0,
rate_index numeric(1) NOT NULL DEFAULT 0,
participation numeric(7, 4) NOT NULL DEFAULT 0,
contract_co numeric(7) NOT NULL DEFAULT 0,
reporting_co numeric(7) NOT NULL DEFAULT 0,
article_no varchar(13) NOT NULL DEFAULT '',
art_cat_adm numeric(7) NOT NULL DEFAULT 0,
equiv_config varchar(2) NOT NULL DEFAULT '',
music_class numeric(2) NOT NULL DEFAULT 0,
sales_reference_no numeric(7) NOT NULL DEFAULT 0,
sales_processing_no numeric(2) NOT NULL DEFAULT 0,
sales_trans_code numeric(4) NOT NULL DEFAULT 0,
terr_combination numeric(6) NOT NULL DEFAULT 0,
rate_no numeric(7) NOT NULL DEFAULT 0,
rate varchar(20) NOT NULL DEFAULT '',
rate_normal numeric(7, 3) NOT NULL DEFAULT 0,
rate_esc_1 numeric(7, 3) NOT NULL DEFAULT 0,
sales_process_qty numeric(9) NOT NULL DEFAULT 0,
roy_price numeric(11, 4) NOT NULL DEFAULT 0,
royalty_fee numeric(11, 2) NOT NULL DEFAULT 0,
unit_fee numeric(9, 4) NOT NULL DEFAULT 0,
article_release_date numeric(6) NOT NULL DEFAULT 0,
royalty_rate numeric(7, 3) NOT NULL DEFAULT 0,
orig_article_no varchar(13) NOT NULL DEFAULT '',
no_of_records_in_set numeric(3) NOT NULL DEFAULT 0,
sales_end_period_date numeric(7) NOT NULL DEFAULT 0,
sales_settlement_period numeric(5) NOT NULL DEFAULT 0,
sales_processing_date numeric(7) NOT NULL DEFAULT 0,
sales_date numeric(7) NOT NULL DEFAULT 0,
sales_record_no numeric(9) NOT NULL DEFAULT 0,
equiv_prc_perc numeric(5, 2) NOT NULL DEFAULT 0,
scal_fact_perc numeric(5, 2) NOT NULL DEFAULT 0,
sell_off_perc numeric(5, 2) NOT NULL DEFAULT 0,
price_adj numeric(5, 2) NOT NULL DEFAULT 0,
sleeve_allowance numeric(5, 2) NOT NULL DEFAULT 0,
sales_percentage numeric(5, 2) NOT NULL DEFAULT 0,
price_basis_perc numeric(5, 2) NOT NULL DEFAULT 0,
qtr_date_yyyyq numeric(5) NOT NULL DEFAULT 0,
orig_subcon_no numeric(4) NOT NULL DEFAULT 0,
orig_sales_channel varchar(2) NOT NULL DEFAULT '',
orig_sales_terr numeric(3) NOT NULL DEFAULT 0,
subcon_reserve_code numeric(4) NOT NULL DEFAULT 0,
gross_sales_qty numeric(9) NOT NULL DEFAULT 0,
net_sales_qty numeric(9) NOT NULL DEFAULT 0,
sales_reserved_qty numeric(9) NOT NULL DEFAULT 0,
sales_reserved_perc numeric(5, 2) NOT NULL DEFAULT 0,
released_qty numeric(9) NOT NULL DEFAULT 0,
source_tax_perc numeric(5, 2) NOT NULL DEFAULT 0,
source_tax_amount numeric(11, 2) NOT NULL DEFAULT 0,
income_amount numeric(11, 2) NOT NULL DEFAULT 0,
gross_income numeric(11, 2) NOT NULL DEFAULT 0,
exchange_rate numeric(13, 7) NOT NULL DEFAULT 0,
sales_origin_idc varchar(1) NOT NULL DEFAULT '',
black_box varchar(30) NOT NULL DEFAULT '',
contract_expiry_date numeric(6) NOT NULL DEFAULT 0,
contract_expiry_period numeric(3) NOT NULL DEFAULT 0,
bagatelle_amount numeric(9) NOT NULL DEFAULT 0,
to_be_released_date numeric(6) NOT NULL DEFAULT 0,
sales_start_period_date numeric(7) NOT NULL DEFAULT 0,
selling_company numeric(7) NOT NULL DEFAULT 0,
royalty_amount numeric(11, 2) NOT NULL DEFAULT 0,
currency_code numeric(3) NOT NULL DEFAULT 0,
roy_price_curr numeric(11, 4) NOT NULL DEFAULT 0,
royalty_fee_curr numeric(11, 2) NOT NULL DEFAULT 0,
unit_fee_curr numeric(9, 4) NOT NULL DEFAULT 0,
source_tax_amount_curr numeric(11, 2) NOT NULL DEFAULT 0,
royalty_amount_curr numeric(11, 2) NOT NULL DEFAULT 0,
change_description varchar(40) NOT NULL DEFAULT '',
error_no numeric(2) NOT NULL DEFAULT 0,
first_rel_ind varchar(1) NOT NULL DEFAULT '',
max50_percentage numeric(5, 2) NOT NULL DEFAULT 0,
max50_compare_ind varchar(1) NOT NULL DEFAULT '',
ppd_fin_curr numeric(11, 4) NOT NULL DEFAULT 0,
project_ref_nr varchar(15) NOT NULL DEFAULT '',
reserve_priority_ind varchar(1) NOT NULL DEFAULT '',
esca_seqno numeric(7) NOT NULL DEFAULT 0,
cut_rate_perc numeric(5, 2) NOT NULL DEFAULT 0,
cp_details_c int2 NOT NULL DEFAULT 0,
price_info_c int2 NOT NULL DEFAULT 0,
arts_con_recording_c int2 NOT NULL DEFAULT 0,
processing_company numeric(7) NOT NULL DEFAULT 0,
date_time_change_p numeric(13) NOT NULL DEFAULT 0,
locked_ind_b bytea NOT NULL DEFAULT '\x00',
date_time_cleanup_p numeric(13) NOT NULL DEFAULT 0,
min_price_perc numeric(5, 2) NOT NULL DEFAULT 0,
reserve_sale_type varchar(1) NOT NULL DEFAULT '',
uplift_perc numeric(5, 2) NOT NULL DEFAULT 0,
double_ind bytea NOT NULL DEFAULT '\x00',
min_unit_fee numeric(9, 4) NOT NULL DEFAULT 0,
sales_batch_type varchar(1) NOT NULL DEFAULT '',
prev_esca_qty numeric(9) NOT NULL DEFAULT 0,
arts_con_recording_2_c int2 NOT NULL DEFAULT 0,
aif_share_percentage numeric(5, 2) NOT NULL DEFAULT 0,
aif_rate_percentage numeric(5, 2) NOT NULL DEFAULT 0,
original_recording_id varchar(12) NOT NULL DEFAULT '',
inter_companied_line bytea NOT NULL DEFAULT '\x00',
last_chg_date_time timestamp(6) NOT NULL DEFAULT '0001-01-01',
last_chg_by_id varchar(8) NOT NULL DEFAULT '',
creation_date timestamp(6) NOT NULL DEFAULT '0001-01-01',
creation_user varchar(8) NOT NULL DEFAULT '',
aif_sublic_ind varchar(1) NOT NULL DEFAULT '',
sublicensee numeric(7) NOT NULL DEFAULT 0,
scal_fact_qty numeric(5) NOT NULL DEFAULT 0,
pay_delay numeric(3) NOT NULL DEFAULT 0,
recalc_date timestamp(6) NOT NULL DEFAULT '0001-01-01',
recalc_userid varchar(8) NOT NULL DEFAULT '',
sent_to_sap_ind varchar(1) NOT NULL DEFAULT '',
CONSTRAINT royalty_no_null_pkey PRIMARY KEY (isn)
);

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux