Re: checking for a NULL date in a partitioned table kills performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 8/22/24 6:07 PM, Rui DeSousa wrote:


On Aug 22, 2024, at 7:32 PM, Sbob <sbob@xxxxxxxxxxxxxxxxxxxxx> wrote:


On 8/22/24 5:26 PM, Sbob wrote:

On 8/22/24 5:06 PM, Rui DeSousa wrote:

On Aug 22, 2024, at 5:44 PM, Sbob <sbob@xxxxxxxxxxxxxxxxxxxxx> wrote:

All;

I am running a select from a partitioned table. The table (and all the partitions) have an index on contract_date like this:
CREATE INDEX on part_tab (contract_date) where contract_date > '2022-01-01'::date

The table (including all partitions) has 32million rows
The db server is an aurora postgresql instance with 128GB of ram and 16 vcpu's

The shared buffers is set to 90GB and effective_cache_size is also 90GB
I set default_statistics_target to 1000 and ram a vacuum analyze on the table

I am selecting a number of columns and specifying this where clause:

WHERE (
                     (contract_date IS NULL)
                     OR
                     (contract_date > '2022-01-01'::date)
              )

This takes 15 seconds to run and an explain says it's doing a table scan on all partitions (the query is not specifying the partition key)
If I change the where clause to look like this:

WHERE (
                   (contract_date > '2022-01-01'::date)
              )

Then it performs index scans on all the partitions and runs in about 600ms

If i leave the where clause off entirely it performs table scans of the partitions and takes approx 18 seconds to run

I am trying to get the performance to less than 2sec,
I have tried adding indexes on the table and all partitions like this:
CREATE INDEX ON table (contract_date NULLS FIRST) ;
but the performance with the full where clause is the same:

WHERE (
                     (contract_date IS NULL)
                     OR
                     (contract_date > '2022-01-01'::date)
              )

runs in 15 seconds and scans all partitions

I also tried indexes i=on the table and all partitions like this:
CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL;

but I get the same result, table scans on all partitions and it runs in 15 seconds

Any help or advice ?

Thanks in advance


What is contract_date and when will it be null?


it's a date data type and it allows NULL's not sure why, this is a client's system


29 million of the 32 million rows in the table have NULL for contract_date


NULLs are not indexed thus the OR predicate invalidate the use of the index.  

Since you are already creating a partial index just include the NULLs.  It index will get used for both of your queries.

create index table_idx1 
  on table (contract_date) 
  where contract_date > ‘1/1/2022’
      or contract_date is null
;


The reason why I asked when is contract_date null is because attributes in a table should be non nullable.  If it’s nullable then that begs the question if it belong in that table in the first place; and sometimes the answer is yes.  I just see a lot of half baked schemas out there.  I refer to them as organically designed schemas. 

-Rui.

I agree, I will find out from the client



[Index of Archives]     [Postgresql Home]     [Postgresql General]     [Postgresql Performance]     [Postgresql PHP]     [Postgresql Jobs]     [PHP Users]     [PHP Databases]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Yosemite Forum]

  Powered by Linux