Why is using a partial index doing slightly more logical I/O than a normal index?

Table definition:

workflow_db=> \d workflow_execution_test
                            Table "public.workflow_execution_test"
              Column              |           Type           | Collation | Nullable | Default
----------------------------------+--------------------------+-----------+----------+---------
 execution_id                     | bigint                   |           |          |
 state_machine_id                 | bigint                   |           |          |
 workflow_id                      | bigint                   |           |          |
 started_datetime                 | timestamp with time zone |           |          |
 completed_datetime               | timestamp with time zone |           |          |
 status                           | character varying(50)    |           |          |
 execution_context_s3_arn         | character varying(200)   |           |          |
 ol_version                       | integer                  |           |          |
 created_datetime                 | timestamp with time zone |           |          |
 updated_datetime                 | timestamp with time zone |           |          |
 deleted_millis                   | bigint                   |           |          |
 acquisition_channel_id           | bigint                   |           |          |
 correlation_id                   | character varying(36)    |           |          |
 result                           | character varying(50)    |           |          |
 state_machine_execution_arn      | character varying(200)   |           |          |
 created_by_id                    | bigint                   |           |          |
 updated_by_id                    | bigint                   |           |          |
 acquired_gcs_s3_object           | text                     |           |          |
 sqs_trigger_id                   | bigint                   |           |          |
 trigger_message                  | text                     |           |          |
 acquired_gcs_s3_object_uuid      | character varying(36)    |           |          |
 api_trigger_id                   | bigint                   |           |          |
 scheduled_trigger_id             | bigint                   |           |          |
 notification_trigger_workflow_id | bigint                   |           |          |
 acquired_object_name             | text                     |           |          |
 subscription_guid                | uuid                     |           |          |
 processing_class_code            | character varying(50)    |           |          |
 root_execution_guid              | uuid                     |           |          |
Indexes:
    "test_workflow_execution_active_pending_ordered_by_created_date_" btree (workflow_id, status, deleted_millis, created_datetime) WHERE (status::text = ANY (ARRAY['ACTION_NEEDED'::character varying, 'NOT_STARTED'::character varying, 'PAUSED'::character varying, 'PENDING'::character varying, 'RUNNING'::character varying]::text[])) AND deleted_millis <= 0
    "test_workflow_execution_initial_ui_tabs" btree (workflow_id, status, result, completed_datetime DESC NULLS LAST)

 

I created and populated this table with 22 million rows, and only afterwards created the two indexes, so both indexes are packed tightly.
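For reference, here is roughly how the two indexes would look as CREATE INDEX statements. This is a sketch reconstructed from the \d output above rather than the exact DDL I ran (the first index name appears to have been truncated to PostgreSQL's 63-character identifier limit, so the original statement presumably used a longer name):

CREATE INDEX test_workflow_execution_active_pending_ordered_by_created_date_
    ON workflow_execution_test (workflow_id, status, deleted_millis, created_datetime)
    WHERE status IN ('ACTION_NEEDED', 'NOT_STARTED', 'PAUSED', 'PENDING', 'RUNNING')
      AND deleted_millis <= 0;

CREATE INDEX test_workflow_execution_initial_ui_tabs
    ON workflow_execution_test (workflow_id, status, result, completed_datetime DESC NULLS LAST);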

 

As is, the optimizer chooses the partial index. The plan below shows it did 33 logical reads. This index should be very good for this query: its three leading columns match the three criteria in the WHERE clause, and because it is partial it only stores rows that already satisfy the status and deleted_millis filters.

 

explain (analyze, buffers)
select * from workflow_execution_test
where workflow_id = 1070
AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING')
and deleted_millis <= 0;

                                                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_workflow_execution_active_pending_ordered_by_created_date_ on workflow_execution_test  (cost=0.43..15824.82 rows=4335 width=1309) (actual time=0.040..0.095 rows=56 loops=1)
   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
   Buffers: shared hit=33
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.321 ms
 Execution Time: 0.117 ms

 

If I hint the query to use the other index, it does less work: 24 logical reads versus 33 with the partial index.
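A note for anyone reproducing this: the /*+ IndexScan(...) */ comment below is pg_hint_plan hint syntax, which is an extension rather than core PostgreSQL. A minimal sketch of loading it for the session, assuming the extension is installed on the server:

LOAD 'pg_hint_plan';
SET pg_hint_plan.enable_hint = on;  -- make sure hint comments are honored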

 

/*+ IndexScan(workflow_execution_test test_workflow_execution_initial_ui_tabs) */
explain (analyze, buffers)
select * from workflow_execution_test
where workflow_id = 1070
AND status in ('NOT_STARTED','PAUSED','PENDING','RUNNING')
and deleted_millis <= 0;

                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_workflow_execution_initial_ui_tabs on workflow_execution_test  (cost=0.56..15820.19 rows=4335 width=1309) (actual time=0.049..0.106 rows=56 loops=1)
   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
   Filter: (deleted_millis <= 0)
   Buffers: shared hit=24
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.373 ms
 Execution Time: 0.129 ms

 

It would make sense to me if the query did the same amount of work using either of the two indexes, but I do not understand why the partial index does more work. Could anyone explain this behavior?
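In case it is relevant to the answer, here is a sketch of how the physical size and tree height of the two indexes can be compared (the second part assumes the pgstattuple extension is available; tree_level in its output is the B-tree height):

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'workflow_execution_test';

-- requires the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tree_level, index_size, avg_leaf_density
FROM pgstatindex('test_workflow_execution_initial_ui_tabs');
SELECT tree_level, index_size, avg_leaf_density
FROM pgstatindex('test_workflow_execution_active_pending_ordered_by_created_date_');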

 

One other thing to note: with the partial index the cost is 0.43..15824.82, while with the other index it is 0.56..15820.19. So the startup cost (which I believe is the cost to find the first row) is slightly lower for the partial index, but the total cost (which I believe is the cost to return the last row) is higher. Since there is no LIMIT clause, why wouldn't the optimizer pick the plan with the lowest cost to return all rows (which in this case would be the non-partial index)?
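For concreteness, the gaps between the two cost estimates quoted above:

SELECT 15824.82 - 15820.19 AS total_cost_gap,    -- 4.63, roughly 0.03% of the total
       0.56 - 0.43         AS startup_cost_gap;  -- 0.13, in the partial index's favor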

 

Thanks
