Search Postgresql Archives

Re: Why using a partial index is doing slightly more logical I/O than a normal index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Because both the actual times and the expected costs are so similar to each other, I am assuming you are asking this as more of an academic question than a practical one.  If it is actually a practical matter, you should find a better example to present to us.

On Wed, May 3, 2023 at 9:17 AM Dirschel, Steve <steve.dirschel@xxxxxxxxxxxxxxxxxx> wrote:

 

 Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))

  Buffers: shared hit=33


For this usage, the =ANY is a "boundary condition".  It re-descends the index for each value in the array, and each of those re-descents incurs buffer accesses.  They cost very little, as the planner thinks they will mostly be cached already (and indeed, they are), but the difference still shows up in the Buffers tabulation.

   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))

   Filter: (deleted_millis <= 0)

   Buffers: shared hit=24


For this usage, the =ANY is applied as an "in-index filter".  It only descends the index once, to where workflow_id=1070, and then scans forward applying the =ANY to each index-tuple until it exhausts the =1070 condition.  As long as all the =1070 entries fit into just a few buffers, the count of buffers accessed by doing this is fewer than doing the re-descents.  (Stepping from tuple to tuple in the same index page doesn't count as a new access.  While a re-descent releases and reacquires the buffer)

This is a bit of speculation on my part, as nothing in the plan output distinguishes boundary condition usages from in-index-filter usages.  This speculation is based on the fact that I was recently investigating a similar situation and did extensive work on it with a debugger and by adding new experimental log messages.  Also, It isn't clear to me why it chooses one usage for one plan and the other usage for the other one in your case, as it seems that both would be eligible for the "boundary condition" treatment.  But presumably for some reason invisible to us it just thinks one approach is faster for one index and the other approach for the other index.
 

One other thing to note-  when using the partial index the cost is .43 .. 15824.82.  When using the other index the cost is .56 .. 15820.19.  So the lower end cost (I believe the cost to find the first row) is slightly lower for the partial index but the higher end cost (I believe to find the last row) is higher for the partial index.  Since there is no LIMIT clause why wouldn’t the optimizer use the lowest cost to find all rows (which in this case would be to use the non-partial index)?


There is some code in the planner which deems plans to be tied if their costs are within a small difference (1%, I think it is).  This allows some branches of the tree of all possible plans to be pruned off early, which can save a lot of time in planning.

Cheers,

Jeff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux