Re: Unexpected result count from update statement on partitioned table

Ron <ronljohnsonjr@xxxxxxxxx> · Fri, 18 Dec 2020 07:23:51 -0600



    Would (task_type in (1,2)) make any logical difference?

    
    On 12/18/20 6:11 AM, Craig McIlwee
      wrote:

    
        Despite looking at this query on and off for a couple of
          days, it wasn't until seeing it in Lauenz's reply that I
          noticed  a logical issue with the query that changes things a
          bit.  There should be parenthesis around the task_type
          predicates, otherwise you end up getting reserved rows in the
          result set.  After looking at it more, I see that when the
          original query settles in to consistently returning the
          expected number of results they happen to be the exact same
          results each time.  Correcting the logical error in the query
          solves that and now consistently gives too many results,
          regardless of the number of executions.

        
          Corrected query (but still not working properly):
          

          update task_parent

            set reserved = true

            from (

              select id

              from task_parent

              where reserved = false

                and (task_type = 1 or task_type = 2)

              order by task_timestamp

              limit 50

              for update skip locked) as sub

            where sub.id
            = task_parent.id

            returning *;
        
        
        Yes,
          this must be a bug:

          
          EXPLAIN (COSTS OFF) update task_parent

            set reserved = true

            from (

              select id

              from task_parent

              where reserved = false

                and task_type = 1 or task_type = 2

              order by task_timestamp

              limit 50

              for update skip locked) as sub

            where sub.id = task_parent.id

          returning task_parent.id;

          
                                                                  QUERY
          PLAN                                                       

--------------------------------------------------------------------------------------------------------------------------

           Update on task_parent

             Update on task_child_1 task_parent_1

             Update on task_child_2 task_parent_2

             ->  Hash Join

                   Hash Cond: (task_parent_1.id = sub.id)

                   ->  Seq Scan on task_child_1 task_parent_1

                   ->  Hash

                         ->  Subquery Scan on sub

                               ->  Limit

                                     ->  LockRows

                                           ->  Merge Append

                                                 Sort Key:
          task_parent_3.task_timestamp

                                                 ->  Index Scan using
          task_child_1_task_timestamp_idx on task_child_1 task_parent_4

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

                                                 ->  Index Scan using
          task_child_2_task_timestamp_idx on task_child_2 task_parent_5

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

             ->  Hash Join

                   Hash Cond: (task_parent_2.id = sub_1.id)

                   ->  Seq Scan on task_child_2 task_parent_2

                   ->  Hash

                         ->  Subquery Scan on sub_1

                               ->  Limit

                                     ->  LockRows

                                           ->  Merge Append

                                                 Sort Key:
          task_parent_6.task_timestamp

                                                 ->  Index Scan using
          task_child_1_task_timestamp_idx on task_child_1 task_parent_7

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

                                                 ->  Index Scan using
          task_child_2_task_timestamp_idx on task_child_2 task_parent_8

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

          (29 rows)

          
          The subquery is executed twice, and the two executions
          obviously don't

          return the same results.  I am at a loss for an explanation
          ... 
        

        I did notice that the subquery is executed twice and this
          is the difference between the subquery and CTE since the CTE
          is only executed once and then the CTE result is used in the
          joins against the child tables.  Beyond that, I was unable to
          rationalize what was going on.  One thing I don't know is if
          the subquery should be executed more than once.  If yes, then
          like you said, I would think that they should give the same
          results.
        

        Another interesting thing I noticed is that multiple
          executions give tasks with overlapping time ranges.  Given 2
          executions, the task_time ordering should give the oldest 50
          first and then the next oldest 50 after that.  In reality, I
          see that the second execution has tasks that are older than
          the newest timestamp of the first execution.  Wrapping the
          update statement in a CTE and then picking out the min/max
          shows this:
        

        with updated as (

            update task_parent

            set reserved = true

            from (

              select id, task_type

              from task_parent

              where reserved = false

                and (task_type = 1 or task_type = 2)

              order by task_timestamp

              limit 50

              for update skip locked) as sub

            where sub.id
          = task_parent.id

              and sub.task_type = task_parent.task_type

            returning *

          )

          select min(task_timestamp), max(task_timestamp)

          from updated;

        
        Execution 1:
                    min             |            max

          ----------------------------+----------------------------

           2020-12-17 11:44:51.192119 | 2020-12-17 11:45:03.881409

          
          Execution 2:

                      min             |            max

          ----------------------------+----------------------------

           2020-12-17 11:44:59.943108 | 2020-12-17 11:45:14.273185

        
        min of execution 2 is older than max of execution 1 which
          should not happen.
        

        Craig
      
      
        On Fri, Dec 18, 2020 at 4:06
          AM Laurenz Albe <laurenz.albe@xxxxxxxxxxx>
          wrote:

        
        On
          Thu, 2020-12-17 at 12:21 -0500, Craig McIlwee wrote:

          > Our application uses a queue-like table to assign tasks
          to users and this has worked well for us for a few years.  Now
          we are in the process of adding some restrictions to which
          tasks a user can

          > work on and that is based on an attribute of each task
          that does not change for the task's lifespan.  Users may have
          access to work on one or more or types of tasks.  To improve
          query time when

          > finding the set of tasks that we assign, we are
          introducing partitioning into our task queue table.  When
          assigning tasks, we issue an update statement to mark the
          tasks as reserved using a subquery

          > that orders the tasks by age.  With the introduction of
          partitioning, we are seeing that the update statement affects
          more rows than expected.  An example query is:

          > 

          > ---

          > update task_parent

          > set reserved = true

          > from (

          >   select id

          >   from task_parent

          >   where reserved = false

          >     and task_type = 1 or task_type = 2

          >   order by task_timestamp

          >   limit 50

          >   for update skip locked) as sub

          > where sub.id = task_parent.id

          > returning task_parent.id

          > ---  

          > 

          > In the statement above, we have a subquery to limit the
          number of tasks to 50 yet the update statement sometimes
          returns more than 50 records.  I have narrowed this down to a
          small, reproducible

          > example shown below.  The first time I run the update
          statement I get ~65 records, then typically ~53 the next few
          runs, and then it starts consistently giving me 50 records
          after that.  Then if I

          > bump the limit to 100 I will get more than 100 initially
          and after several executions it starts to settle into always
          giving the expected 100.

          > 

          > Below is the full setup that can be used to reproduce
          what I'm seeing.  It was initially observed on PostgreSQL 11.8
          but I can also reproduce it on 13.0.

          > 

          > ---

          > create table task_parent (

          >   id bigint not null,

          >   task_type smallint not null,

          >   reserved boolean not null,

          >   task_timestamp timestamp not null

          > ) partition by list (task_type);

          > 

          > create table task_child_1

          > partition of task_parent for values in (1);

          > 

          > create table task_child_2

          > partition of task_parent for values in (2);

          > 

          > insert into task_parent

          > select

          >   generate_series(1, 500000),

          >   case when random() < 0.5 then 1 else 2 end,

          >   false,

          >   now() - (random() * '1 day'::interval);

          >   

          > create index task_parent_task_time_idx

          > on task_parent (task_timestamp);

          > 

          > update task_parent

          > set reserved = true

          > from (

          >   select id

          >   from task_parent

          >   where reserved = false

          >     and task_type = 1 or task_type = 2

          >   order by task_timestamp

          >   limit 50

          >   for update skip locked) as sub

          > where sub.id = task_parent.id

          > returning task_parent.id;

          > ---

          > 

          > A couple of interesting observations:

          > 1) If I remove the order by clause I always get the
          expected number of results

          > 2) If I rewrite the query to use a CTE for the task IDs
          instead of a subquery then I always get the expected number of
          results

          > 

          > At its surface, this seems like it could be a bug but
          maybe there is something about this usage pattern that is
          known/expected to cause this behavior.  So that's the question
          - is this a bug that

          > should be reported to pgsql-bugs, or is this expected and
          if so, why?

          
          Yes, this must be a bug:

          
          EXPLAIN (COSTS OFF) update task_parent

          set reserved = true

          from (

            select id

            from task_parent

            where reserved = false

              and task_type = 1 or task_type = 2

            order by task_timestamp

            limit 50

            for update skip locked) as sub

          where sub.id = task_parent.id

          returning task_parent.id;

          
                                                                  QUERY
          PLAN                                                        

--------------------------------------------------------------------------------------------------------------------------

           Update on task_parent

             Update on task_child_1 task_parent_1

             Update on task_child_2 task_parent_2

             ->  Hash Join

                   Hash Cond: (task_parent_1.id
          = sub.id)

                   ->  Seq Scan on task_child_1 task_parent_1

                   ->  Hash

                         ->  Subquery Scan on sub

                               ->  Limit

                                     ->  LockRows

                                           ->  Merge Append

                                                 Sort Key:
          task_parent_3.task_timestamp

                                                 ->  Index Scan using
          task_child_1_task_timestamp_idx on task_child_1 task_parent_4

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

                                                 ->  Index Scan using
          task_child_2_task_timestamp_idx on task_child_2 task_parent_5

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

             ->  Hash Join

                   Hash Cond: (task_parent_2.id
          = sub_1.id)

                   ->  Seq Scan on task_child_2 task_parent_2

                   ->  Hash

                         ->  Subquery Scan on sub_1

                               ->  Limit

                                     ->  LockRows

                                           ->  Merge Append

                                                 Sort Key:
          task_parent_6.task_timestamp

                                                 ->  Index Scan using
          task_child_1_task_timestamp_idx on task_child_1 task_parent_7

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

                                                 ->  Index Scan using
          task_child_2_task_timestamp_idx on task_child_2 task_parent_8

                                                       Filter: (((NOT
          reserved) AND (task_type = 1)) OR (task_type = 2))

          (29 rows)

          
          The subquery is executed twice, and the two executions
          obviously don't

          return the same results.  I am at a loss for an explanation
          ...

          
          Yours,

          Laurenz Albe

          -- 

          Cybertec | https://www.cybertec-postgresql.com

          
    -- 

      Angular momentum makes the world go 'round.