
Re: allow LIMIT in UPDATE and DELETE

> I asked that question a while ago..
> 
> http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php
> 
> and got this response:
> 
> http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php
> 
> Works quite well :)

I don't doubt that it works well; in fact, that's what I plan to use
myself... I was not asking for a workaround, that much I can figure out
myself :-)
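For readers who don't follow the archive links: a common workaround of this kind (sketched here as an assumption, since the linked message is not quoted in this mail) deletes a bounded batch by targeting the hidden ctid system column:

```sql
-- Sketch of a ctid-based batched delete (assumed, not quoted from the link):
-- select at most 1000 physical row ids matching the condition, then delete
-- exactly those rows. Table and column names are illustrative.
DELETE FROM big_table
WHERE ctid IN (SELECT ctid
               FROM big_table
               WHERE col1 = 42
               LIMIT 1000);
```

Note that ctid is only stable within a single statement, so the subselect must live inside the DELETE as shown, not in a separate query.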

Regarding the arguments in the post you linked:

"It isn't in the SQL standard" - neither is LIMIT on SELECT, yet it is a
very useful feature;

"it would have undefined behavior" - so does LIMIT on SELECT without
ORDER BY, and it is still useful when you don't care which entry you
get, you just want one of them. It can even provide a planning advantage
in cases where ordering would force a sort;
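The SELECT analogy is easy to illustrate; any matching row will do, so omitting ORDER BY lets the planner stop at the first hit instead of sorting (table and column names are illustrative):

```sql
-- No ORDER BY: which row comes back is undefined, but that is fine when
-- any one matching row is acceptable, and the scan can stop immediately.
SELECT col1, col2
FROM big_table
WHERE col5 <> 'o'
LIMIT 1;
```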

The subquery delete will definitely have a more involved plan than a
delete with LIMIT would. On some of my tables that would make a
difference, even if not a big one due to caching effects (the delete
or update visits the same rows the subquery already touched). I can't
say for sure how big the penalty of the subquery is, but the plans I
have seen look something like:

db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE (col1, col2) IN
db-#     (SELECT col1, col2 FROM big_table
db(#      WHERE col1=$2
db(#        AND col3 IS NOT NULL
db(#        AND col4 =$3
db(#        AND col5 <> 'o'
db(#        LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2166.10..2282.99 rows=1 width=6)
   ->  HashAggregate  (cost=2166.10..2166.10 rows=29 width=16)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29 width=16)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))
   ->  Index Scan using pk_big_table on big_table  (cost=0.00..4.02 rows=1 width=22)
         Index Cond: ((big_table.col1 = "outer".col1) AND (big_table.col2 = "outer".col2))
(9 rows)
  
idx_big_table_col3 is a partial index on rows where col3 is not null,
which covers ~10% of the total rows. The estimates above are generic;
the worst case is a few hundred thousand rows selected by the subselect
before the limit. So I guess the worst case could cost as much as double
what a plan for DELETE with LIMIT would.

With the LIMIT allowed on DELETE it would be something like:

               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))
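For comparison, the statement that would produce such a plan might read as follows (hypothetical syntax: PostgreSQL does not currently accept LIMIT on DELETE, which is exactly the feature being argued for here):

```sql
-- Hypothetical DELETE ... LIMIT, mirroring the subquery's conditions.
-- Not valid in PostgreSQL today; shown only to illustrate the proposal.
DELETE FROM big_table
WHERE col1 = $2
  AND col3 IS NOT NULL
  AND col4 = $3
  AND col5 <> 'o'
LIMIT 1000;
```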


That would spare a HashAggregate and an index scan. The index scan would
very likely not be a problem, as the same index entries were visited in
the subquery and are likely cached, and the HashAggregate should also be
fast for the at most 1000 rows it has to handle, but they are still
completely unnecessary for my purpose, so I still think LIMIT on DELETE
and UPDATE would make perfect sense.

Cheers,
Csaba.



