Search Postgresql Archives

Re: Deletion Challenge

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The two general solutions are the "keep the last one" proposed by Adrian "keep the last N" that I sent.

But it might be worth stepping back a bit. You said you are having performance problems that you feel would be improved by removing only a million rows which doesn't sound like that much to me. It's less than half of what I *add* to just one of my tables every week and my database is dwarfed by those of many of the participants on this list.

This suggests that there may be other issues such as tuning, indexing or query optimization at play. Depending on your requirements, partitioning might be useful. It wouldn't be last N but could easily be done to partition by date-ranges which makes archiving and purging a low-cost operation.

You might want to expand a bit on the core issue you are trying to solve.

Cheers,
Steve


On Wed, Dec 9, 2015 at 12:43 PM, David G. Johnston <david.g.johnston@xxxxxxxxx> wrote:
On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 12/09/2015 12:24 AM, Berend Tober wrote:
Adrian Klaver wrote:
On 12/05/2015 08:08 AM, Berend Tober wrote:
/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely effect performance,
and also becomes less relevant for retention.

...


test=> delete from cash_journal where ARRAY[click, cash_journal_id]
NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
group by fairian_id);
DELETE 7

test=> SELECT * FROM cash_journal order by fairian_id, click,
cash_journal_id;
  click | cash_journal_id | fairian_id | debit | credit | balance
|           description
-------+-----------------+------------+-------+--------+---------+----------------------------------

    412 |               1 |          7 |     5 |        |      14 |
Sold food quantity 7 units.
     37 |               7 |          8 |     8 |        |       8 |
Ratified contract f1abd670358e03
     37 |               9 |          9 |     7 |        |       7 |
Ratified contract 1574bddb75c78a
     36 |              14 |         18 |     0 |      0 |       0 |
initial cash balance
    413 |               1 |         25 |       |    995 |       0 |
Redeemed bond 7719a1c782a1ba
(5 rows)


Nice.

The idea of a NOT IN query had occurred to me briefly, but I failed to
pursue it because at some point in the distant past I had gained the
impression that NOT IN queries were not computationally efficient.
During one round of testing I had like a million rows. I'll have to run
some EXPLAIN query testing with a larger data sample for comparison.
Thanks!

Plan B:

WITH d AS
    (SELECT * FROM
        cash_journal
    LEFT JOIN
        (SELECT
            MAX(ARRAY[click,cash_journal_id]) AS mx
        FROM
            cash_journal
        GROUP BY
            fairian_id)
        AS
            mxa
    ON
        mxa.mx=ARRAY[click, cash_journal_id]
    WHERE
        mx IS NULL)
DELETE FROM
    cash_journal
USING
    d
WHERE
    d.click = cash_journal.click
AND
    d.cash_journal_id = cash_journal.cash_journal_id;


​Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY click DESC, cash_journal_id" or something similar?  It doesn't seem like you should need to introduce an array and an aggregate here.

​It does have the negative property of only providing a single row; which excludes using it for the "last 5" part but I suspect it will be considerably faster for the single version.

David J.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux