Re: zabbix on postgresql - very slow delete of events

Kristian Ejvind <Kristian.Ejvind@xxxxxxxxx> · Wed, 24 Jul 2019 15:12:18 +0000

Hi Maxim

Thanks for your advice, and let me start with your second email, which I'll copy here:

=====
Hi Kristian,

After comparing the structure of your Zabbix tables with those in my own Zabbix installation, I found one very odd difference.
Why was the type of events.eventid changed from the default bigint to numeric?

I suspect that the mismatch between the events.eventid type (numeric) and the event_recovery.*_eventid types (bigint) might prevent the index
from being used during the foreign key checks.
Either way, it will be clearly visible in the pg_stat_xact_user_tables results (I now expect to see 3 sequential scans on event_recovery, and maybe
on some other tables as well).

Kind Regards,
Maxim
=====

Well spotted! On closer examination it seems that data types are wrong in several places. I suspect that this comes
from the time when our Zabbix ran on a MySQL database, which was converted over to PostgreSQL a few years
ago. I agree this discrepancy is suspicious and I will continue to examine it.
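
A minimal sketch of how such mismatches could be listed from the system catalogs. It assumes nothing beyond the standard pg_constraint and pg_attribute catalogs; the output column aliases are illustrative. It reports every foreign key whose referencing column has a different type than the column it references:

```sql
-- List foreign keys whose referencing and referenced column types differ.
-- Works on the standard catalogs; aliases are chosen for illustration only.
SELECT c.conname                               AS constraint_name,
       c.conrelid::regclass                    AS referencing_table,
       a.attname                               AS referencing_column,
       format_type(a.atttypid, a.atttypmod)    AS referencing_type,
       c.confrelid::regclass                   AS referenced_table,
       af.attname                              AS referenced_column,
       format_type(af.atttypid, af.atttypmod)  AS referenced_type
FROM pg_constraint c
CROSS JOIN LATERAL unnest(c.conkey, c.confkey) AS k(col, fcol)
JOIN pg_attribute a  ON a.attrelid  = c.conrelid  AND a.attnum  = k.col
JOIN pg_attribute af ON af.attrelid = c.confrelid AND af.attnum = k.fcol
WHERE c.contype = 'f'                 -- foreign key constraints only
  AND a.atttypid <> af.atttypid       -- keep only mismatched pairs
ORDER BY c.confrelid::regclass::text, c.conrelid::regclass::text;
```

Any row where the referenced type is numeric and the referencing type is bigint would be a candidate for the problem described here.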

Regarding your ideas in the email below, I can say that 1) does not apply: disk latency is in the range of a few ms.

This is the output from your recommended query, which seems to verify your suspicions.  

zabbix_34=# begin; delete from zabbix.events where eventid = 7123123; select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc; rollback;
Time: 0.113 ms
Time: 4798.189 ms (00:04.798)
 relid  | schemaname |    relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd
--------+------------+----------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------
  41940 | zabbix     | event_recovery |        3 |     35495224 |        0 |             0 |         0 |         0 |         1 |             0
  41675 | zabbix     | alerts         |        1 |       544966 |        1 |             0 |         0 |         0 |         0 |             0
  42573 | zabbix     | problem        |        2 |        13896 |        0 |             0 |         0 |         0 |         0 |             0
  41943 | zabbix     | event_tag      |        1 |        22004 |        0 |             0 |         0 |         0 |         0 |             0
  41649 | zabbix     | acknowledges   |        1 |           47 |        0 |             0 |         0 |         0 |         0 |             0
  41951 | zabbix     | events         |        0 |            0 |        1 |             1 |         0 |         0 |         1 |             0
 260215 | zabbix     | event_suppress |        1 |            0 |        0 |             0 |         0 |         0 |         0 |             0
(7 rows)

Time: 2.857 ms
Time: 0.162 ms
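
The three sequential scans on event_recovery are consistent with the type mismatch: when a bigint column is compared with a numeric value, the column side is cast to numeric, and a plain index on the bigint column cannot serve that comparison. A rough way to see the effect in isolation is sketched below. It assumes that event_recovery has an indexed bigint column named r_eventid, as in a stock Zabbix schema; the actual column and index names may differ on this installation.

```sql
-- Comparison against a numeric value: the bigint column is cast to numeric,
-- so the planner would be expected to fall back to a sequential scan.
EXPLAIN
SELECT 1 FROM zabbix.event_recovery WHERE r_eventid = 7123123::numeric;

-- Comparison against a bigint value: the types match,
-- so an index on r_eventid (assumed to exist) becomes usable.
EXPLAIN
SELECT 1 FROM zabbix.event_recovery WHERE r_eventid = 7123123::bigint;
```

If the first plan shows a Seq Scan with a filter that casts r_eventid to numeric and the second shows an index scan, the foreign key checks are most likely hitting the same limitation.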

Regards
Kristian

Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations
Resurs Bank
Ekslingan 8
Box 222 09, SE-25467 Helsingborg
Mobile: +46 728571483
Switchboard: +46 42 38 20 00
E-mail: Kristian.Ejvind@xxxxxxxxx
Web: www.resursbank.se

From: Maxim Boguk <maxim.boguk@xxxxxxxxx>
Date: Wednesday, 24 July 2019 at 15:55
To: Kristian Ejvind <Kristian.Ejvind@xxxxxxxxx>
Cc: "pgsql-performance@xxxxxxxxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: zabbix on postgresql - very slow delete of events

Hi Kristian,

If you look at the EXPLAIN ANALYZE results for the delete, you will see that 99% of the query time is spent in the foreign key trigger checks.

At the same time, the database does have indexes in place on the foreign key (referencing) side.
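
For reference, a minimal sketch of how that per-trigger timing can be obtained. The event id is the test value used elsewhere in this thread, and the transaction is rolled back so that nothing is actually deleted:

```sql
BEGIN;

-- EXPLAIN ANALYZE really executes the DELETE, including the foreign key
-- triggers, and reports one "Trigger for constraint ..." line per constraint
-- with the time spent in it.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM zabbix.events WHERE eventid = 7123123;

-- Undo the delete; only the timing information is of interest.
ROLLBACK;
```

The constraint whose trigger line accounts for most of the time points to the referencing table that is being scanned slowly.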

I recommend trying this:

\timing on
BEGIN;
delete from zabbix.events where eventid = [some testing id];
select * from pg_stat_xact_user_tables where seq_scan>0 or idx_scan>0 order by seq_scan+idx_scan desc;
ABORT;

Then please send the result of the last query and how long the delete runs.

It might help us understand what's going on.

Currently I have 3 ideas:

1) a very slow and overloaded I/O subsystem
2) a lot of rows being deleted by ON DELETE CASCADE
3) some locking that prevents the foreign key checks from running fast
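
Idea 2) can be checked from the catalogs. The sketch below lists every foreign key that references zabbix.events together with its ON DELETE action; it uses only the standard pg_constraint catalog, and the output aliases are illustrative:

```sql
-- Foreign keys pointing at zabbix.events and what they do on delete.
SELECT conname              AS constraint_name,
       conrelid::regclass   AS referencing_table,
       CASE confdeltype
            WHEN 'a' THEN 'NO ACTION'
            WHEN 'r' THEN 'RESTRICT'
            WHEN 'c' THEN 'CASCADE'
            WHEN 'n' THEN 'SET NULL'
            WHEN 'd' THEN 'SET DEFAULT'
       END                  AS on_delete
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'zabbix.events'::regclass
ORDER BY conrelid::regclass::text;
```

Each CASCADE row is a table that must be searched, and possibly modified, for every event that is deleted.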

On Wed, Jul 24, 2019 at 11:12 AM Kristian Ejvind <Kristian.Ejvind@xxxxxxxxx> wrote:

Hi.

Well, the events table has both a primary key and foreign keys referencing it, which is not possible
on a partitioned table in postgresql 10. How did you work around this issue?

On the other hand, if we can get the deletion of rows from the events table to run at normal speed, I
can't imagine we would have a problem with it for a long time. After all, although our Zabbix installation
is definitely larger than "small", it's still far from "large".

I think I would need assistance with debugging why postgresql behaves like it does.

Is there a defect with deleting data from a table that has multiple foreign keys referencing it from a certain table?
Is there a problem with the query optimizer that chooses the wrong plan when working on the foreign key constraints?
How do I inspect how the db works on the deletion of rows from the referencing tables?
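
One possible way to inspect the statements that the foreign key triggers run is the auto_explain contrib module with nested-statement logging. This is a sketch under assumptions: the module must be installed, loading it this way normally requires superuser rights, and the plans are written to the server log rather than returned to the client.

```sql
-- Load auto_explain for this session only (assumes superuser rights).
LOAD 'auto_explain';

-- Log the plan of every statement, with actual run times.
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;

-- Also log statements executed inside triggers and functions,
-- which is where the foreign key checks run.
SET auto_explain.log_nested_statements = on;

BEGIN;
DELETE FROM zabbix.events WHERE eventid = 7123123;
ROLLBACK;  -- nothing is actually removed
```

The server log should then contain one plan per internal statement issued against the referencing tables, showing whether each used an index or a sequential scan.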

Regards

Kristian

On 2019-07-23, 16:33, "Kenneth Marshall" <ktm@xxxxxxxx> wrote:

    On Tue, Jul 23, 2019 at 01:41:53PM +0000, Kristian Ejvind wrote:
    > Thanks Kenneth. In fact we've already partitioned the largest history* and trends* tables
    > and that has been running fine for a year. Performance was vastly improved. But since you
    > can't have a unique index on a partitioned table in postgres 10, we haven't worked on that.
    >
    > Regards
    > Kristian

    Hi Kristian,

    Why are you not partitioning the events and alerts tables as well? That
    would eliminate this problem and you already have the infrastructure in
    place to support the management since you are using it for the history
    and trends tables.

    Regards,
    Ken

-- 

Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk