Question on session_replication_role

"Anand Kumar, Karthik" <Karthik.AnandKumar@xxxxxxxxxxxxxx> · Tue, 3 Feb 2015 02:07:33 +0000

Our set up:

*         Db version: postgres 9.3.4

*         OS: CentOS 5.6

*         kernel Version - Linux 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

*         memory - 256 GB

*         We have slony replicating to this DB from the primary server

*         Some of the DB parameters:

max_connections  - 1500

shared_buffers - 8GB

work_mem = 256MB

maintenance_work_mem = 1GB

Problem statement:

On Jan 27^th 2015:

We inserted 77k records to a table with 780 million records by disabling triggers using SET session_replication_role = replica;

Table definition:

site=# \d reg_email_subscriptions

       Table "emailsubscription.reg_email_subscriptions"

      Column      |              Type              | Modifiers

------------------+--------------------------------+-----------

registration_id  | bigint                         | not null

subscription_id  | bigint                         | not null

creation_date    | timestamp(0) without time zone | not null

last_update_date | timestamp(0) without time zone | not null

Number of child tables: 8 (Use \d+ to list them.)

Tablespace: "emailsubscription"

Indexes/constraints on each of the 8 partitions:

    "reg_email_subscriptions_p00_pkey" PRIMARY KEY, btree (registration_id, subscription_id), tablespace "emailsubscription"

    "reg_email_subscriptions_p00_n1" btree (subscription_id), tablespace "emailsubscription"

    "reg_email_subscriptions_p00_n20" btree (last_update_date), tablespace "emailsubscription"
    "reg_email_subscriptions_p00_n3" btree (creation_date), tablespace “emailsubscription"

Soon after learning that using this setting also disables PK on the table, we manually deleted 117 duplicate records that got inserted.

On Jan 28^th 2015:

The query that was taking 7 secs started taking over 30 seconds.

SELECT DISTINCT it.recipient_id  FROM  iru.iru_tags it WHERE it.recipient_id BETWEEN 758587587 and 968696896 AND   it.status = 0 AND   it.last_update_date >= date_trunc('day', now() - interval '90 days')
 AND EXISTS (SELECT DISTINCT res.registration_id FROM emailsubscription.reg_email_subscriptions res WHERE res.registration_id = it.recipient_id AND res.subscription_id = 200400);

Table definition of the other table involved in the query (iru.iru_tags): - No manual DML was done on this table.

site=# \d iru_tags

                    Table "iru.iru_tags"

      Column      |            Type             | Modifiers

------------------+-----------------------------+-----------

recipient_id     | bigint                      | not null

tagger_id        | bigint                      | not null

tag_id           | integer                     | not null

batch_id         | integer                     | not null

status           | integer                     |

creation_date    | timestamp without time zone | not null

last_update_date | timestamp without time zone | not null

Indexes:

    "iru_tags_pk" PRIMARY KEY, btree (recipient_id, tagger_id, batch_id, tag_id)

    "iru_tags_n1" btree (recipient_id, tag_id)

    "iru_tags_n2" btree (last_update_date)

    "iru_tags_n3" btree (creation_date)

    "iru_tags_n31" btree (status, recipient_id, last_update_date)

    "iru_tags_n4" btree (tagger_id)

Observation:

1)      The query was taking 2-7 seconds consistently until the day before we inserted records by disabling the triggers.

2)      When we run the selects on each table separately, the query runs really fast. The moment we introduce the join (AND EXISTS), the sql takes over 30 seconds.

3)      The explain plan of this query shows that Primary key on reg_email_subscriptions and unique index on iru_tags table is being used.

site=# explain(analyze on, verbose on, costs on, buffers on, timing on) SELECT DISTINCT it.recipient_id  FROM  iru.iru_tags it WHERE it.recipient_id BETWEEN 758587587 and 968696896 AND   it.status =
 0 AND   it.last_update_date >= date_trunc('day', now() - interval '90 days') AND EXISTS (SELECT DISTINCT res.registration_id FROM emailsubscription.reg_email_subscriptions res WHERE res.registration_id = it.recipient_id AND res.subscription_id = 200400);

                                                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Unique  (cost=0.57..290191.08 rows=45 width=8) (actual time=89.536..89.536 rows=0 loops=1)

   Output: it.recipient_id

   Buffers: shared hit=9 read=1

   I/O Timings: read=47.097

   ->  Nested Loop Semi Join  (cost=0.57..290187.87 rows=1286 width=8) (actual time=89.534..89.534 rows=0 loops=1)

         Output: it.recipient_id

         Buffers: shared hit=9 read=1

         I/O Timings: read=47.097

         ->  Index Only Scan using iru_tags_n31 on iru.iru_tags it  (cost=0.57..319.84 rows=1286 width=8) (actual time=89.532..89.532 rows=0 loops=1)

               Output: it.status, it.recipient_id, it.last_update_date

               Index Cond: ((it.status = 0) AND (it.recipient_id >= 758587587) AND (it.recipient_id <= 968696896) AND (it.last_update_date >= date_trunc('day'::text, (now() - '90 days'::interval))))

               Heap Fetches: 0

               Buffers: shared hit=9 read=1

               I/O Timings: read=47.097

         ->  Append  (cost=0.00..1780.03 rows=17 width=8) (never executed)

               ->  Seq Scan on emailsubscription.reg_email_subscriptions res  (cost=0.00..1728.07 rows=1 width=8) (never executed)

                     Output: res.registration_id

                     Filter: ((res.subscription_id = 200400) AND (it.recipient_id = res.registration_id))

               ->  Index Only Scan using reg_email_subscriptions_p00_pkey on emailsubscription.reg_email_subscriptions_p00 res_1  (cost=0.57..6.46 rows=2 width=8) (never executed)

                     Output: res_1.registration_id

                     Index Cond: ((res_1.registration_id = it.recipient_id) AND (res_1.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p01_pkey on emailsubscription.reg_email_subscriptions_p01 res_2  (cost=0.57..6.53 rows=2 width=8) (never executed)

                     Output: res_2.registration_id

                     Index Cond: ((res_2.registration_id = it.recipient_id) AND (res_2.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p02_pkey on emailsubscription.reg_email_subscriptions_p02 res_3  (cost=0.57..6.46 rows=2 width=8) (never executed)

                     Output: res_3.registration_id

                     Index Cond: ((res_3.registration_id = it.recipient_id) AND (res_3.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p03_pkey on emailsubscription.reg_email_subscriptions_p03 res_4  (cost=0.57..6.53 rows=2 width=8) (never executed)

                     Output: res_4.registration_id

                     Index Cond: ((res_4.registration_id = it.recipient_id) AND (res_4.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p04_pkey on emailsubscription.reg_email_subscriptions_p04 res_5  (cost=0.57..6.46 rows=2 width=8) (never executed)

                     Output: res_5.registration_id

                     Index Cond: ((res_5.registration_id = it.recipient_id) AND (res_5.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p05_pkey on emailsubscription.reg_email_subscriptions_p05 res_6  (cost=0.57..6.53 rows=2 width=8) (never executed)

                     Output: res_6.registration_id

                     Index Cond: ((res_6.registration_id = it.recipient_id) AND (res_6.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p06_pkey on emailsubscription.reg_email_subscriptions_p06 res_7  (cost=0.57..6.46 rows=2 width=8) (never executed)

                     Output: res_7.registration_id

                     Index Cond: ((res_7.registration_id = it.recipient_id) AND (res_7.subscription_id = 200400))

                     Heap Fetches: 0

               ->  Index Only Scan using reg_email_subscriptions_p07_pkey on emailsubscription.reg_email_subscriptions_p07 res_8  (cost=0.57..6.53 rows=2 width=8) (never executed)

                     Output: res_8.registration_id

                     Index Cond: ((res_8.registration_id = it.recipient_id) AND (res_8.subscription_id = 200400))

                     Heap Fetches: 0

Total runtime: 90.314 ms

(51 rows)

4)      We have done updates on various tables after setting session_replication_role to replica without any performance issues in the past.

What was done so far:

1)      Full Vacuum on reg_email_subsciptions table

2)      Full vacuum on iru_tags table

3)      Manual reindexing on all 8 primary keys  on reg_email_subsciptions_p0* tables

Questions:

1)      Did setting session_replication_role to replica before inserting (duplicate) records corrupt the primary key or any of the indexes on reg_email_subscriptions table?

2)      If so, should a reindex on the corrupt PK or index identify the corruption and report/fix it?

3)      Is this even a problem with index/data corruption?

Thank You for your help