Hi all,
I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu
I have a schema where I have lots of messages and some users who might have read some of them. When a message is read by a user I create an entry i a table message_property holding the property (is_read) for that user.
The schema is as follows:
drop table if exists message_property;
drop table if exists message;
drop table if exists person;
drop table if exists message;
drop table if exists person;
create table person(
id serial primary key,
username varchar not null unique
);
id serial primary key,
username varchar not null unique
);
create table message(
id serial primary key,
subject varchar
);
id serial primary key,
subject varchar
);
create table message_property(
message_id integer not null references message(id),
person_id integer not null references person(id),
is_read boolean not null default false,
unique(message_id, person_id)
);
message_id integer not null references message(id),
person_id integer not null references person(id),
is_read boolean not null default false,
unique(message_id, person_id)
);
insert into person(username) values('user_' || generate_series(0, 999));
insert into message(subject) values('Subject ' || random() || generate_series(0, 999999));
insert into message_property(message_id, person_id, is_read) select id, 1, true from message order by id limit 999990;
insert into message_property(message_id, person_id, is_read) select id, 1, false from message order by id limit 5 offset 999990;
insert into message(subject) values('Subject ' || random() || generate_series(0, 999999));
insert into message_property(message_id, person_id, is_read) select id, 1, true from message order by id limit 999990;
insert into message_property(message_id, person_id, is_read) select id, 1, false from message order by id limit 5 offset 999990;
analyze;
So, for person 1 there are 10 unread messages, out of a total 1mill. 5 of those unread does not have an entry in message_property and 5 have an entry and is_read set to FALSE.
I have the following query to list all un-read messages for person with id=1:
SELECT
m.id AS message_id,
prop.person_id,
coalesce(prop.is_read, FALSE) AS is_read,
m.subject
FROM message m
LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND prop.person_id = 1
WHERE 1 = 1
AND NOT EXISTS(SELECT
*
FROM message_property pr
WHERE pr.message_id = m.id AND pr.person_id = prop.person_id AND prop.is_read = TRUE)
;
m.id AS message_id,
prop.person_id,
coalesce(prop.is_read, FALSE) AS is_read,
m.subject
FROM message m
LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND prop.person_id = 1
WHERE 1 = 1
AND NOT EXISTS(SELECT
*
FROM message_property pr
WHERE pr.message_id = m.id AND pr.person_id = prop.person_id AND prop.is_read = TRUE)
;
The problem is that it's not quite efficient and performs badly, explain analyze shows:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Anti Join (cost=1.27..148784.09 rows=5 width=40) (actual time=918.906..918.913 rows=10 loops=1)
Merge Cond: (m.id = pr.message_id)
Join Filter: (prop.is_read AND (pr.person_id = prop.person_id))
Rows Removed by Join Filter: 5
-> Merge Left Join (cost=0.85..90300.76 rows=1000000 width=40) (actual time=0.040..530.748 rows=1000000 loops=1)
Merge Cond: (m.id = prop.message_id)
-> Index Scan using message_pkey on message m (cost=0.42..34317.43 rows=1000000 width=35) (actual time=0.014..115.829 rows=1000000 loops=1)
-> Index Scan using message_property_message_id_person_id_key on message_property prop (cost=0.42..40983.40 rows=999995 width=9) (actual time=0.020..130.728 rows=999995 loops=1)
Index Cond: (person_id = 1)
-> Index Only Scan using message_property_message_id_person_id_key on message_property pr (cost=0.42..40983.40 rows=999995 width=8) (actual time=0.024..140.349 rows=999995 loops=1)
Index Cond: (person_id = 1)
Heap Fetches: 999995
Total runtime: 918.975 ms
(13 rows)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Anti Join (cost=1.27..148784.09 rows=5 width=40) (actual time=918.906..918.913 rows=10 loops=1)
Merge Cond: (m.id = pr.message_id)
Join Filter: (prop.is_read AND (pr.person_id = prop.person_id))
Rows Removed by Join Filter: 5
-> Merge Left Join (cost=0.85..90300.76 rows=1000000 width=40) (actual time=0.040..530.748 rows=1000000 loops=1)
Merge Cond: (m.id = prop.message_id)
-> Index Scan using message_pkey on message m (cost=0.42..34317.43 rows=1000000 width=35) (actual time=0.014..115.829 rows=1000000 loops=1)
-> Index Scan using message_property_message_id_person_id_key on message_property prop (cost=0.42..40983.40 rows=999995 width=9) (actual time=0.020..130.728 rows=999995 loops=1)
Index Cond: (person_id = 1)
-> Index Only Scan using message_property_message_id_person_id_key on message_property pr (cost=0.42..40983.40 rows=999995 width=8) (actual time=0.024..140.349 rows=999995 loops=1)
Index Cond: (person_id = 1)
Heap Fetches: 999995
Total runtime: 918.975 ms
(13 rows)
Does anyone have suggestions on how to optimize the query or schema? It's important that any message not having an entry in message_property for a user is considered un-read.
Thanks!
--
Andreas Jospeh Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963