I see others have responded with suggestions to improve query performance,
but one thing I noticed when you gave the data structure is there are no
no primary keys defined for friends or posting, neither are there any indexes.
Was that an omission? If not, then please note that PostgreSQL is a _relational_
database and it is critical to have primary keys and additional indexes for data integrity
and performance.
FYI, defining a foreign key in a table does not automatically generate an associated
index.
I therefore suggest you do the following to improve performance.
ALTER TABLE posting
ADD CONSTRAINT posting_pk PRIMARY KEY ( id );
ALTER TABLE friends
ADD CONSTRAINT friends_pk PRIMARY KEY ( user_id__a, user_id__b );
ALTER TABLE membership
ADD CONSTRAINT membership_pk PRIMARY KEY ( user_id, group_id );
CREATE INDEX posting_group_id__in_idx
on posting
USING BTREE ( group_id__in );
ANALYZE posting;
ANALYZE friends;
ANALYZE membership;
but one thing I noticed when you gave the data structure is there are no
no primary keys defined for friends or posting, neither are there any indexes.
Was that an omission? If not, then please note that PostgreSQL is a _relational_
database and it is critical to have primary keys and additional indexes for data integrity
and performance.
FYI, defining a foreign key in a table does not automatically generate an associated
index.
I therefore suggest you do the following to improve performance.
ALTER TABLE posting
ADD CONSTRAINT posting_pk PRIMARY KEY ( id );
ALTER TABLE friends
ADD CONSTRAINT friends_pk PRIMARY KEY ( user_id__a, user_id__b );
ALTER TABLE membership
ADD CONSTRAINT membership_pk PRIMARY KEY ( user_id, group_id );
CREATE INDEX posting_group_id__in_idx
on posting
USING BTREE ( group_id__in );
ANALYZE posting;
ANALYZE friends;
ANALYZE membership;
On Wed, Apr 29, 2015 at 6:54 AM, Ladislav Lenart <lenartlad@xxxxxxxx> wrote:
Hello.
I think you can propagate ORDER BY and LIMIT also to the subqueries of the
On 29.4.2015 01:57, Jonathan Vanasco wrote:
> Sorry, I was trying to ask something very abstract as I have similar situations
> on multiple groups of queries/tables (and they're all much more complex).
>
> I'm on pg 9.3
>
> The relevant structure is:
>
> posting:
> id
> timestamp_publish
> group_id__in
> user_id__author
>
> friends:
> user_id__a
> user_id__b
>
> memberships:
> user_id
> group_id
> role_id
>
>
> -- working sql
> CREATE TABLE groups(
> id SERIAL NOT NULL PRIMARY KEY
> );
> CREATE TABLE users(
> id SERIAL NOT NULL PRIMARY KEY
> );
> CREATE TABLE friends (
> user_id__a INT NOT NULL REFERENCES users( id ),
> user_id__b INT NOT NULL REFERENCES users( id )
> );
> CREATE TABLE memberships (
> user_id INT NOT NULL REFERENCES users( id ),
> group_id INT NOT NULL REFERENCES groups( id ),
> role_id INT NOT NULL
> );
> CREATE TABLE posting (
> id SERIAL NOT NULL,
> timestamp_publish timestamp not null,
> group_id__in INT NOT NULL REFERENCES groups(id),
> user_id__author INT NOT NULL REFERENCES users(id),
> is_published BOOL
> );
>
> The output that I'm trying to get is:
> posting.id
> {the context of the select}
> posting.timestamp_publish (this may need to get correlated into other queries)
>
>
> These approaches had bad performance:
>
> -- huge selects / memory
> -- it needs to load everything from 2 tables before it limits
> EXPLAIN ANALYZE
> SELECT id, feed_context FROM (
> SELECT id, timestamp_publish, 'in-group' AS feed_context FROM posting
> WHERE (
> group_id__in IN (SELECT group_id FROM memberships WHERE user_id = 57
> AND role_id IN (1,2,3))
> AND (is_published = True AND timestamp_publish <= CURRENT_TIMESTAMP
> AT TIME ZONE 'UTC')
> )
> UNION
> SELECT id, timestamp_publish, 'by-user' AS feed_context FROM posting
> WHERE (
> user_id__author IN (SELECT user_id__b FROM friends WHERE user_id__a
> = 57)
> AND (is_published = True AND timestamp_publish <= CURRENT_TIMESTAMP
> AT TIME ZONE 'UTC')
> )
> ) AS feed
> ORDER BY timestamp_publish DESC
> LIMIT 10
> ;
UNION, i.e.:
select...
from (
(
select...
from posting
where... -- friends
order by timestamp_publish desc
limit 10
) union (
(
select...
from posting
where... -- groups
order by timestamp_publish desc
limit 10
)
) as feed
order by timestamp_publish desc
limit 10
That might behave better.
Ladislav Lenart
> -- selects minimized, but repetitive subqueries
> SELECT
> id,
> CASE
> WHEN group_id__in IN (SELECT group_id FROM memberships WHERE user_id
> = 57 AND role_id IN (1,2,3)) THEN True
> ELSE NULL
> END AS feed_context_group,
> CASE
> WHEN user_id__author IN (SELECT user_id__b FROM friends WHERE
> user_id__a = 57) THEN True
> ELSE NULL
> END AS feed_context_user
> FROM posting
> WHERE (
> group_id__in IN (SELECT group_id FROM memberships WHERE user_id = 57
> AND role_id IN (1,2,3))
> OR
> user_id__author IN (SELECT user_id__b FROM friends WHERE user_id__a
> = 57)
> )
> AND (is_published = True AND timestamp_publish <= CURRENT_TIMESTAMP AT
> TIME ZONE 'UTC')
> ORDER BY timestamp_publish DESC
> LIMIT 10
> ;
>
>
>
> On Apr 28, 2015, at 6:56 PM, Melvin Davidson wrote:
>
>> Since you very nicely DID NOT provide the pg version, O/S or table
>> structure(s), which is what you should do REGARDLESS of the
>> type of question (it's just the smart and polite thing to do when asking for
>> help) The best I can suggest is:
>> SELECT
>> CASE WHEN context = 'friend' THEN p.junka
>> WHEN context = 'group' THEN p.junkb
>> WHEN context = 'both' THEN p.junka || ' ' || p.junkb
>> END
>> FROM posting p
>> where p.author_id in (SELECT f.friend_id
>> FROM friends f
>> WHERE f.user_id = ?)
>> OR p.group_id in (SELECT m.group_id
>> FROM memberships m
>> WHERE m.user_id = ?);
>
>
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.