Hello List,
I am toying with the idea of implementing a job queuing system on top of PostgreSQL. To meet my requirements, the system needs to offer some advanced
features compared to "classic" queuing systems:
- users can create new queues at any time
- queues have priorities
- priorities of queues can change at any time
- jobs in queues with the highest priority should be processed first
A simple schema could look like this:
create table queues (
    id integer primary key,
    priority integer not null default 0
);

create table jobs (
    id serial primary key,
    queue_id integer not null references queues(id)
);
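As far as the schema above goes, the only indexes are the two primary keys. An index on the foreign key could of course be added; the name below is just a placeholder, and I have not measured whether it helps with this particular problem:

-- hypothetical index on the dequeue key; not used in the timings below
create index jobs_queue_id_idx on jobs (queue_id);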
Right now I am stuck writing an efficient dequeuing query. Here is the test data I am using:
insert into queues (id, priority) values (1, 0), (2, 1), (3, 1);
insert into jobs (queue_id) select 1 from generate_series(1, 1000000);
insert into jobs (queue_id) select 2 from generate_series(1, 1000000);
insert into jobs (queue_id) select 3 from generate_series(1, 1000000);
Here is a simple dequeuing query that is very fast, but it ignores priority:
select * from jobs limit 1 for update of jobs skip locked;
-- id | queue_id
-- ----+----------
-- 1 | 1
-- (1 row)
--
-- Time: 2.641 ms
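For context, the plan is that a worker wraps this select in a transaction, processes the job, and deletes it before committing. A rough sketch of that loop (the job id is a placeholder):

begin;

-- grab one job, skipping rows already locked by other workers
select id, queue_id from jobs limit 1 for update of jobs skip locked;

-- ... process the job, then remove it (42 is a placeholder id) ...
delete from jobs where id = 42;

commit;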
Here is my naive query that honors priority. It is very slow, and I could not get it any faster:
select *
from jobs
join queues on queues.id = jobs.queue_id
order by priority desc
limit 1
for update of jobs skip locked;
-- id | queue_id | id | priority
-- ---------+----------+----+----------
-- 1000001 | 2 | 2 | 1
-- (1 row)
--
-- Time: 1116.631 ms (00:01.117)
Here is the query plan:
-- QUERY PLAN
-- --------------------------------------------------------------------------------------------------------------------------------------
-- Limit (cost=66225.61..66225.63 rows=1 width=28) (actual time=1333.139..1333.142 rows=1 loops=1)
-- -> LockRows (cost=66225.61..103725.61 rows=3000000 width=28) (actual time=1333.137..1333.139 rows=1 loops=1)
-- -> Sort (cost=66225.61..73725.61 rows=3000000 width=28) (actual time=1333.110..1333.112 rows=1 loops=1)
-- Sort Key: queues.priority DESC
-- Sort Method: external merge Disk: 111568kB
-- -> Hash Join (cost=60.85..51225.61 rows=3000000 width=28) (actual time=0.064..660.317 rows=3000000 loops=1)
-- Hash Cond: (jobs.queue_id = queues.id)
-- -> Seq Scan on jobs (cost=0.00..43275.00 rows=3000000 width=14) (actual time=0.027..253.868 rows=3000000 loops=1)
-- -> Hash (cost=32.60..32.60 rows=2260 width=14) (actual time=0.021..0.022 rows=3 loops=1)
-- Buckets: 4096 Batches: 1 Memory Usage: 33kB
-- -> Seq Scan on queues (cost=0.00..32.60 rows=2260 width=14) (actual time=0.011..0.013 rows=3 loops=1)
-- Planning Time: 0.430 ms
-- Execution Time: 1347.448 ms
-- (13 rows)
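One direction I have been wondering about, but have not tested, is pushing the job lookup into a lateral subquery so that only one candidate row per queue has to be sorted, instead of all three million jobs. This sketch presumably needs an index on jobs (queue_id) like the one above, and I suspect it may lock one job per queue rather than just the returned row:

select j.*
from queues
cross join lateral (
    select id, queue_id
    from jobs
    where jobs.queue_id = queues.id
    limit 1
    for update of jobs skip locked
) j
order by queues.priority desc
limit 1;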
Any ideas for a more efficient implementation?
Thank you for your time,
ushi