Partitioning update-heavy queue with hash partitions vs partial indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi list,

I have a queue table with the schema:

```
create table queue(id bigserial primary key, view_time timestamp with timezone not null);
create index queue_view_time ON queue(view_time ASC);
```

The most concurrent operation is:
```
UPDATE queue SET view_time=view_time+INTERVAL '60 seconds' WHERE id=(
    SELECT id FROM queue WHERE view_time<=now() at time zone 'utc' ORDER BY view_time ASC LIMIT 1 FOR UPDATE SKIP LOCKED
)
```
As you can imagine, with increased concurrency, this query will have to read & skip a lot of locked+dead index entries, so taking a lot of cpu-time.

I'm assuming 10K+ queries/second will do the update above and actually return a row.
You may think about how you'll maintain 10K connections, but you can increase the limit, the queries being fast, use a connection pooler, use auto-commit, etc.

--------------

Since most of the overhead is in the `queue_view_time` index, I thought of partitioning just that with partial indexes and then querying the indexes randomly. This is with 2 partitions:

```
create index queue_view_time_0 ON queue(view_time ASC) WHERE id%2=0;
create index queue_view_time_0 ON queue(view_time ASC) WHERE id%2=1;
```
Adding `where id%2=0` to the select query above and trying the partitions randomly until I get a row or searched all partitions.

----------------
But looking at the docs https://www.postgresql.org/docs/current/indexes-partial.html, it says:

> Do Not Use Partial Indexes as a Substitute for Partitioning
> While a search in this larger index might have to descend through a couple more tree levels than a search in a smaller index, that's almost certainly going to be cheaper than the planner effort needed to select the appropriate one of the partial indexes. The core of the problem is that the system does not understand the relationship among the partial indexes, and will laboriously test each one to see if it's applicable to the current query.

Would this be true in my case too? 

Is it faster for the planner to select a correct partition(hash partitioning on `id` column) instead of a correct partial index like in my case? I don't think I'll need more than ~32 partitions/partial-indexes in an extreme scenario.

Regards,
Dorian

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux