Re: looking for a globally unique row ID

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Sat, 16 Sep 2017 12:43:27 +0200

On 09/16/2017 11:57 AM, Rafal Pietrak wrote:
> 
> <disclaimer>
> 	I'M NOT POSTGRES HACKER. THIS IS JUST NA INTUITION.
> </disclaimer>
> 
> W dniu 15.09.2017 o 21:30, Christopher Browne pisze:
>> On 15 September 2017 at 14:45, Adam Brusselback
>> <adambrusselback@xxxxxxxxx> wrote:
> 
> [-----------]
>>
>> b) Referencing which index entries can be dropped (e.g. - vacuumed out)
>> is a fair bit more complex because the index entries depend on multiple
>> tables.  This adds, um, concurrency complications, if data is being deleted
>> from multiple tables concurrently.  (Over-simplifying question:  "When
>> a table that participates in the sharing is vacuumed, does the shared
>> index get vacuumed?  What if two such tables are vacuumed concurrently?")
> 
> This is probably postgresql-hackers knowledge, but I'd appreciate if
> you elaborate: why "concurrent vacuum" of two table with common index
> is such a problem?
> 

Because it means vacuums on partitions will clash on the shared (global)
index, unlike today with only local indexes. Vacuum happens in about
three basic phases:

1) Remove dead tuples from tables, reduce them to item pointers and
stash the row IDs in a list.

2) Scan all indexes on the table, remove items with IDs on the list.

3) Scan the table again, finally removing the item pointers.

Currently this happens on partition level, so the individual vacuums
don't interfere by modifying the same index. With global indexes (at
least when implemented as a single object) that would no longer be the
case, and the consequences are a bit terrifying ...

> 1. why cannot it be "serialized on demand" in such case/exception
> (e.i the case of tables being bond by a common index)? In other
> words, can the initial concurrency be turned into serialized
> commands?
> 

It can. But serialization is pretty much the arch enemy of scalability.
Amdahl's law and all that ...

What's particularly bad on the vacuum serialization is that it does not
serialize client requests but maintenance tasks. That's quite dangerous
IMNSHO, as it may easily result in bloat explosion.

> 2. why common index cannot be implemented as "split files" - locking 
> with their respective tables the usual way? The problem of
> concurrent locking would vanish at the expense of performance hit ...
> but if someone desperately needs such "global index", I'd bet he/she
> will live with performance hit. I would.
> 

It can be implemented by split files. But that once again introduces
massive locking (imagine hundreds or thousands of partitions), and thus
overhead.

It's a bit foolish to say you would live with the performance hit when
you don't know how serious would it be. It may be fine but it may also
be much worse than you expected.

I ran into a number of people who used global indexes on other DBs, and
their position is mostly "I'll never use that, because it pretty much
eliminates all benefits of partitioning." So we need to thread pretty
carefully here, not to run into the same limitations.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general