2014-08-11 22:48 GMT+02:00 Kevin Grittner <kgrittn@xxxxxxxxx>:
Rémi Cura <remi.cura@xxxxxxxxx> wrote:This is too vague to comment on.
> as you (both?) suggested it works using advisory lock used at the
> beginning and end of the transaction. This way there is no upsert
> at all if the element is locked? (I used general advisory lockbut
> in the same way as transactionnal lock)
... and that doesn't seem to make any sense. Either you are not
> The issue in this case is simple : I have to use about 100k
> advisory locks, which is a big memory requirement for my hardware
> :-(
understanding advisory locks or you are doing something very, very
unusual.
See the LOCK TABLE command.
> Merlin I'm afraid I don't understand what is "vanilla" LOCK
> TABLE.
http://www.postgresql.org/docs/current/interactive/sql-lock.html
http://www.postgresql.org/docs/current/interactive/explicit-locking.html
Now *that* I understand. :-) It's not an unusual requirement,
> I can't really use a lock table because each query upsert
> sequentially into 3 tables, doing lots of computing between.
but can be a challenge when using snapshot isolation (where writes
don't block reads and reads don't block anything). There are two
main approaches -- introduce blocking to serialize some of the
operations, or use the SERIALIZABLE transaction isolation level to
ensure that the behavior of all concurrent transactions is
consistent with the behavior you would see if they were run one at
a time. The latter approach doesn't introduce any new blocking,
but it can cause transactions to get an ERROR with a SQLSTATE of
40001 at just about any point, so you need to be prepared to
recognize that and retry those transactions from the beginning (not
just the last statement of the transaction), ignoring any data read
during the failed attempt.
You may want to read the entire chapter on concurrency control:
http://www.postgresql.org/docs/current/interactive/mvcc.html
If you are considering using SERIALIZABLE transactions, you should
probably review the examples in the Wiki, to get an idea of how it
behaves in various cases:
http://wiki.postgresql.org/wiki/SSI
Well, if you introduce blocking you reduce your parallelism, but if
> I use parallel query to compute faster (load dividing). I guess
> it would be very slow with about 8 parallel queries with locks.
you use serializable transactions and there are actually a lot of
conflicts you can see poor performance because of the errors
rolling back transactions and the need to retry them from the
start. The techniques used to implement serializable transactions
in PostgreSQL are basically a refinement of the Optimistic
Concurrency Control (OCC) techniques, but generally with far fewer
retries needed -- the point being that it optimistically assumes
that there will not be a conflict so that concurrency is better,
but has to cancel things if that optimism proves to be unfounded.
To make related to changes to multiple tables and maintain coherent
data, you probably will need to do one or the other.
Hey, thanks for your detailed answer.
The particularity here is that I use a big query with CTE instead of a more procedural way.
I do sophisticated geometric computing using postGIS. I guess it is a hack of both postgres and postgis.
I do sophisticated geometric computing using postGIS. I guess it is a hack of both postgres and postgis.
I explain better the pg_advisory locks uses I have tried.
First classic use of pg_advisory, not working :
CTE_1 (find what rows will be upserted in table_1)
CTE_2 (find what rows will be upserted in table_2)
CTE_3 (find what rows will be upserted in table_3)
CTE_3 (find what rows will be upserted in table_3)
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using custom upsert_function)
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function)
CTE_5 (upsert into table_1 using custom upsert_function)
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function)
CTE_10 (end of query)
each of the upserting function is plpgsql and do something like
pg_advisory_lock(table_number, id of row to be upserted)
with updating AS (update table)
insert into table if not updated
pg_advisory_unlock(table_number,id of row to be upserted)
According to what the list said, it doesn't work because of visibility issues : the locking work, so we know each processes will upsert the same thing sequentially. However it will fail because each process has no visibility on the insert done by the others. So when the second process upsert the same thing, it will try to insert , and we get 2 inserts for the same rowSecond non classic use of pg_adivsory, working but too taxing on resources
The difference between the 2 is essentially : in first case we lock each individual upsert statement. In second case we lock at the query level.
CTE_1 (find what rows will be upserted in table_1, pg_try_advisory_lock(1,id1) )
CTE_2 (find what rows will be upserted in table_2, pg_try_advisory_lock(2,id2) )
CTE_3 (find what rows will be upserted in table_3, pg_try_advisory_lock(3,id3) )
CTE_3 (find what rows will be upserted in table_3, pg_try_advisory_lock(3,id3) )
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using custom upsert_function, only if pg_try_advisory_lock was true )
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function, only if pg_try_advisory_lock was true)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function, only if pg_try_advisory_lock was true)
CTE_5 (upsert into table_1 using custom upsert_function, only if pg_try_advisory_lock was true )
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function, only if pg_try_advisory_lock was true)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function, only if pg_try_advisory_lock was true)
CTE_10 (release locks for (1,id1) , (2,id2) , (3,id3)) )
CTE_10 (end of query)
CTE_10 (end of query)
This works because we don't even try to upsert a row that is already being taken care by another process. So there is no visibility issue;
The problem is with ressources, I have to use 100k locks for several minutes. It grezatly increase memory use.
On the bright side, I don't need to wait for the insert in table_1 to compute table_2, same for table_2 and table_3.
However I can't control when the CTE upserting into each table is executed (as far as I know, there is no guarantee of execution order when using multiple cte).
Now about the lock of tables: I guess it wouldn't work for the same reasons at the individual upsert statement.
If I use it at the whole query level, other parallel process will be waiting for the whole query to finish before being allowed to go on?
Moreover, whatever LOCK I use will be within the same big transaction, so the only solution may be :
lock table_1,table_2,table_3;
execute big querry;
unlock table_1, table_2, table_3;
I didn't find any "vanilla" lock. Is "vanilla" some kind of postgres slang?
Thanks for the links to the doc. I didn't know about SERIALIZABLE transaction, nor about mvcc .
I guess it would be equivalent to put an exeception block in the upsert function on duplicated primary key?
I don't see how to use it at querry level. The whole query being in one transaction, I would have to retry it because a few percent rows are duplicates !
Thanks all for your help,
Cheers,
Rémi-C
Cheers,
Rémi-C