On Wed, Jun 7, 2017 at 9:16 AM, David Rosenstrauch <darose@xxxxxxxxxx> wrote:
> I'm running a Spark job that is writing to a postgres db (v9.6), using the
> JDBC driver (v42.0.0), and running into a puzzling error:
>
> 2017-06-06 16:05:17.718 UTC [36661] dmx@dmx ERROR: deadlock detected
> 2017-06-06 16:05:17.718 UTC [36661] dmx@dmx DETAIL: Process 36661 waits
> for ExclusiveLock on advisory lock [16649,0,102440,5]; blocked by process
> 36662.
>     Process 36662 waits for ExclusiveLock on advisory lock
> [16649,0,102439,5]; blocked by process 36661.
>
> However, I can't for the life of me figure out a) how the advisory locks
> are getting created (as I'm not calling for them myself), and b) how to
> fix this issue.
>
> A bit of background: My Spark job runs as multiple processes on multiple
> machines.  Each process performs its writes to pgsql using the JDBC
> driver.  The writes are performed a) as PG UPSERTs, b) as JDBC batches,
> and c) using JDBC prepared statements.  So each process, when it's time
> to write to the db, creates several prepared statements, adds a bunch of
> UPSERTs to each prepared statement (i.e., each prepared statement contains
> a batch of a few hundred UPSERTs), and then calls executeBatch() on each
> statement to perform the write.  That executeBatch() call is where I'm
> running into the error.
>
> In theory, since multiple processes are issuing these batched DB writes,
> there could be a record locking problem if, say, 2 processes tried to
> update the same user record.  But in reality this should be impossible.
> Spark partitions everything based on a key - in my case userID - so all
> DB writes for the same user should happen in the same process.  So at
> worst I could have a batch that contains multiple UPSERTs to the same
> user record, but I should never see updates to the same user from
> different processes.
>
> So, I'm very puzzled by that deadlock error.  Specifically:
>
> * How could it be possible that there are 2 PG processes trying to
> acquire the same lock?  Spark's partitioning should ensure that all
> updates to the same user record get routed to the same process, so this
> situation shouldn't even be possible.

That's really a question for the Spark team.  Obviously two processes are
contending for the same locks -- advisory locks are layered on top of the
basic locking mechanics, which are very well tested and proven.  What I can
tell you is that the core functions provided by postgres take no advisory
locks themselves -- that lock space is yours (that is, it is used only by
code under your control).

> * How/why am I winding up acquiring advisory locks in the first place?
> I'm never requesting them.  I looked at the PG JDBC driver code a bit,
> thinking that it might automatically be creating them for some reason,
> but that doesn't seem to be the case.  Maybe the PG database itself is?
> (E.g., does PG automatically use advisory locks with UPSERTs?)

Some code under your control is.  It could be an external module,
application code, or a stored procedure.

> And, last but not least:
>
> * How do I resolve this "waits for ExclusiveLock on advisory lock" issue?
> There's precious little info available regarding exactly what that error
> message is and how to solve it.
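The report itself is just the standard deadlock message with advisory locks
as the contested resource: each backend holds one advisory lock and is
waiting on the one the other backend holds.  A minimal sketch of that
pattern, with made-up key values and pg_advisory_xact_lock standing in for
whichever advisory lock function your code path actually ends up calling:

  -- session 1
  BEGIN;
  SELECT pg_advisory_xact_lock(102439);

  -- session 2
  BEGIN;
  SELECT pg_advisory_xact_lock(102440);

  -- session 1: blocks, waiting for the lock session 2 holds
  SELECT pg_advisory_xact_lock(102440);

  -- session 2: now each session waits on the other; the deadlock
  -- detector kills one of them with "deadlock detected"
  SELECT pg_advisory_xact_lock(102439);

So the real question is not what the message means, but what is taking
those locks in the first place.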
Barring some response from the Spark team, here is how I would narrow the
problem down:

*) Search the contents of pg_proc for functions calling advisory locks:

  SELECT * FROM pg_proc WHERE prosrc ~* 'advisory';

That might turn up hits in 3rd party code.

*) Turn on statement-level logging and, in bash:

  tail -f postgres_xx.log | grep -i advisory

*) Repeatedly query pg_locks for locktype = 'advisory':

  SELECT * FROM pg_locks WHERE locktype = 'advisory';

and also:

  SELECT * FROM pg_locks WHERE locktype = 'advisory' AND NOT granted;

Advisory locks are a very blunt instrument, and there is a significant risk
of two different locking schemes stepping on each other's toes.  I do not
recommend using them (especially the non-xact variants) unless you have
total control over all of the code that can take locks and a good
understanding of how any connection pooler sitting between the locking code
and the database interacts with them.

merlin
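If those pg_locks queries turn anything up, joining against pg_stat_activity
should show which backends are holding or waiting on advisory locks and what
they were running at the time; roughly, as an untested sketch against the
9.6 catalogs:

  -- which backends hold or wait on advisory locks, and what are they running
  SELECT l.pid,
         l.classid,
         l.objid,
         l.objsubid,
         l.granted,
         a.application_name,
         a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE l.locktype = 'advisory';

That should at least identify which client (the Spark executors or something
else entirely) is issuing the advisory lock calls.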