Re: How can I see if my code is "concurrency safe"?

"David Johnston" <polobo@xxxxxxxxx> · Wed, 25 Apr 2012 21:05:50 -0400

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-
> owner@xxxxxxxxxxxxxx] On Behalf Of Ben Chobot
> Sent: Wednesday, April 25, 2012 7:29 PM
> To: Janne H
> Cc: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re:  How can I see if my code is "concurrency safe"?
> 
> On Apr 25, 2012, at 5:17 PM, Janne H wrote:
> 
> > Hi there!
> >
> > Today I realised that my knowledge concerning how postgres handles
> concurrency is not very good, and its even worse when it comes to using
that
> knowledge in real-life.
> >
> > Let me give you an example.
> > I have this table
> >
> > create table userpositions ( userID int,  positionID int, unique
> > (userID,positionID));
> >
> > For a given userID there can be many positionIDs.
> >
> > There are then two operations performed on this table, the first is
> > "select positionID from userpositions where userID=..." to get all
> > positions for a user, and the second is to replace all positions for
> > the user with new positions. For this I was thinking about running it
> > in a transaction
> >
> > begin;
> >   delete from userpositions where userID=...;
> >   insert into userpositions (userID,positionID) values ....; commit;
> >
> > But will this work? I don't want select to return empty results, I don't
want
> two inserts running causing a unique violation.
> > Experimenting with it tells me yes, it will work, but how should I
reason to
> "convinse" my self that it will work?
> >
> > Quoting the docs:  "The partial transaction isolation provided by Read
> Committed mode is adequate for many applications, and this mode is fast
> and simple to use; however, it is not sufficient for all cases.
Applications that
> do complex queries and updates might require a more rigorously consistent
> view of the database than Read Committed mode provides."
> >
> >
> > How do I know I'm not creating one of those complex queries? Or to put
it
> differntly, how do you identify a complex query with potential issues? Any
> special techniques to analyze?
> 
> Think about it this way: once one session starts a transaction, any
> modifications it makes are invisible to other sessions until you commit
your
> transaction, at which point they all become visible atomically. (Unless
those
> other sessions have explicitly said, "yes, I want to play with fire" and
changed
> their isolation mode to something other than Read Committed.)
> 
> So given what you've said, it seems like you should be set, so long as you
> don't have two different sessions try to insert the same
{userID,positionID}
> tuple. If that happens, then the second one that tries to commit will
fail. You
> can avoid that by using a sequence for positionID, which will result in
gaps
> and non-sequential IDs, but also no uniqueness failures.
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

=====================================================

Please keep in mind that simply adding in a "SERIAL" (sequence) value to
avoid duplicate keys is dangerous as well.  While it avoids a physical
duplicate you are now able to insert a logical duplicate into the system.  

Specific to your code could you just perform the following?

UPDATE userpositions SET userid = new_userid WHERE userid = old_userid;

In response to Ben: it appears that positionID is a FK in this situation
which means whether the corresponding PK is serial or not is irrelevant.
To Janne: If this indeed is a foreign key situation (multi-multi reference
table) you should define both columns using "FOREIGN KEY ... REFERENCES ..."

Since you are inserting records for a different userID than the one you are
deleting you currently have no checking in place to ensure that one of your
INSERT values is not already on the table.  You also have not defined the
relationship between position and user (i.e., can more than one person hold
a given position).  Assuming each position can only have one person active
at a time you should define the positionID field as UNIQUE by itself as
well.  Then, if you ensure you only add records that correspond to the
deleted records shown you can ensure that no duplicates will exist since
someone trying to add a duplicate position before you commit will have to
wait on the DELETE to unlock the position and by the time that happens you
will already have insert a new record for that position.  

RISKY: Your risk, as shown, is between obtaining your "values" and running
the transaction delete/insert your "values" become invalid.  Say after you
select but before you delete/insert someone adds a new position to the user.
Your delete will remove the newly added position but your subsequent insert
will not have the new data and thus that position will end up un-filled
after your routine.  Same thing applies if they delete a position entry -
you will end up filling it again.  It is also possible that the position
could be double-filled if the userID is changed during your window.  Since
you have not provided ALL of your code (especially the SELECT/VALUES
portion) I would assume that you are obtaining the data nievely and so you
do have a concurrency issue to address.

Dave

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general