Re: Understanding updates and deadlocks more clearly

Ron <ronljohnsonjr@xxxxxxxxx> · Fri, 2 Sep 2022 19:28:17 -0500



    On 9/2/22 16:15, Wells Oliver wrote:

    
      Hey folks: I want to clarify a few things about
        locks on updates:
        

        1- can two processes update the same table if they are
          updating different rows?
      
    
    You can test this yourself by trying it,

    
        2- in the event of two processes trying to update the same
          row, does this always result in a deadlock?
      
    
    If txn 1 starts, updates TABLE_A then TABLE_B and commits, while "at
    the same time" txn 2 (say it's the same code but a different end
    user processing different data) does the same thing to the same or
    similar rows, you might have a lock conflict
    (depending on RDBMS, transaction type, type of indices, length of
    transaction, how well it's written, etc).

    
    OTOH, you might get a deadlock if If txn 1 starts, updates
    TABLE_A then TABLE_B and commits, while "at the same time" txn 2
    updates TABLE_B then TABLE_A.

    
    In that case, txn 1 locks a chunk of TABLE_A, while txn 2 locks a
    chunk of TABLE_B, and then txn 1 tries to lock TABLE_B
    but can't, because txn 2 has already locked it.  "Simultaneously",
    txn 2 tries to lock TABLE_A but can't, because txn 1 has
    already locked it.

    
    That's a simplistic form of deadlock, which for 25 years competent
    DBMSs have been able to avoid.  (Even so, you can sometimes force
    the issue by using SERIALIZABLE transaction mode.)

    
        3- is there a way to create a system whereby potentially
          concurrent processes can update the same row in a given table,
          can the server be made to queue those updates, etc?
      
    
    This is what DBMSs have been doing for 50+ years.  Of course
    a single CPU system can by definition only do one thing at a time;
    while multi-CPU systems can execute multiple CPU op-codes at once,
    at some point you'll get to a serialized bottleneck.  For example,
    the disk controller can only write one "thing" at at time to
    a disk block.

    
    Thus, at some deep stage, everything is serialized.  Having said
    that, there are many layers between your application sending SQL
    code to Postgresql, and there hints that you can provide the RDBMS,
    and also there are hints that the RDBMS can figure out itself.

    
    PostgreSQL uses MVCC (multi-version concurrency control, which keeps
    multiple versions of a record on disk; each transaction can only see
    one version, and at some point some of the record versions are
    marked "dead", and they just sit there wasting space and IO (since
    disks read blocks, not Postgresql records).  This is why VACUUM is
    so important in UPDATE/DELETE heavy tables.

    
    Transaction length also plays into concurrency (or lack thereof).  A
    txn that starts, does some quick DB changes (having done as much
    work as possible beforehand) and then commits allows for faster
    concurrency than one that starts, does some work, does a bunch of
    unrelated tasks, inefficiently modifies the database, and then
    commits.  Or worse, starts a transaction, does some work,
    and then waits to commit until the user (who just went to
    lunch!!! presses a key.

    
    -- 

      Angular momentum makes the world go 'round.