pqxx accumulating a transaction

Jeff Abrahamson <jeff@xxxxxx> · Thu, 9 Dec 2021 23:47:08 +0100



    I am accumulating a transaction using libpqxx and suspect I'm
      doing something overly tortuous to get where I'm going.
    I don't mean to make this question longer than it needs to be,
      but I think context will help.  So I'll start by explaining
      briefly how I got to where I am.  And then I'll show where I am,
      which I'm pretty sure is abusive and not how libpqxx means for me
      to do things.
    Intro: the naive approach to doing thousands of inserts
    I have some code that wants to write a bunch of stuff to the
      database, typically INSERT ... ON CONFLICT UPDATE.
      Roughly, this code is looping over some container and generating
      the needed SQL to insert each container object that needs
      inserting.
    The naive way to do this (skipping
      error/exception handling for the moment) is thus:
    
      pqxx::work txn(conn);

          ostringstream sql_stmt;

          Container rows;

          for (const auto& row : rows) {

              if (row.IsDirty()) {

                  RowToSQL(sql_stmt, row, txn);

                  txn.exec(sql_stmt.str());

                  // Clear sql_stmt here.

              }

          }

          txn.commit();

      
    The function RowToSQL() takes a
      transaction object so that it can quote strings appropriately
      using txn.quote().

      
      This is inefficient, however: calling exec()
      over and over turns out to be quite slow.

      
      So instead I build up a bunch of statements in the ostringstream, thus:
    
      pqxx::work txn(conn);

          ostringstream sql_stmt;

          Container rows;

          for (const auto& row : rows) {

              if (row.IsDirty()) {

                  RowToSQL(sql_stmt, row, txn);

                  if (++counter > kExecuteThreshold ||
          sql_stmt.IsGettingRatherBig()) {

                      txn.exec(sql_stmt.str());

                      // Clear sql_stmt, reset counter here.

                  }

                  if (count > kCommitThreshold) {

                      txn.commit();

                      // Reset statement counter here.

                  }

              }

          }

          // Final commit here.

      
    I chose the two thresholds based on performance testing in our
      environment.  Think 100 and 10,000 for order of magnitude.

      
      This worked until it didn't, because re-using the transaction this
      way leads to conflicts.
    
      Attempt to activate transaction<READ
          COMMITTED> which is already closed.

      
    I mostly understand all this.
    This leads me to write something that has weird code smell, and I
      so suspect I've misunderstood how postgresql / libpqxx intend to
      be used.

      
      Adding back error/exception handling, I've now written this based
      on lots of doc reading:
    
      Less naive but weird code smell

      
      // Has weird code smell.
      pqxx::nontransaction non_txn;

          ostringstream sql_stmt;

          Container rows;

          Vector<string> transactions;

          for (const auto& row : rows) {

              if (row.IsDirty()) {

                  RowToSQL(sql_stmt, row, non_txn);

                  if (++counter > kExecuteThreshold ||
          sql_stmt.IsGettingRatherBig()) {

                      transactions.push_back(sql_stmt.str());

                      // Clear sql_stmt, reset counter here.

                  }

                  if (count > kCommitThreshold) {

                      try {

                          pqxx::work txn(conn);

                          for (const string& trans : transactions) {

                              txn.exec(trans);

                          }

                          txn.commit();

                          transactions.clear();

                          // Reset statement counter here.

                      } catch (const exception& e) {

                          txn.abort();    // Pedantic, happens on
          destruction.

                          YellowAlert();  // Something appropriate.

                      }

                  }

              }

          }

          // Final commit here.

      
    It seems quite wrong to me that I should build up this vector of
      things to execute and maintain these custom-tuned
      execution/transaction thresholds rather than using some facility
      of libpqxx. That is, this pattern seems common enough to me that
      the complexity I'm starting to see strikes me as my own
      misunderstanding.

      
      Any pointers much appreciated.

    
    -- 
Jeff Abrahamson
+33 6 24 40 01 57
+44 7920 594 255

http://p27.eu/jeff/
http://mobilitains.fr/