I am accumulating a transaction using libpqxx and suspect I'm doing something overly tortuous to get where I'm going.
I don't mean to make this question longer than it needs to be, but I think context will help. So I'll start by explaining briefly how I got to where I am. And then I'll show where I am, which I'm pretty sure is abusive and not how libpqxx means for me to do things.
Intro: the naive approach to doing thousands of inserts
I have some code that wants to write a bunch of stuff to the
database, typically INSERT ... ON CONFLICT UPDATE
.
Roughly, this code is looping over some container and generating
the needed SQL to insert each container object that needs
inserting.
The naive way to do this (skipping error/exception handling for the moment) is thus:
pqxx::work txn(conn);
ostringstream sql_stmt;
Container rows;
for (const auto& row : rows) {
if (row.IsDirty()) {
RowToSQL(sql_stmt, row, txn);
txn.exec(sql_stmt.str());
// Clear sql_stmt here.
}
}
txn.commit();
The function RowToSQL() takes a
transaction object so that it can quote strings appropriately
using txn.quote().
This is inefficient, however: calling exec()
over and over turns out to be quite slow.
So instead I build up a bunch of statements in the ostringstream, thus:
pqxx::work txn(conn);
ostringstream sql_stmt;
Container rows;
for (const auto& row : rows) {
if (row.IsDirty()) {
RowToSQL(sql_stmt, row, txn);
if (++counter > kExecuteThreshold || sql_stmt.IsGettingRatherBig()) {
txn.exec(sql_stmt.str());
// Clear sql_stmt, reset counter here.
}
if (count > kCommitThreshold) {
txn.commit();
// Reset statement counter here.
}
}
}
// Final commit here.
I chose the two thresholds based on performance testing in our
environment. Think 100 and 10,000 for order of magnitude.
This worked until it didn't, because re-using the transaction this
way leads to conflicts.
Attempt to activate transaction<READ COMMITTED> which is already closed.
I mostly understand all this.
This leads me to write something that has weird code smell, and I
so suspect I've misunderstood how postgresql / libpqxx intend to
be used.
Adding back error/exception handling, I've now written this based
on lots of doc reading:
Less naive but weird code smell
// Has weird code smell.
pqxx::nontransaction non_txn;
ostringstream sql_stmt;
Container rows;
Vector<string> transactions;
for (const auto& row : rows) {
if (row.IsDirty()) {
RowToSQL(sql_stmt, row, non_txn);
if (++counter > kExecuteThreshold || sql_stmt.IsGettingRatherBig()) {
transactions.push_back(sql_stmt.str());
// Clear sql_stmt, reset counter here.
}
if (count > kCommitThreshold) {
try {
pqxx::work txn(conn);
for (const string& trans : transactions) {
txn.exec(trans);
}
txn.commit();
transactions.clear();
// Reset statement counter here.
} catch (const exception& e) {
txn.abort(); // Pedantic, happens on destruction.
YellowAlert(); // Something appropriate.
}
}
}
}
// Final commit here.
It seems quite wrong to me that I should build up this vector of
things to execute and maintain these custom-tuned
execution/transaction thresholds rather than using some facility
of libpqxx. That is, this pattern seems common enough to me that
the complexity I'm starting to see strikes me as my own
misunderstanding.
Any pointers much appreciated.
-- Jeff Abrahamson +33 6 24 40 01 57 +44 7920 594 255 http://p27.eu/jeff/ http://mobilitains.fr/