Re: Maintaining an index on a large table: Is there any potential for it to stall my application?

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Tue, 03 Jun 2008 08:19:57 +0800

Peter Geoghegan wrote:
> Hello,
> 
> I'm writing a C++ application that stores data in a table that may
> ultimately become very large (think tens of millions of rows). It has
> an index on one row, in addition to the index created on/as part of
> its primary key. My concern is that a call to the pl/pgSQL function
> that INSERTs data into this table might eventually lock the UI for an
> annoyingly long time, as control flow in my application waits for that
> Pl/PgSQL function to return.

Database latencies aren't generally neatly bounded; you're always going
to get the  odd long job. PostgreSQL isn't a realtime system after all,
and it's oriented toward throughput over latency.

This means that your app really needs to be able to be responsive while
queries are in flight. If nothing else, the user might want to cancel
something or might know something you don't (say, they just unplugged
the network cable). Unless libpq can be convinced to use asynchronous
I/O and avoid blocking on reads that probably means threading.

I've been doing lots with Pg and Java lately and I've found it very
pleasant how relatively easy it is to produce responsive code using a
background worker thread to access the DB. I have a nasty little script
that runs in the background, sending a SIGSTOP to randomly chosen
backends then a SIGCONT a few milliseconds, seconds, or sometimes
minutes later. I chuck the odd SIGTERM in to liven up the mix and test
error handling. I can strongly recommend testing this way as you go
about your normal development if you can afford it ... it's extremely
helpful for catching issues with responsiveness, error handling, or
timing issues.

I don't imagine it's much fun to take the same sort of GUI thread +
background worker approach in C++ . I have lots more experience with C++
than Java, but fortunately for me very little with significantly
multi-threaded C++ and no multi-threaded C++ database code, so I can't
offer any useful advice on how to tackle it. All the C++ threading I've
done has been via TrollTech's Qt library, anyway, which is basically
cheating.

Given the choice I'd want to take the async I/O option over threading in
C++, but I don't get the impression that libpq is really built around
that model.

I'm assuming you're using libpq, of course, not an ODBC interface or
something else.

> I'm aware that I could create a second thread to make the call to my
> database API, libpqxx, but I have reservations due to the possible
> implications for thread safety - pqxx lacks "a flexible mechanism for
> thread synchronization", so this might cause headaches.

It just means that you should entirely encapsulate all libpq access
within a module that, except for job submit/notification messaging, runs
entirely on one thread. If there are no points where other code calls
into libpq and the libpq-using worker sticks to one thread you should be
just fine.

Getting the completion notification, cancellation, error reporting, etc
etc etc right in a concurrent environment could be fun though. I'd want
to stick to a job queue as much as possible, personally.

--
Craig Ringer