I work with Postgres and wonder whether, for my purposes, there is a good-enough reason to update one of these days. I’m an editor working with some 60,000 Early Modern texts, many of them in need of editorial attention. The texts are XML-encoded documents; each word is wrapped in a <w> element with attributes for various linguistic metadata. Typically a type of error occurs several or many times, and at the margins the occurrences need individual attention.

I use Python scripts to extract tokens from the main corpus (sometimes dozens, sometimes thousands or millions), turn them into keyword-in-context rows, and import them into Postgres.
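For concreteness, the extract-and-import step looks roughly like the sketch below. The file paths, the TEI-style namespace, the attribute names, and the kwic table layout are just illustrative, not my actual schema:

# Rough sketch of the extract-and-load step; paths, namespace, and the
# kwic table layout are illustrative only.
from pathlib import Path
from lxml import etree
import psycopg2

TEI = "{http://www.tei-c.org/ns/1.0}"          # assuming TEI-style markup
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def kwic_rows(xml_file, window=5):
    """Yield (file, word id, left context, keyword, right context) rows."""
    words = etree.parse(str(xml_file)).findall(f".//{TEI}w")
    tokens = [(w.get(XML_ID), w.text or "") for w in words]
    for i, (wid, token) in enumerate(tokens):
        left = " ".join(t for _, t in tokens[max(0, i - window):i])
        right = " ".join(t for _, t in tokens[i + 1:i + 1 + window])
        yield (xml_file.name, wid, left, token, right)

conn = psycopg2.connect(dbname="corpus")       # connection details assumed
with conn, conn.cursor() as cur:
    for f in Path("texts").glob("*.xml"):
        cur.executemany(
            "INSERT INTO kwic (file, word_id, left_ctx, keyword, right_ctx) "
            "VALUES (%s, %s, %s, %s, %s)",
            list(kwic_rows(f)),
        )

For imports in the millions of rows I would use COPY (or psycopg2.extras.execute_values) rather than row-by-row INSERTs, but the shape of the step is the same.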
I basically use Postgres as a giant spreadsheet. Its excellent string-handling routines make it relatively easy to perform the search and sort operations that identify tokens in need of correction.
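To give a sense of what I mean by search and sort, here is a typical lookup against that illustrative kwic table (the long-s error pattern is made up for the example):

# Illustrative search over the hypothetical kwic table: pull candidate
# long-s misreadings ("fame" for "same", etc.) and sort them so that
# identical problems cluster together.
import psycopg2

conn = psycopg2.connect(dbname="corpus")       # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT file, word_id, left_ctx, keyword, right_ctx
        FROM kwic
        WHERE keyword ~ '^f[aeiou]me$'          -- hypothetical error pattern
        ORDER BY lower(keyword), file
    """)
    rows = cur.fetchall()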
Once the corrections are made in Postgres, typically as batch updates, I move them into Python as a data frame, and from Python I move them back into the texts.
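That correction-and-round-trip step, again only as a sketch (the corrected column and the u/v rule are invented for illustration):

# Batch correction in Postgres, then the round trip back into Python.
# The "corrected" column and the u/v normalization rule are illustrative.
import pandas as pd
import psycopg2

conn = psycopg2.connect(dbname="corpus")       # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute("""
        UPDATE kwic
        SET corrected = 'u' || substr(keyword, 2)   -- e.g. 'vnto' -> 'unto'
        WHERE keyword LIKE 'vn%' AND corrected IS NULL
    """)

# pull the corrected rows back as a data frame...
fixed = pd.read_sql(
    "SELECT file, word_id, corrected FROM kwic WHERE corrected IS NOT NULL",
    conn,
)
# ...and then walk the XML files again, writing fixed.corrected back into
# the matching <w> elements by file and word_id.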
I do this on a recent Mac with 64 GB of memory and a 6-core i7 processor, and I use Data Studio as an editing interface.

Unless a more recent version of Postgres has additional string-handling routines, indexing routines that speed up work on tables with rows in the low millions, or other features likely to speed up these operations, I don’t see any reason to update. I could imagine a table with up to 40 million rows; that would be pretty sluggish on my current equipment, which handles up to 10 million rows quite comfortably. Am I right in thinking that, given my tasks and equipment, it would be a waste of time to update? Or is there something I’m missing?

Martin Mueller
Professor emeritus of English and Classics
Northwestern University