Thanks to everyone for all the detailed responses so far.
The dataset we're working with, in its original form, is roughly a terabyte; I would guess in the database it will come out to about 2TB-ish. It will definitely not fit in RAM, unfortunately, and our access patterns will be quite random, at least at first (maybe as the analysis continues it will become clear which records are irrelevant, but not yet).
I would love to take advantage of parallelism, but because this is on an HPC cluster I don't have any personal control over the database version (9.6.6) or anything involving OS configuration. I will take a look at the current values of those configuration variables however; maybe we can get at least some speedup.
It seems I can be confident that shared_buffers and work_mem, along with effective_io_concurrency and effective_cache_size, ought to be much larger than their default values. How much larger shared_buffers should be depends on whether it is better to let Linux or Postgres handle the cache. I will try to get those changes made and then start benchmarking some simple queries.
On Tue, Jun 18, 2019 at 12:48 PM Andres Freund <andres@xxxxxxxxxxx> wrote:
Hi,
On 2019-06-18 12:33:30 -0400, Alvaro Herrera wrote:
> On 2019-Jun-18, Andres Freund wrote:
>
> > On 2019-06-17 19:45:41 -0400, Jeff Janes wrote:
> > > If not, I would set the value small (say, 8GB) and let the OS do the
> > > heavy lifting of deciding what to keep in cache.
> >
> > FWIW, in my opinion this is not a good idea in most cases. E.g. linux's
> > pagecache doesn't scale particularly gracefully to large amounts of
> > data, and it's decisions when to evict data aren't really better than
> > postgres'. And there's a significant potential for additional
> > unnecessary disk writes (because the kernel will flush dirty pagecache
> > buffers, and then we'll just re-issue many of those writes again).
>
> One additional tuning point in this area is that you need to tweak the
> Linux flusher so that it starts writing earlier than what it does by
> default (by setting dirty_background_bytes to some reasonable value --
> as I recall it normally runs on the dirty_background_ratio setting,
> which means it scales up when you add RAM, which I'm not sure is really
> sensible since you really care about the disk I/O write rate rather than
> anything about the memory). If you don't do that, the kernel can
> accumulate lots of memory to write before starting to write it, which is
> troublesome once it does.
I think that's less needed these days - by default postgres triggers the
backend to flush data when writes occur from bgwriter
(bgwriter_flush_after) or checkpointer (checkpointer_flush_after). And
IMO one should turn on the flushing by backends in most cases too
(e.g. backend_flush_after=2MB), unless it's a really latency/jitter
insensitive application, or storage is *REALLY* slow.
There's a few things we don't flush that we maybe should (file extension
writes, SLRUs), so it can still be sensible to tune
dirty_background_bytes. But that has the disadvantage of also affecting
temp file writes etc, which is usually not wanted.
Greetings,
Andres Freund
--
Michael J. Curry