On Thu, Jun 5, 2014 at 2:32 PM, Brio <brianoraas@xxxxxxxxx> wrote:
> Hi, I'm trying to investigate a performance problem.
>
> We have a large database (over 1TB) running on a server with 160GB of RAM
> and 32 cores (Xeon E5-2650). The database files are on a NetApp mount.
> ...
>
> I have noticed a few times that an index scan may be taking a long time, and
> the query's backend process is reading from disk at about 2 MB/s, spending
> 99% of its time waiting for I/O (using iotop). This makes sense, if scanning
> an index that is not in cache.

Does the index scan dirty most of the index blocks it touches?

(When an index scan follows an index entry to a heap page and finds that the
tuple is no longer needed, it might kill that entry when it gets back to the
index, so that the next index scan doesn't need to repeat the futile heap
lookup. This dirties the index block, even for a "read only" scan. However, it
would be unusual for a typical index scan to do this for most of the blocks it
touches. It could happen if the index scan is supporting a giant, rarely run
reporting query, for example, or if your vacuuming schedule is not tuned
correctly.)

The reason I ask is that I have previously seen the dirty blocks of
NetApp-served files get dropped from the Linux page cache as soon as they are
written back to the NetApp. I had written a little Perl script to cut
PostgreSQL out of the loop entirely and demonstrate this effect, but I no
longer have access to it.

Cheers,

Jeff
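
P.S. For anyone who wants to reproduce the test: the original Perl script is
gone, but a sketch in the same spirit might look like the Python below. It uses
mincore(2) to count how many pages of a file are resident in the Linux page
cache, dirties the file, forces writeback with fsync, and counts again. The
temp-file path is just for illustration; to look for the NetApp behavior, point
it at a file on the NFS mount and compare the resident count before and after
the fsync (on local storage the pages normally stay cached).

```python
# Sketch: observe page-cache residency of a file's pages via mincore(2).
# Linux-only; assumes libc is loadable as "libc.so.6".
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.POINTER(ctypes.c_ubyte)]
PAGE = os.sysconf("SC_PAGE_SIZE")

def resident_pages(path):
    """Return (resident_pages, total_pages) for the file at `path`."""
    size = os.path.getsize(path)
    if size == 0:
        return (0, 0)
    npages = (size + PAGE - 1) // PAGE
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), size)   # mappings are page-aligned
        try:
            # Address of the mapping, for the raw mincore() call.
            addr = ctypes.addressof(ctypes.c_char.from_buffer(mm))
            vec = (ctypes.c_ubyte * npages)()
            if libc.mincore(addr, size, vec) != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            # Low bit of each vector byte == "this page is resident".
            return (sum(b & 1 for b in vec), npages)
        finally:
            mm.close()

if __name__ == "__main__":
    import tempfile
    # To probe the NetApp effect, replace this with a path on the NFS mount.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * (64 * PAGE))  # pages enter the cache dirty
        os.fsync(fd)                       # force writeback to storage
        print("resident/total after fsync:", resident_pages(path))
    finally:
        os.close(fd)
        os.unlink(path)
```

Running it in a loop while a query is executing shows whether written-back
pages survive in cache or get dropped, with PostgreSQL entirely out of the
picture.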