Hi, I'm trying to investigate a performance problem.
We have a large database (over 1TB) running on a server with 160GB of RAM and 32 cores (Xeon E5-2650). The database files are on a NetApp mount.Some research about the Linux page cache suggests that any file can be essentially forced into cache by cat-ting to /dev/null. So I cat a file of one of these database objects, and I can see the cat process reading at about 100MB/s, so it takes 10 sec for a 1GB file. Then I check fincore again -- the file is still not cached. cat-ting the file again still takes 10 sec.
I cat several times in a row, and the file refuses to cache. Sometimes I see a bit of the file appear in the cache, but get removed a few seconds later. I also tried the fadvise program in the ftools, which didn't help. The I/O on this machine is not all that high (a few MB/s for various postgres processes), and there are a few GB free (shown in top). Most of the RAM (150GB+) is used by the page cache. No swap is in use.
I cat several times in a row, and the file refuses to cache. Sometimes I see a bit of the file appear in the cache, but get removed a few seconds later. I also tried the fadvise program in the ftools, which didn't help. The I/O on this machine is not all that high (a few MB/s for various postgres processes), and there are a few GB free (shown in top). Most of the RAM (150GB+) is used by the page cache. No swap is in use.
Eventually the query finishes (or I cancel it). Then I find that running cat on the file does leave it in cache! So, is having multiple readers confusing Linux, or is Postgres doing any madvise on the file (I'd expect no to both).
So here's where I'm stuck. How can reading a file not leave it in the Linux cache? I'd expect it to enter the inactive list (which is about 80GB), so I'd expect another 80GB would need to be read before it would be its turn to be evicted.... which should take a long time if my maximum read speed is 100MB/s.
Since I don't understand the behaviour of simply running cat and the file not being cached, I wonder if this is an issue with Linux, not Postgres. But the issue seems to happen with files in use by Postgres, so maybe there's an interaction, so I thought I'd start here.
Any ideas how I can debug this further? Thanks!
Brian
Brian