(Sorry, we forgot to cc the pgsql-performance list on our last exchange.)
Yes, I did see the original problem only when Postgres was also accessing the file. But the issue is intermittent and I can't reproduce it on demand, so I'm only reporting what I saw a small number of times, and not necessarily (or likely) the whole story.

This machine has 64 GB of RAM. There was about 20 GB free, and the rest was mostly file cache, mostly our large 1 TB database. I ran a script that did various reading and writing to the database, but mostly updated many rows over and over again with new values. As this script ran, the cached memory slowly dropped and free memory increased; I now have 43 GB free!

I'd expect practically any activity to leave files in the cache, with no significant evictions until memory runs low. What actually happens is that the cache grows gradually and then drops in chunks. I would think the only file activity that would evict pages from the cache is deleting files, which would only happen when dropping tables (not happening in my test script), plus WAL file recycling, which should occupy a roughly constant amount of memory.
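For reference, a simple way to watch that pattern is to sample /proc/meminfo (where the free/cached numbers above come from) while the script runs, e.g.:

# Log the page-cache numbers once a minute while the update script runs,
# so the gradual growth and the sudden drops show up over time.
while true; do
    date
    grep -E '^(MemFree|Cached|Dirty|Writeback):' /proc/meminfo
    sleep 60
done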
But if blocks are evicted from the cache once they have been written back, that would explain what I'm seeing, so I'd like to test that. As a very basic test, I tried:
cd /path-to-nfs-mount
echo "foo" > foo.txtOn Mon, Jun 23, 2014 at 3:56 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Mon, Jun 23, 2014 at 3:56 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

On Wed, Jun 18, 2014 at 11:18 PM, Brio <brianoraas@xxxxxxxxx> wrote:
> Hi Jeff,
>
> That is interesting -- I hadn't thought about how a read-only index scan
> might actually write the index.
>
> But, to avoid effects like that, that's why I dropped down to simply using
> "cat" on the file, and I saw the same problem there, with no writing back.

I thought that you saw the same problem with cat only when it was
running concurrently with the index scan, and when the index scan
stopped, the problem with cat went away.

> So the problem really seemed to be in Linux, not Postgres.
>
> But why would dirty blocks of NetApp-served files get dropped from the Linux
> page cache as soon as they are written back to the NetApp? Is it a bug in
> the NetApp driver? Isn't the driver just NFS?

I don't know why it would do that; it never made much sense to me.
But that is what the experimental evidence indicated.

What I was using was NetApp on the back end and just the plain Linux
NFS driver on the client end, and I assume the problem was on the
client end. (Maybe you can get a custom client driver from NetApp
designed to work specifically with their server, but if so, I didn't
do that. For that matter, maybe the default Linux NFS driver has
simply improved.)

> That sounds like a serious
> issue. Is there any online documentation of bugs like that with NetApp?

Yes, it was a serious issue for one intended use. But it was
partially mitigated by the fact that I would probably never run an
important production database over NFS anyway, out of corruption
concerns. I was hoping to use it just for testing purposes, but this
limitation made it rather useless for that as well. I don't think it
is a NetApp-specific issue, and I didn't approach it from that angle;
it's just that NetApp didn't save me from it.
Cheers,
Jeff