I tried the patch and it has no effect whatsoever -- even with the patch,
under the right load the corrupted entries are coming fast and furious (I
found a load profile on my app that reproduces these very clearly).

Here are a few other observations, for what they are worth:

The problem seems very easy to reproduce on my production-like
environment: 16GB memory, 4 CPUs, RedHat, only the DB running on that
machine. The DB is accessed by 4 appservers running on 2 other machines,
each of the 4 appservers configured with up to 20 connections in the pool,
with incoming connections load balanced among the appservers.

Conversely, the problem is very hard (but not impossible) to reproduce on
a "lesser" environment: 4GB memory, 2 CPUs, Fedora Core, DB and 1 appserver
running on the same machine (and competing for resources), appserver still
configured for up to 20 connections.

The problem only happens when I put a bit of a load on the application:
not necessarily a lot of connections, but a steady rate of requests per
second -- a few simulated users hammering on it without pauses results in
at least one corrupted line every couple of seconds. So it seems to me to
be related to how many processes are writing at the same time,
uninterrupted.

Anything else I can do to diagnose?

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx
> [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of George Pavlov
> Sent: Saturday, June 02, 2007 11:33 AM
> To: Tom Lane
> Cc: Ed L.; pgsql-general@xxxxxxxxxxxxxx
> Subject: Re: [GENERAL] query log corrupted-looking entries
>
> > From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
> > "George Pavlov" <gpavlov@xxxxxxxxxxxxxx> writes:
> > > ... Also redirect_stderr = on.
> >
> > Hm.  Well, that's the bit that ought to get you into the PIPE_BUF
> > exception.  There's been some speculation that a change like the
> > attached would help.  I've found that it makes no difference with
> > my libc, but maybe yours is different --- want to try it?
>
> I will. I may need some time though, since I first need to find a way to
> reproduce the problem reliably on my test environments and right now I
> cannot seem to. I have seen the problem mostly under production loads
> (also under certain kinds of stress-testing, but I forget exactly which
> kind...)
>
> In the meantime I went and looked at the logs in more detail and the
> corrupted entries seem much more prevalent than what I originally
> thought. Apart from the ones pgfouine complains about there are many
> more such lines. For example, out of an (average-load) day's log file
> with 17+ million lines pgfouine complains about 8 lines, but there are
> in fact 1400+ lines with these kinds of entries.
>
> George
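
One rough way to count such lines independently of pgfouine is sketched below. It is only a heuristic under stated assumptions, not the method used for the numbers above: it assumes log_line_prefix starts with a timestamp of the form "2007-06-02 11:33:00 PDT " and that continuation lines of multi-line messages start with a tab. The names PREFIX, is_suspicious and count_suspicious are made up for illustration.

```python
import re

# ASSUMPTION: every log entry starts with a timestamp prefix such as
# "2007-06-02 11:33:00 PDT ". Adjust the pattern to the actual
# log_line_prefix setting before trusting the counts.
PREFIX = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [A-Z]{3,4} ")


def is_suspicious(line):
    """Heuristic: True if the line looks like two entries spliced together."""
    if line.startswith("\t"):
        # ASSUMPTION: a leading tab marks a continuation line of a
        # multi-line message. A prefix inside it suggests a splice.
        return PREFIX.search(line) is not None
    m = PREFIX.match(line)
    if m is None:
        # Neither a new entry nor a continuation line (blank lines ignored).
        return line.strip() != ""
    # A second prefix later in the same line suggests another entry
    # was written into the middle of this one.
    return PREFIX.search(line, m.end()) is not None


def count_suspicious(lines):
    """Count the lines flagged by is_suspicious in an iterable of lines."""
    return sum(1 for line in lines if is_suspicious(line))
```

Passing an open log file object to count_suspicious gives a count that can be compared against what pgfouine reports. A statement that legitimately contains a timestamp literal with a time zone abbreviation would be flagged as a false positive, so the result is an upper-bound estimate at best.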