On Monday 07 October 2013 21:33:14 Jeff Janes wrote:
> On Mon, Oct 7, 2013 at 11:44 AM, Michal TOMA <mt@xxxxxxxxxx> wrote:
> > I gave it in my first post. It is a software RAID 1 of average 7200 rpm
> > disks (Hitachi HDS723020BLE640) for the main tablespace, and a software
> > RAID 1 of SSDs for another tablespace and also for the partition holding
> > the pg_xlog directory.
>
> So that is exactly 2 drives on the HDD side? Yeah, that isn't going to go
> very far.
>
> > The problem is not the workload, as the application is a web crawler, so
> > the workload can be infinite. What I would expect Postgres to do is to
> > regulate the workload somehow instead of just crashing twice a day with
> > a "partition full" followed by automatic recovery.
>
> There has been some discussion about mechanisms to throttle throughput
> based on the log file partition filling up, but it was more in the context
> of archiving going down rather than checkpointing being way too slow. No
> real conclusion was reached, though.
>
> And I'm not very hopeful about it, especially not as something that would
> be on by default. I'd be pretty ticked if the system started automatically
> throttling a bulk load because it extrapolated and decided that some
> problem might occur at some point in the future--even though I know that
> the bulk load will be finished before that point is reached.

I don't really see it as an extrapolation about some problem that may occur
in the future. My problem is admittedly very specific, but it is not a
decision about a problem that may occur at some point in the future; in my
case it is a decision to throttle the workload now or crash with a full
partition immediately. I would really appreciate some kind of mechanism
that would let me tell Postgres: you have 20GB of xlog at most, otherwise
you'll fill the partition and crash.

> It seems like the best place to implement the throttling would be in your
> application, as that is where the sleeping can be done with the least
> amount of locks/resources being held. Maybe you could check `fgrep Dirty
> /proc/meminfo` and throttle based on that value.
>
> Also, the nasty slug of dirty pages is accumulating in the OS, not in
> PostgreSQL itself, so you could turn down dirty_ratio and friends in the
> kernel to limit the problem.

I can of course implement this on the application side quite easily in
different ways, but the problem, as I see it, is how to guess what the real
database limits are. In my case the server ran fine for a year with my
initial settings. Now it is crashing twice a day, very likely because one
particular table or index has become too big to fit in memory and suddenly
requires much more disk activity. This seems to me like a pretty hard thing
to guess on the application side.

Quite surprisingly, as advised in one of the replies to my problem, I have
set shared_buffers to a very low value (128MB instead of my initial 16GB),
and this seems to let me control the growth of the xlog. I don't really
understand why; any explanation of this phenomenon would be greatly
appreciated.

Michal

> Cheers,
>
> Jeff
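
For reference, a minimal sketch of the application-side throttle Jeff
suggests above, assuming a Linux /proc/meminfo; the 2GB threshold and the
function names are made up for illustration, not tested advice:

    import time

    # Sketch of Jeff's suggestion: pause the crawler's writers while the
    # kernel is sitting on too many dirty pages. The 2GB threshold is an
    # illustrative assumption; tune it to your RAM and disk speed.
    DIRTY_LIMIT_KB = 2 * 1024 * 1024

    def dirty_kb():
        # /proc/meminfo contains a line like "Dirty:  123456 kB"
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("Dirty:"):
                    return int(line.split()[1])
        return 0

    def throttle():
        # Sleep outside any transaction, so no locks are held, until the
        # kernel has flushed the dirty-page backlog below the limit.
        while dirty_kb() > DIRTY_LIMIT_KB:
            time.sleep(1)

Calling throttle() between batches of inserts would cap how far the crawler
can outrun the disks, which is the point of doing it in the application.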
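And a sketch of the kernel-side knobs Jeff calls "dirty_ratio and friends";
the specific values are guesses meant only to show the direction (much
lower than the common defaults of 10/20):

    # /etc/sysctl.conf -- start writeback earlier and cap the dirty-page
    # backlog; values are illustrative, not a recommendation.
    vm.dirty_background_ratio = 1   # begin background writeback at 1% of RAM
    vm.dirty_ratio = 5              # block writers once dirty pages hit 5% of RAM
    # apply with: sysctl -p

This trades burst absorption for smoother, earlier flushing, so the slug of
dirty pages never grows large enough to stall a checkpoint for minutes.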
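On the wish for a 20GB xlog cap: in the 9.x releases current at the time of
this thread there is no hard limit, but normal pg_xlog growth is bounded by
checkpoint_segments; per the documentation it usually stays under
(2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of
16MB each. A hedged example, assuming a 9.2/9.3-era server (later releases
replaced checkpoint_segments with max_wal_size, an actual soft cap):

    # postgresql.conf -- illustrative values, not a recommendation; this
    # bounds ordinary pg_xlog growth, it is not a hard cap.
    checkpoint_segments = 256            # 256 x 16MB = ~4GB between checkpoints
    checkpoint_completion_target = 0.9
    checkpoint_timeout = 10min
    # expected steady-state pg_xlog: roughly
    # (2 + 0.9) * 256 + 1 = ~743 segments x 16MB = ~11.6GB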