Search Postgresql Archives

Re: PG_DUMP very slow because of STDOUT ??

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Craig, thanks for that PS tip (you think, you have used PS for such a long time, but it still has some new tricks available).

And here is the more readable line:

26390 congestion_wait                          D ?        00:00:26 postgres: postgres musicload_cache [local] COPY

So the kernel function it is always idling on seems to be congestion_wait ... I am already looking around at google and in the kernel source (for 2.6.32.2)
http://lxr.linux.no/#linux+v2.6.32.2/mm/backing-dev.c#L762
/**
* congestion_wait - wait for a backing_dev to become uncongested
* @sync: SYNC or ASYNC IO
* @timeout: timeout in jiffies
*
* Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
* write congestion.  If no backing_devs are congested then just wait for the
* next write to be completed.
*/
So, obviously, for some reason we are waiting too much for a backind_device ... which ever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (have seen disk utilization on the drive to which I write below 1%). 

So, the question seems to be, why and where this "idling happens".

Just as a test, I have tried a very simple piping example (which should go trough STDOUT too ... or am I wrong). 
- "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a  10 GB test file on source drive (sdb)
- "time cat file_10GB > /var/tmp/test2.dump" ... pipe the file to target drive (/var/tmp is on sda)
This take only a bit over 2 minutes to complete ... and although, one sees congestion_wait in PS from time to time, it is not dominant, and there are other EXT4 or IO related function shown. ... But maybe this mini-experiment is lacking some important steps, that are in the path of a COPY-to-STDOUT from PostgreSQL.

Andras Fabian



-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:craig@xxxxxxxxxxxxxxxxxxxxx] 
Gesendet: Dienstag, 13. Juli 2010 05:03
An: Andras Fabian
Cc: Tom Lane; pgsql-general@xxxxxxxxxxxxxx
Betreff: Re:  PG_DUMP very slow because of STDOUT ??

On 12/07/10 21:03, Andras Fabian wrote:
> This STDOU issue gets even weirder. Now I have set up our two new servers (identical hw/sw) as I would have needed to do so anyways. After having PG running, I also set up the same test scenario as I have it on our problematic servers, and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now, the first one - which I froze up with oprofile this morning and needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT, the second production server, which did not have a reboot, is still behaving badly.
> Now I tried to dig deeper (without killing a production server again) ... and came to comparing the outputs of PS (with '-fax' parameter then, '-axl'). Now I have found something interesting:
> - all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
> - on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)") 
> 
> Now, this "Ds" state seems to be something unhealthy - especially if it is there almost all the time - as far as my first reeds on google show (and although it points to IO, there is seemingly only very little IO, and IO-wait is minimal too). I have also done "-axl" with PS, which brings the following line for our process:
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 1  5551  2819  4201  20   0 5941068 201192 conges Ds ?          2:05 postgres: postgres musicload_cache [local] COPY"


Your wchan column isn't wide enough to show the full function name, but
I'd say it's related to some form of throttling or congestion control.

Get a wider view of that column to find out what the full function name
is. Grepping the kernel source for it can then tell you a lot about
where in the kernel it is and what might be going on.

Try:

    ps ax -O wchan:40

to get a decently wide view of that col.

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux