With the limited time I had, I could not produce a test case that I could have submitted to this forum. I have found another issue in production, though. In addition to the SELECT COUNT(*) FROM table; taking forever (more than 6 hours on a table with 100,000 rows and no indexes, on a Linux Fedora 3 system with two 3.2 GHz Xeon processors plus hyperthreading), the "SELECT column_1 FROM table GROUP BY column_1 HAVING COUNT(*) > 1;" statement actually ran out of memory (on a table with 300,000 rows and no indexes, while the OS reported more than 3.5 GB of virtual memory for the postgres backend process).

To make it short, we found that both problems could be solved with a single magic bullet, namely by calling ANALYZE every time a large amount of changes is introduced to the table. (Earlier, we called ANALYZE only after we did some serious post-processing on freshly bulk-loaded data.) I don't know why ANALYZE would have any effect on a sequential scan of a table, but it does appear to impact both performance and memory usage significantly. Both of our production issues have vanished after this simple change! We no longer have to run VACUUM FULL on the table; a plain VACUUM is satisfactory.

Thanks for all the responses!

Jozsef
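For reference, the load-and-maintain cycle described above amounts to something like the following sketch; the table name, the DELETE condition, and the use of COPY for the bulk load are illustrative placeholders only:

    BEGIN;
    DELETE FROM my_table WHERE ...;             -- remove the majority of the old rows
    COPY my_table FROM '/path/to/bulk_file';    -- bulk load the replacement rows
    COMMIT;

    -- After every large load, outside the transaction
    -- (VACUUM cannot run inside a transaction block):
    VACUUM ANALYZE my_table;                    -- plain VACUUM plus fresh planner statistics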
-----Original Message-----
From: Bill Moran [mailto:wmoran@xxxxxxxxxxxxxxxxxxxxxxx]
Sent: Wednesday, July 25, 2007 3:29 PM
To: Jozsef Szalay
Cc: Pavel Stehule; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: Simple select hangs while CPU close to 100%

In response to "Jozsef Szalay" <jszalay@xxxxxxxxxxxx>:

> Our application is such that any update to the database is done by a
> single session in a batch process using bulk load. The frequency of
> these usually larger-scale updates is variable, but an update runs
> every 2-3 days on average.
>
> Originally a plain VACUUM ANALYZE was executed on every affected table
> after every load.

Any other insert/update activity outside of the bulk loads? What's the
vacuum policy outside the bulk loads? You say "originally" -- does it
still do so?

I agree with Pavel that the output of VACUUM VERBOSE when the problem is
occurring would be helpful.

> VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

If you need to do this, then other settings are incorrect.

> I do understand the need for vacuuming. Nevertheless, I expect Postgres
> to return data eventually even if I do not vacuum. In my case, the
> simple SELECT COUNT(*) FROM table; statement on a table that had around
> 100K "live" rows has not returned the result for more than 6 hours,
> after which I manually killed it.

It should; 6 hours is too long for that process, unless you're running a
486dx2. You didn't mention your hardware or your postgresql.conf
settings.

What other activity is occurring during this long count()? Can you give
us a shot of the iostat output and/or top during this phenomenon?

> Jozsef
>
> -----Original Message-----
> From: Bill Moran [mailto:wmoran@xxxxxxxxxxxxxxxxxxxxxxx]
> Sent: Wednesday, July 25, 2007 1:12 PM
> To: Jozsef Szalay
> Cc: Pavel Stehule; pgsql-performance@xxxxxxxxxxxxxx
> Subject: Re: Simple select hangs while CPU close to 100%
>
> In response to "Jozsef Szalay" <jszalay@xxxxxxxxxxxx>:
>
> > Hi Pavel,
> >
> > Yes I did vacuum. In fact, the only way to "fix" this problem is
> > executing a "full" vacuum. The plain vacuum did not help.
>
> I read over my previous reply and picked up on something else ...
>
> What is your vacuum _policy_? i.e. how often do you vacuum/analyze?
> The fact that you had to do a VACUUM FULL to get things back under
> control tends to suggest that your current vacuum schedule is not
> aggressive enough.
>
> An explicit vacuum of this table after the large delete/insert may
> be helpful.
>
> > -----Original Message-----
> > From: Pavel Stehule [mailto:pavel.stehule@xxxxxxxxx]
> > Sent: Sunday, July 22, 2007 10:53 AM
> > To: Jozsef Szalay
> > Cc: pgsql-performance@xxxxxxxxxxxxxx
> > Subject: Re: Simple select hangs while CPU close to 100%
> >
> > Hello
> >
> > did you vacuum?
> >
> > It's good technique to vacuum a table after removing a bigger number
> > of rows.
> >
> > Regards
> > Pavel Stehule
> >
> > 2007/7/22, Jozsef Szalay <jszalay@xxxxxxxxxxxx>:
> > >
> > > I'm having this very disturbing problem. I've got a table with
> > > about 100,000 rows in it. Our software deletes the majority of
> > > these rows and then bulk loads another 100,000 rows into the same
> > > table. All this is happening within a single transaction. I then
> > > perform a simple "select count(*) from ..." statement that never
> > > returns. In the meantime, the backend Postgres process is taking
> > > close to 100% of the CPU. The hang-up does not always happen on
> > > the same statement, but eventually it happens 2 out of 3 times. If
> > > I dump and then restore the schema where this table resides, the
> > > problem is gone until the next time we run through the whole
> > > process of deleting, loading and querying the table.
> > >
> > > There is no other activity in the database. All requested locks
> > > are granted.
> > >
> > > Has anyone seen similar behavior?
> > >
> > > Some details:
> > >
> > > Postgres v 8.1.2
> > > Linux Fedora 3
> > >
> > > shared_buffers = 65536
> > > temp_buffers = 32768
> > > work_mem = 131072
> > > maintenance_work_mem = 131072
> > > max_stack_depth = 8192
> > > max_fsm_pages = 40000
> > > wal_buffers = 16
> > > checkpoint_segments = 16
> > >
> > > top reports
> > >
> > >   PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> > > 19478 postgres 25   0  740m 721m 536m R 99.7  4.4 609:41.16 postmaster
> > >
> > > ps -ef | grep postgres reports
> > >
> > > postgres 19478 8061 99 00:11 ? 10:13:03 postgres: user dbase [local]
> > > SELECT
> > >
> > > strace -p 19478
> > >
> > > no system calls reported
> > >
> > > Thanks for the help!
> > > Jozsef
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
>
> --
> Bill Moran
> Collaborative Fusion Inc.
> http://people.collaborativefusion.com/~wmoran/
>
> wmoran@xxxxxxxxxxxxxxxxxxxxxxx
> Phone: 412-422-3463x4023

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@xxxxxxxxxxxxxxxxxxxxxxx
Phone: 412-422-3463x4023

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
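As a rough illustration of the diagnostics requested in the thread (VACUUM VERBOSE output and a lock check while the slow query is running), something like the following could be run from a second psql session; "my_table" is a placeholder for the affected table:

    VACUUM VERBOSE my_table;     -- reports removable (dead) row versions and page usage

    SELECT relation::regclass, mode, granted
    FROM pg_locks
    WHERE NOT granted;           -- an empty result confirms no lock requests are waiting

    SHOW max_fsm_pages;          -- free-space map sizing; this parameter exists in the
                                 -- 8.x releases discussed above but was removed later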