Search Postgresql Archives

Re: High cpu usage after many inserts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 24, 2009 at 12:40 AM, Jordan Tomkinson <jordan@xxxxxxxxxx> wrote:
>
>
> On Tue, Feb 24, 2009 at 11:42 AM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:
>>
>> Right, the useful thing to do in this case is to take a look at how big
>> all the relations (tables, indexes) involved are at each of the steps in the
>> process.  The script at http://wiki.postgresql.org/wiki/Disk_Usage will show
>> you that.  That will give some feedback on whether the vacuum/reindex
>> methodology is really doing what you expect, and it will also let you
>> compare the size of the table/index with how much RAM is in the system.
>>
>
> taken before the ~7000 rows were entered.
>
>            relation            |  size
> -------------------------------+--------
>  public.mdl_log                | 595 MB
>  public.mdl_forum_posts        | 375 MB
>  public.mdl_log_coumodact_ix   | 197 MB
>  public.mdl_user               | 191 MB
>  public.mdl_cache_text         | 162 MB
>  public.mdl_log_usecou_ix      | 137 MB
>  public.mdl_log_act_ix         | 119 MB
>  public.mdl_log_cmi_ix         | 97 MB
>  public.mdl_log_tim_ix         | 97 MB
>  public.mdl_log_id_pk          | 97 MB
>  public.mdl_question_states    | 48 MB
>  public.mdl_stats_user_daily   | 48 MB
>  public.mdl_hotpot_responses   | 47 MB
>  public.mdl_register_downloads | 45 MB
>  public.mdl_message_read       | 37 MB
>  public.mdl_course_display     | 37 MB
>  public.mdl_stats_user_weekly  | 31 MB
>  public.mdl_mnet_log           | 27 MB
>  public.mdl_user_ema_ix        | 26 MB
>  public.mdl_regidown_url_ix    | 23 MB

What's more interesting is how quickly they grow during your test.
I'm betting that as public.mdl_log and  public.mdl_forum_posts grow,
you get a dataset larger than memory.

There are two levels of caching that pgsql uses, the highest and
closest to pgsql is the shared_buffer cache, and the next is the
kernel level file system cache.    While it's still way faster to hit
the kernel level of file cache than to hit the actual hard drives, the
pg shared_buffers is the fastest.  You may be in a situation where
giving a bit more memory to pg will help, but with a 4G dataset and 8G
of ram you're cutting it close.  You need a few gig for sorts and
processes and such like that.  Going to 16Gig you could set
shared_buffers at somewhere in the 4 to 8Gig range and it might work
out.

If you're looking at scaling to large amounts of data, you can't plan
on it all fitting into memory, and you have to start planning for
faster Disk I/O.  This means more disks, fast RAID controllers with
optional battery backed cache (not really optional) and / or kernel
level RAID, for read mostly stuff it's quite fast.  As expensive as 16
or 24 or 32 fast hard drives are, they're cheaper than servers with a
half terabyte of ram or whatever you'd need for a big dataset.

First things first I'd try increasing shared_buffers to the just over
4G range.  I'd check after each run with vacuum verbose (NOT FULL) to
see how bloated my db was getting.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux