Search Postgresql Archives

Re: Limit of bgwriter_lru_maxpages of max. 1000?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 5 Oct 2009, Greg Smith wrote:

On Sun, 4 Oct 2009, Gerhard Wiesinger wrote:

On Fri, 2 Oct 2009, Scott Marlowe wrote:

I found that lowering checkpoint completion target was what helped.
Does that seem counter-intuitive to you?


I set it to 0.0 now.

If you set that to 0.0, the whole checkpoing spreading logic doesn't apply like it's supposed to. I'm not sure what the results you posted mean now. If you had it set to 0 and saw a bad spike (which is how I read your message), I'd say "yes, that's what happens when you do reduce that parameter, so don't do that". If you meant something else please clarify.

I think the problem is, that it is done on checkpoint time (whether spread or not). I should have been already be done by bgwriter.


Thanks for the dtrace example, I suggested we add those checkpoint probes in there and someone did, but I hadn't seen anybody use them for anything yet.


I think more probes (e.g. on different writing conditions like writing from bgwriter or on a checkpoint) would be interesting here.

Bug1: usage_count is IHMO not consistent

It's a bit hack-ish, but the changes made to support multiple buffer use strategies introduced by the "Make large sequential scans and VACUUMs work in a limited-size ring" commit are reasonable even if they're not as consistent as we'd like. Those changes were supported by benchmarks proving their utility, which always trump theoretical "that shouldn't work better!" claims when profiling performance.

Also, they make sense to me, but I've spent a lot of time staring at pg_buffercache output to get a feel for what shows up in there under various circumstances. That's where I'd suggest you go if this doesn't seem right to you; run some real database tests and use pg_buffercache to see what's inside the cache when you're done. What's in there and what I expected to be in there weren't always the same thing, and it's interesting to note how that changes as shared_buffers increases. I consider some time studying that a pre-requisite to analyzing performance of this code.


I have analyzed pg_buffercache (query every second, see below) in parallel to see what happens but I didn't see expected results in some ways with the usage_counts. Therefore I analyzed the code and found IHMO the problem with the usage_count and buffer reallocation. Since the code change is also new (I think it way 05/2009) it might be that you tested before ...

BTW: Is it possible to get everything in pg_class over all databases as admin?

Bug2: Double iteration of buffers
As you can seen in the calling tree below there is double iteration with buffers involved. This might be a major performance bottleneck.

Hmmm, this might be a real bug causing scans through the buffer cache to go twice as fast as intended.

That's not twice O(2*n)=O(n) that's a factor n*n (outer and inner loop iteration) which means overall is O(n^2) which is IHMO too much.

Since the part you suggest is doubled isn't very intensive or called all that often, there's no way it can be a major issue though.

It is a major issue since it is O(n^2) and not O(n). E.g. with 2GB share buffer we have 262144 blocks and 68719476736 calls which is far too much.

That's based on knowing what the code does and how much it was called, as well as some confidence that if it were really a *major* problem, it would have shown up on the extensive benchmarks done on all the code paths you're investigating.


The problem might be hidden for the following reasons:
1.) Buffers values are too low that even n^2 is low for today's machines
2.) Code is not often called in that way
3.) backend writes out pages so that the code is never executed
4.) ...

BTW: Are there some tests available how fast a buffer cache hit is and a disk cache hit is (not in the buffer cache but in the disk cache)? I'll asked, because a lot of locking is involved in the code.

I did some once but didn't find anything particularly interesting about the results. Since you seem to be on a research tear here, it would be helpful to have a script to test that out available, I wasn't able to release mine and something dtrace based would probably be better than the approach I used (I threw a bunch of gettimeofdata calls into the logs and post-processed them with a script).


Do you have an where one should set tracepoints inside and outside PostgreSQL?

BTW2: Oracle buffercache and background writer strategy is also interesting.

As a rule, we don't post links to other database implementation details here, as those can have patented design details we'd prefer not to intentionally re-implement. Much of Oracle's design here doesn't apply here anyway, as it was done in the era when all of their writes were synchronous. That required them to worry about doing a good job on some things in their background writer that we shrug off and let os writes combined with fsync handle instead.


Ok, no problem.

Ciao,
Gerhard

--
http://www.wiesinger.com/


SELECT
  CASE
    WHEN datname IS NULL THEN pg_buffercache.reldatabase::text
    ELSE datname
  END AS database,
  CASE
    WHEN spcname IS NULL THEN pg_buffercache.reltablespace::text
    ELSE spcname
  END AS tablespace,
  CASE
    WHEN relname IS NULL THEN pg_buffercache.relfilenode::text
    ELSE relname
  END AS relation,
  CASE
    WHEN relkind IS NULL THEN pg_buffercache.relfilenode::text
    ELSE relkind
  END AS relkind,
  usagecount > 0 AS usagecount_gt_0,
  isdirty,
  MIN(relblocknumber) AS min_blocknumber,
  MAX(relblocknumber) AS max_blocknumber,
  ROUND(AVG(relblocknumber),2) AS avg_blocknumber,
  ROUND(STDDEV(relblocknumber),2) AS stddev_blocknumber,
  COUNT(*) AS count
FROM
  pg_buffercache
LEFT JOIN pg_class      ON pg_buffercache.relfilenode = pg_class.oid
LEFT JOIN pg_tablespace ON pg_buffercache.reltablespace = pg_tablespace.oid
LEFT JOIN pg_database   ON pg_buffercache.reldatabase = pg_database.oid
WHERE
  isdirty = true
GROUP BY
  CASE
    WHEN datname IS NULL THEN pg_buffercache.reldatabase::text
    ELSE datname
  END,
  CASE
    WHEN spcname IS NULL THEN pg_buffercache.reltablespace::text
    ELSE spcname
  END,
  CASE
    WHEN relkind IS NULL THEN pg_buffercache.relfilenode::text
    ELSE relkind
  END,
  CASE
    WHEN relname IS NULL THEN pg_buffercache.relfilenode::text
    ELSE relname
  END,
  usagecount > 0,
  isdirty
--HAVING relkind = 'r'
ORDER BY
  database,
  tablespace,
  relkind DESC,
  relation,
  usagecount > 0,
  isdirty
;

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux