Tom,
Here is what I did: I started aggregating all the read information.
First I also grouped by pid (arg0, arg1, pid), and the counts all came
back as 1.
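That per-pid variant was essentially the script below with pid added as
an extra aggregation key, i.e. roughly:

#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        /* key on file name, destination buffer address and backend pid */
        @read[fds[arg0].fi_pathname, arg1, pid] = count();
}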
Then I grouped just by filename and location (arg0, arg1 of the reads),
and the counts came back as follows:
# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
@read[fds[arg0].fi_pathname, arg1] = count();
}
# ./read.d
dtrace: script './read.d' matched 1 probe
^C
/export/home0/igen/pgdata/pg_clog/0014    -2753028293472     1
/export/home0/igen/pgdata/pg_clog/0014    -2753028277088     1
/export/home0/igen/pgdata/pg_clog/0015    -2753028244320     2
/export/home0/igen/pgdata/pg_clog/0015    -2753028268896    14
/export/home0/igen/pgdata/pg_clog/0015    -2753028260704    25
/export/home0/igen/pgdata/pg_clog/0015    -2753028252512    27
/export/home0/igen/pgdata/pg_clog/0015    -2753028277088    28
/export/home0/igen/pgdata/pg_clog/0015    -2753028293472    37
FYI, I pressed Ctrl-C within less than a second.
So to me this looks like multiple processes (different pids) are
reading the same page. (This was with about 600 users active.)
Apparently we do have a problem in that we are reading the same buffer
address again, as if it is not cached anywhere, or is not being found
in any cache.
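One more thing I can try is to key on the file offset instead of the
destination buffer address (arg1), to see whether it really is the same
clog page being read over and over. Roughly (assuming the fileinfo_t
translator's fi_offset member is usable here):

#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        /* key on file name and current file offset rather than the
           user-space buffer the data is read into */
        @read[fds[arg0].fi_pathname, fds[arg0].fi_offset] = count();
}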
I reran the lock wait script on a couple of processes and did not see
CLogControlFileLock as a problem:
# ./83_lwlock_wait.d 14341

             Lock Id            Mode     Count
       WALInsertLock       Exclusive         1
       ProcArrayLock       Exclusive        16

             Lock Id     Combined Time (ns)
       WALInsertLock                 383109
       ProcArrayLock              198866236

# ./83_lwlock_wait.d 14607

             Lock Id            Mode     Count
       WALInsertLock       Exclusive         2
       ProcArrayLock       Exclusive        15

             Lock Id     Combined Time (ns)
       WALInsertLock                  55243
       ProcArrayLock               69700140

#
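(In case you don't have it handy: 83_lwlock_wait.d is roughly along the
lines below. This is just a sketch, assuming the lwlock-startwait /
lwlock-endwait probes with (lockid, mode) as the arguments; the real
script also maps the numeric lock ids and modes to names like
ProcArrayLock / Exclusive.)

#!/usr/sbin/dtrace -qs

/* wait counts and combined wait time per lwlock for backend pid $1 */
postgresql$1:::lwlock-startwait
{
        self->ts = timestamp;
}

postgresql$1:::lwlock-endwait
/self->ts/
{
        @wait_count[arg0, arg1] = count();
        @wait_time[arg0] = sum(timestamp - self->ts);
        self->ts = 0;
}

dtrace:::END
{
        printa("lock id %d  mode %d  waits %@d\n", @wait_count);
        printa("lock id %d  combined time (ns) %@d\n", @wait_time);
}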
What would help you figure out why it is reading the same page again?
-Jignesh
Jignesh K. Shah wrote:
I agree with Tom. Somehow I think increasing NUM_CLOG_BUFFERS just
pushes the symptom out to a later point. I promise to look more into it
before making any recommendation to increase NUM_CLOG_BUFFERS. Although
"iGen" showed improvements in that area from increasing
num_clog_buffers, EAStress showed no improvement. Plus, the reason I
think this is not the problem in 8.3beta1 is that the lock output
clearly does not show CLOGControlFile to be the issue, which I had seen
in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS
will change things here.
Now, I don't understand the code very well yet, but I see three
hotspots, and I am not sure whether they are related to each other:
    * ProcArrayLock waits - causing waits, as reported by the
      83_lockwait.d script
    * SimpleLruReadPage - causing read I/Os, as reported by
      iostat/rsnoop.d
    * GetSnapshotData - causing CPU utilization, as reported by hotuser
But I will shut up and do more testing.
Regards,
Jignesh
Tom Lane wrote:
Josh Berkus <josh@xxxxxxxxxxxx> writes:
Actually, 32 made a significant difference as I recall ... do you
still have the figures for that, Jignesh?
I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
benchmarks using code from a few months back shouldn't carry a lot of
weight.
regards, tom lane