Tom,
Here is what I did: I started aggregating all the read information.
First I also grouped by pid (arg0, arg1, pid), and the counts all came
back as 1.
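That per-pid variant was essentially the script below with pid added as
an extra aggregation key, i.e. roughly:

#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        /* key on file name, destination buffer address and backend pid */
        @read[fds[arg0].fi_pathname, arg1, pid] = count();
}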
Then I grouped just by filename and location (arg0, arg1 of the reads),
and the counts came back as follows:
# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
@read[fds[arg0].fi_pathname, arg1] = count();
}
# ./read.d
dtrace: script './read.d' matched 1 probe
^C
/export/home0/igen/pgdata/pg_clog/0014    -2753028293472     1
/export/home0/igen/pgdata/pg_clog/0014    -2753028277088     1
/export/home0/igen/pgdata/pg_clog/0015    -2753028244320     2
/export/home0/igen/pgdata/pg_clog/0015    -2753028268896    14
/export/home0/igen/pgdata/pg_clog/0015    -2753028260704    25
/export/home0/igen/pgdata/pg_clog/0015    -2753028252512    27
/export/home0/igen/pgdata/pg_clog/0015    -2753028277088    28
/export/home0/igen/pgdata/pg_clog/0015    -2753028293472    37
FYI, I pressed Ctrl-C within less than a second.
So to me this looks like multiple processes (different pids) are
reading the same page. (This was with about 600 users active.)
Apparently we do have a problem in that we are reading the same buffer
address again, as if it is not cached anywhere, or is not being found
in any cache.
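One more thing I can try is to key on the file offset instead of the
destination buffer address (arg1), to see whether it really is the same
clog page being read over and over. Roughly (assuming the fileinfo_t
translator's fi_offset member is usable here):

#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        /* key on file name and current file offset rather than the
           user-space buffer the data is read into */
        @read[fds[arg0].fi_pathname, fds[arg0].fi_offset] = count();
}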
I reran the lock wait script on a couple of processes and did not see
CLogControlFileLock as a problem:
# ./83_lwlock_wait.d 14341

             Lock Id            Mode     Count
       WALInsertLock       Exclusive         1
       ProcArrayLock       Exclusive        16

             Lock Id     Combined Time (ns)
       WALInsertLock                 383109
       ProcArrayLock              198866236

# ./83_lwlock_wait.d 14607

             Lock Id            Mode     Count
       WALInsertLock       Exclusive         2
       ProcArrayLock       Exclusive        15

             Lock Id     Combined Time (ns)
       WALInsertLock                  55243
       ProcArrayLock               69700140

#
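(In case you don't have it handy: 83_lwlock_wait.d is roughly along the
lines below. This is just a sketch, assuming the lwlock-startwait /
lwlock-endwait probes with (lockid, mode) as the arguments; the real
script also maps the numeric lock ids and modes to names like
ProcArrayLock / Exclusive.)

#!/usr/sbin/dtrace -qs

/* wait counts and combined wait time per lwlock for backend pid $1 */
postgresql$1:::lwlock-startwait
{
        self->ts = timestamp;
}

postgresql$1:::lwlock-endwait
/self->ts/
{
        @wait_count[arg0, arg1] = count();
        @wait_time[arg0] = sum(timestamp - self->ts);
        self->ts = 0;
}

dtrace:::END
{
        printa("lock id %d  mode %d  waits %@d\n", @wait_count);
        printa("lock id %d  combined time (ns) %@d\n", @wait_time);
}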
What would help you figure out why it is reading the same page again?
-Jignesh
Jignesh K. Shah wrote:
I agree with Tom. Somehow I think increasing NUM_CLOG_BUFFERS just
pushes the symptom out to a later point. I promise to look more into it
before making any recommendation to increase NUM_CLOG_BUFFERS. Although
"iGen" showed improvements in that area from increasing
num_clog_buffers, EAStress showed no improvement. Plus, the reason I
think this is not the problem in 8.3beta1 is that the lock output
clearly does not show CLOGControlFile to be the issue, which I had seen
in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS
will change things here.
Now, I don't understand the code very well yet, but I see three
hotspots, and I am not sure whether they are related to each other:
    * ProcArrayLock waits - causing waits, as reported by the
      83_lockwait.d script
    * SimpleLruReadPage - causing read I/Os, as reported by
      iostat/rsnoop.d
    * GetSnapshotData - causing CPU utilization, as reported by hotuser
But I will shut up and do more testing.
Regards,
Jignesh
Tom Lane wrote:
Josh Berkus <josh@xxxxxxxxxxxx> writes:
Actually, 32 made a significant difference as I recall ... do you
still have the figures for that, Jignesh?
I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
benchmarks using code from a few months back shouldn't carry a lot of
weight.
regards, tom lane