Hi All,

I found it's not related to file I/O. I used systemtap to diagnose the
postgres backend process. The systemtap script is a modified version of
sample-bt-off-cpu: https://gist.github.com/kingluo/15b656998035cef193bc

Test process:

1) Create a simple table:
-----------
create table test(k bigserial primary key, a int, b int, c text, d text);
-----------

2) test.sql:
-----------
insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

3) Run pgbench with 4 connections:
-----------
$ time pgbench -r -N -n -c 4 -j 1 -T 3600 -f test.sql
-----------

4) Run the systemtap script for 3 minutes (here 987 is the pid of one
postgres backend process). Note that the sum, max, avg and min times are
in microseconds:
-----------
# ./sample-bt-off-cpu -a -v -p 987 -t 180 > /tmp/postgres.bt
Pass 1: parsed user script and 110 library script(s) using 26916virt/22300res/4276shr/18740data kb, in 380usr/10sys/386real ms.
Pass 2: analyzed script: 18 probe(s), 13 function(s), 5 embed(s), 12 global(s) using 54708virt/51956res/5920shr/46532data kb, in 1870usr/360sys/2669real ms.
Pass 3: translated to C into "/tmp/stapteVG3Q/stap_18d161acb3024931de917335496977c3_12813_src.c" using 54708virt/52156res/6120shr/46532data kb, in 8680usr/250sys/24647real ms.
Pass 4: compiled C into "stap_18d161acb3024931de917335496977c3_12813.ko" in 20450usr/610sys/43898real ms.
Pass 5: starting run.
WARNING: Tracing 987 (/opt/postgresql/bin/postgres)...
WARNING: Too many CFI instuctions
WARNING: Time's up. Quitting now...(it may take a while)
WARNING: query time, count=646253, sum=102991078, max=2474344, avg=159, min=83
WARNING: lock time, count=142, sum=3306500, max=1158862, avg=23285, min=16
WARNING: lwlock time, count=141272, sum=7260098, max=1383180, avg=51, min=1
WARNING: Number of errors: 0, skipped probes: 24
Pass 5: run completed in 10usr/110sys/180744real ms.
-----------
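The query/lock/lwlock timing lines above come from probes added to the
script in the gist. As a rough illustration only (not the exact code from
the gist), the query-time figure can be gathered from PostgreSQL's
built-in static markers, assuming the server was built with
--enable-dtrace; the lock and lwlock times follow the same pattern using
the lock__wait__* and lwlock__wait__* markers:
-----------
# query_time.stp (hypothetical file name); run as: stap -x 987 query_time.stp
global query_start_us, query_us

# query__start / query__done are PostgreSQL's built-in DTrace markers;
# the binary path is the one reported in the trace output above.
probe process("/opt/postgresql/bin/postgres").mark("query__start")
{
    if (pid() == target())
        query_start_us = gettimeofday_us()
}

probe process("/opt/postgresql/bin/postgres").mark("query__done")
{
    if (pid() == target() && query_start_us) {
        # accumulate per-query elapsed time (microseconds) in a statistic
        query_us <<< gettimeofday_us() - query_start_us
        query_start_us = 0
    }
}

probe end
{
    if (@count(query_us))
        printf("query time, count=%d, sum=%d, max=%d, avg=%d, min=%d\n",
               @count(query_us), @sum(query_us), @max(query_us),
               @avg(query_us), @min(query_us))
}
-----------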
During those 3 minutes, postgres logged the following:
-----------
2016-01-13 23:27:21 CST [987-157] postgres@postgres LOG: duration: 2474.304 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:27:48 CST [987-158] postgres@postgres LOG: duration: 1383.359 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-159] postgres@postgres LOG: process 987 still waiting for ExclusiveLock on extension of relation 16386 of database 12141 after 1000.212 ms
2016-01-13 23:28:33 CST [987-160] postgres@postgres DETAIL: Process holding the lock: 990. Wait queue: 988, 987, 989.
2016-01-13 23:28:33 CST [987-161] postgres@postgres STATEMENT: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-162] postgres@postgres LOG: process 987 acquired ExclusiveLock on extension of relation 16386 of database 12141 after 1158.818 ms
2016-01-13 23:28:33 CST [987-163] postgres@postgres STATEMENT: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:33 CST [987-164] postgres@postgres LOG: duration: 1159.512 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
2016-01-13 23:28:45 CST [987-165] postgres@postgres LOG: duration: 1111.664 ms statement: insert into test(a, b, c, d) values(3438, 1231, 'ooooooooooooooooooooooooooooooooooooo', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk');
-----------

The final backtrace result is converted into a flame graph; see:
http://luajit.io/systemtap/pgsql/postgres_1.svg

Any advice?

Regards,
Jinhua Luo

2016-01-11 5:05 GMT+08:00 Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx>:
> On 1/9/16 11:57 PM, Jinhua Luo wrote:
>>
>> But I do not understand that why the process do so many IO with async
>> commit? And it does not even happen at the shared buffer flushing and
>> locks waiting. Where's the code path doing these IO?
>
>
> I assume you're asking about all the IO to the heap table. That is most
> likely occurring as part of ReadBuffer(). As soon as you fill up shared
> buffers, BufferAlloc() is likely to end up with a dirty buffer, resulting in
> it calling FlushBuffer() (see src/backend/storage/buffer/bufmgr.c#1084).
>
> Note that that call is tracked by
> TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(), so I'd expect you to see it in
> the relevant systemtap stats.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
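Following up on the TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START() pointer
above: that macro corresponds to the buffer__write__dirty__start /
buffer__write__dirty__done markers, so the FlushBuffer() writes done by a
single backend can be counted directly. A minimal sketch (again assuming
an --enable-dtrace build; this is not part of the gist script):
-----------
global dirty_start_us, dirty_write_us

probe process("/opt/postgresql/bin/postgres").mark("buffer__write__dirty__start")
{
    if (pid() == target())
        dirty_start_us = gettimeofday_us()
}

probe process("/opt/postgresql/bin/postgres").mark("buffer__write__dirty__done")
{
    if (pid() == target() && dirty_start_us) {
        # one sample per dirty buffer the backend had to write out itself
        dirty_write_us <<< gettimeofday_us() - dirty_start_us
        dirty_start_us = 0
    }
}

probe end
{
    printf("dirty buffer writes by backend: count=%d, total us=%d\n",
           @count(dirty_write_us),
           @count(dirty_write_us) ? @sum(dirty_write_us) : 0)
}
-----------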