On 1/21/16 11:54 PM, Jinhua Luo wrote:
There is another problem.
When the autovacuum recycles the old pages, the ReadBuffer_common()
would do mdread() instead of mdextend().
The read is synchronous, while the write could be mostly asynchronous,
so the frequent read is much worse than write version.
Please don't top post.
AFAICT your analysis is correct, though I don't see what autovacuum has
to do with anything. When we need a new block to put data on we'll
either find one with free space in the free space map and use it, or
we'll extend the relation.
Any help? Please.
There's been some discussion on ways to improve the performance of
relation extension (iirc one of those was to not zero the new page), but
ultimately you're at the mercy of the underlying OS and hardware.
If you have ideas for improving this you should speak up on -hackers,
but before doing so you should read the archives about what's been
proposed in the past.
2016-01-19 12:50 GMT+08:00 Jinhua Luo <luajit.io@xxxxxxxxx>:
Hi,
I thought with async commit enabled, the backend process would rarely
do file io. But in fact, it still involves a lot of file io.
After inspecting the vfs probes using systemtap, and reading the
source codes of postgresql, I found the tight loop of insert or update
will cause heavy file io upon the data files (table, indexes) directly
by the backend process! And those io has nothing to do with shared
buffer dirty writes.
The heap_insert() or heap_update() would invoke
RelationGetBufferForTuple(), which in turn finally invoke
ReadBuffer_common():
1) lookup or allocate the buffer from shared buffer, which may cause
dirty write (but in my case, it's rare. Maybe the shared buffer is big
enough and the checkpointer or bgwriter always clean it in time). If
the target buffer is found, skip following steps.
2) if it needs to extend the relation (insert or update on new table
would normally fall in this case), then it would write zero-filled
page into the disk (used to occupy the file space? But most file
systems support file hole or space reservation, so maybe this part
could be optimized?) This procedure would hold the exclusive lock on
the relation. So if the write is slow, it would slow down all pending
queries of the lock waiters.
3) Otherwise, it would read from disk.
The target buffer would be locked exclusively until the insert or
update finish. Note that the insert or update also involve xlog
insert, although with async commit enabled, the backend process would
not flush the xlog, but chances are that the xlog buffer dirty writes
happens (although it's also rare in my case).
So I guess the real reason is the file io with lock holding. If io
spike happens, it would cause long query duration.
Am I correct? Look forward to any advice.
Thanks.
Regards,
Jinhua Luo
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance