My current preliminary conclusions on this box / workload:
- running psync is much better than sync
So you likely have a convincing case for the Postgres folks to switch over to
pread/pwrite.
I did raise it on the PG hackers mailing list, but I couldn't convince
them =(
A pity, since there was even a patch for this in the past (the change seems
easy, but it was rejected).
They say I would need to come up with a real-world PostgreSQL workload that
shows the effect is above the noise level.
And since PostgreSQL is such a CPU hog anyway, and I don't have time for a
full research project, I'm leaving it at that.
---
But I did more fio-level benchmarking to compare the efficiency of different
I/O methods. Here are numbers that quantify the differences between the I/O
engines:
ioengine       sync    psync   vsync   pvsync  pvsync2  pvsync2+hipri
iodepth           1        1       1        1        1        1
numjobs        1024     1024    1024     1024     1024     1024
concurrency    1024     1024    1024     1024     1024     1024
iops (k)       9171     9390    9196     9473     9527     9516
user (%)        7.7      9.3     8.6      9.0      9.3      2.6
system (%)     86.8     77.0    85.8     76.3     77.3     97.4
total (%)      94.5     86.3    94.4     85.3     86.6    100.0
iops/system   105.7    121.9   107.2    124.2    123.2     97.7
As can be seen, the kIOPS normalized to system CPU load (last line) for
psync (pread/pwrite) is significantly higher than for sync
(lseek/read/write).
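To make the difference concrete, here is a minimal C sketch (my own
illustration, not fio's actual code) of the per-I/O syscall pattern behind
the two engines: sync needs an lseek plus a read, while psync does the same
positioned read in a single pread call.

/* Illustration only: per-I/O syscall pattern of the two engines
 * for a read of 'len' bytes at offset 'off'. */
#include <unistd.h>
#include <sys/types.h>

/* ioengine=sync: position, then read -> two syscalls per I/O */
ssize_t read_sync(int fd, void *buf, size_t len, off_t off)
{
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);
}

/* ioengine=psync: positioned read -> one syscall per I/O */
ssize_t read_psync(int fd, void *buf, size_t len, off_t off)
{
    return pread(fd, buf, len, off);
}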
Now here is AIO:
ioengine      libaio   libaio   libaio
iodepth           32       32       32
numjobs          128       64       32
concurrency     4096     2048     1024
iops (k)      9485.6   9479.4   8718.1
user (%)         6.7      3.4      2.4
system (%)      59.2     30.0     16.7
total (%)       65.9     33.4     19.1
iops/system    160.2    316.0    522.0
The highest kIOPS/system is reached at a concurrency of 1024.
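For reference, this is roughly the submission pattern that makes libaio so
much cheaper per I/O: a whole batch of requests enters the kernel with one
io_submit() and is reaped with one io_getevents(), instead of one syscall per
I/O. A minimal sketch (the device path and the 4k block size are just
illustrative assumptions; build with -laio):

#define _GNU_SOURCE            /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

#define QD  32                 /* matches iodepth=32 above */
#define BS  4096

int main(void)
{
    io_context_t ctx;
    struct iocb iocbs[QD], *iocbps[QD];
    struct io_event events[QD];
    int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);  /* assumed test device */

    memset(&ctx, 0, sizeof(ctx));
    io_setup(QD, &ctx);                       /* create AIO context with depth QD */

    for (int i = 0; i < QD; i++) {
        void *buf;
        posix_memalign(&buf, BS, BS);         /* O_DIRECT wants aligned buffers */
        io_prep_pread(&iocbs[i], fd, buf, BS, (off_t)i * BS);
        iocbps[i] = &iocbs[i];
    }

    io_submit(ctx, QD, iocbps);               /* one syscall submits all QD reads */
    io_getevents(ctx, QD, QD, events, NULL);  /* one syscall reaps all completions */

    io_destroy(ctx);
    return 0;
}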
However, during these libaio tests I get the following in the kernel log:
[459346.155564] NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [swapper/46:0]
[461040.530959] NMI watchdog: BUG: soft lockup - CPU#26 stuck for 22s! [swapper/26:0]
[461044.279081] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [swapper/23:0]
A wild guess: these lockups are actually deadlocks. AIO seems to be tricky
for the kernel, too.
Cheers,
/Tobias