On Sat, 2021-07-24 at 19:50 +0100, Matthew Wilcox wrote:
> On Sat, Jul 24, 2021 at 11:23:25AM -0700, James Bottomley wrote:
> > On Sat, 2021-07-24 at 19:14 +0100, Matthew Wilcox wrote:
> > > On Sat, Jul 24, 2021 at 11:09:02AM -0700, James Bottomley wrote:
> > > > On Sat, 2021-07-24 at 18:27 +0100, Matthew Wilcox wrote:
> > > > > What blows me away is the 80% performance improvement for
> > > > > PostgreSQL. I know they use the page cache extensively, so
> > > > > it's plausibly real. I'm a bit surprised that it has such
> > > > > good locality, and the size of the win far exceeds my
> > > > > expectations. We should probably dive into it and figure out
> > > > > exactly what's going on.
> > > >
> > > > Since none of the other tested databases showed more than a 3%
> > > > improvement, this looks like an anomalous result specific to
> > > > something in postgres ... although the next biggest db: mariadb
> > > > wasn't part of the tests so I'm not sure that's
> > > > definitive. Perhaps the next step should be to test mariadb?
> > > > Since they're fairly similar in domain (both full
> > > > SQL) if mariadb shows this type of improvement, you can
> > > > safely assume it's something in the way SQL databases handle
> > > > paging and if it doesn't, it's likely fixing a postgres
> > > > inefficiency.
> > >
> > > I think the thing that's specific to PostgreSQL is that it's a
> > > heavy user of the page cache. My understanding is that most
> > > databases use direct IO and manage their own page cache, while
> > > PostgreSQL trusts the kernel to get it right.
> >
> > That's testable with mariadb, at least for the innodb engine since
> > the flush_method is settable.
>
> We're still not communicating well. I'm not talking about writes,
> I'm talking about reads. Postgres uses the page cache for reads.
> InnoDB uses O_DIRECT (afaict). See articles like this one:
> https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/

If it were all about reads, wouldn't the Phoronix pgbench read only
test have shown a better improvement than 7%? I think the Phoronix
data shows that whatever it is it's to do with writes ... that does
imply something in the way the log syncs data.

James