On Fri, Dec 13, 2024 at 05:15:55PM +0000, John Garry wrote:
> Sure, so some background is that we are using atomic writes for innodb
> MySQL so that we can stop relying on the double-write buffer for crash
> protection. MySQL is using an internal 16K page size (so we want 16K atomic
> writes).

Makes perfect sense so far.

>
> MySQL has what is known as a REDO log - see
> https://dev.mysql.com/doc/dev/mysql-server/9.0.1/PAGE_INNODB_REDO_LOG.html
>
> Essentially it means that for any data page we write, ahead of time we do a
> buffered 512B log update followed by a periodic fsync. I think that such a
> thing is common to many apps.

So it's actually using buffered I/O for that and not direct I/O?

> When we tried just using 16K FS blocksize, we found for low thread count
> testing that performance was poor - even worse baseline of 4K FS blocksize
> and double-write buffer. We put this down to high write latency for REDO
> log. As you can imagine, mostly writing 16K for only a 512B update is not
> efficient in terms of traffic generated and increased latency (versus 4K FS
> block size). At higher thread count, performance was better. We put that
> down to bigger log data portions to be written to REDO per FS block write.

So if the redo log uses buffered I/O I can see how that would bloat writes.
But then again using buffered I/O for a REDO log seems pretty silly to
start with.
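
To put rough numbers on that bloat, here is a back-of-the-envelope sketch.
It assumes a deliberately simple model that is not taken from the thread:
every fsync writes back exactly one whole FS block, and journaling and
metadata traffic are ignored. The function name and the batch size of 8
are made up for illustration.

```python
# Simplified model (an assumption, not measured data): each fsync of the
# redo log writes back one full FS block, however little of it changed.

def write_amplification(fs_block_bytes: int, log_payload_bytes: int) -> float:
    """Bytes written to the device per byte of new log payload."""
    if log_payload_bytes <= 0 or log_payload_bytes > fs_block_bytes:
        raise ValueError("payload must be between 1 byte and one FS block")
    return fs_block_bytes / log_payload_bytes


LOG_UPDATE = 512  # size of one buffered redo-log update, per the thread

# Low thread count: roughly one 512B update per flushed block.
for fs_block in (4096, 16384):
    amp = write_amplification(fs_block, LOG_UPDATE)
    print(f"{fs_block // 1024}K FS block, 1 update per flush: {amp:.0f}x")

# Higher thread count: several updates land in the same block before the
# fsync (8 is a hypothetical batch size).
batched = write_amplification(16384, 8 * LOG_UPDATE)
print(f"16K FS block, 8 updates per flush: {batched:.0f}x")
```

Under this model a lone 512B update costs 8x on a 4K block and 32x on a
16K block, and batching more log data per flush pulls the 16K figure back
down. That would fit the reported improvement at higher thread counts.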