Re: [PATCH v2 0/7] large atomic writes for xfs

On 13/12/2024 14:38, Christoph Hellwig wrote:
> On Tue, Dec 10, 2024 at 12:57:30PM +0000, John Garry wrote:
>> Currently the atomic write unit min and max is fixed at the FS blocksize
>> for xfs and ext4.
>>
>> This series expands support to allow multiple FS blocks to be written
>> atomically.
>
> Can you explain the workload you're interested in a bit more?

Sure. Some background: we are using atomic writes for MySQL InnoDB so that we can stop relying on the double-write buffer for crash protection. MySQL uses an internal 16K page size, so we want 16K atomic writes.

MySQL has what is known as a REDO log - see https://dev.mysql.com/doc/dev/mysql-server/9.0.1/PAGE_INNODB_REDO_LOG.html

Essentially it means that for any data page we write, ahead of time we do a buffered 512B log update followed by a periodic fsync. I think that such a thing is common to many apps.
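To make that access pattern concrete, here is a minimal sketch of a buffered 512B log append with a periodic fsync. It is only an illustration of the I/O shape, not how MySQL implements its REDO log; the names append_log_records, LOG_RECORD_SIZE and FSYNC_EVERY, and the fsync interval itself, are assumptions made up for the example.

```python
import os
import tempfile

LOG_RECORD_SIZE = 512   # size of one log update, as in the description above
FSYNC_EVERY = 8         # hypothetical flush interval, chosen for illustration


def append_log_records(path, records, fsync_every=FSYNC_EVERY):
    """Append fixed-size records with buffered writes and fsync periodically.

    Returns the number of fsync calls issued.
    """
    fsyncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for i, rec in enumerate(records, start=1):
            if len(rec) > LOG_RECORD_SIZE:
                raise ValueError("record larger than LOG_RECORD_SIZE")
            # Buffered write: pad the record to 512B; the data lands in the
            # page cache and is not yet durable.
            os.write(fd, rec.ljust(LOG_RECORD_SIZE, b"\0"))
            if i % fsync_every == 0:
                os.fsync(fd)  # periodic flush makes earlier records durable
                fsyncs += 1
        os.fsync(fd)  # final flush so nothing is left only in the page cache
        fsyncs += 1
    finally:
        os.close(fd)
    return fsyncs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "redo.log")
        n = append_log_records(log, [b"update %d" % i for i in range(20)])
        print(n, os.path.getsize(log))
```

Each fsync has to write back whole FS blocks, so the FS block size sets the minimum amount of data flushed for even a single 512B record.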


> I'm still very scared of expanding use of the large allocation sizes.

Yes


> IIRC you showed some numbers where increasing the FSB size to something
> larger did not look good in your benchmarks, but I'd like to understand
> why.  Do you have a link to these numbers, just to refresh everyone's
> minds on why that wasn't a good idea?

I don't think that I can share numbers, but I will summarize the findings.

When we tried just using 16K FS blocksize, we found for low thread count testing that performance was poor - even worse than the baseline of 4K FS blocksize with the double-write buffer. We put this down to high write latency for the REDO log. As you can imagine, mostly writing 16K for only a 512B update is not efficient, both in terms of the traffic generated and the increased latency (versus 4K FS block size). At higher thread count, performance was better. We put that down to bigger log data portions being written to REDO per FS block write.
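As a rough back-of-the-envelope for the traffic point, the sketch below computes bytes written per byte of log payload for a given FS block size. It assumes each flush writes whole FS blocks and that the update does not straddle a block boundary; write_amplification is a name made up for this illustration, and these are not measured numbers.

```python
def write_amplification(update_bytes: int, fs_block_bytes: int) -> float:
    """Bytes written to storage per byte of payload.

    Assumes the payload is flushed on its own and dirties the minimum
    number of whole FS blocks (no straddling of block boundaries).
    """
    if update_bytes <= 0 or fs_block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks = -(-update_bytes // fs_block_bytes)  # ceiling division
    return blocks * fs_block_bytes / update_bytes


if __name__ == "__main__":
    for fs_block in (4096, 16384):
        # A single 512B log update per flush (the low thread count case).
        print(fs_block, write_amplification(512, fs_block))
    # Eight 512B updates batched into one flush (closer to high thread count).
    print(16384, write_amplification(8 * 512, 16384))
```

Under those assumptions a lone 512B update costs 8x its size on a 4K block and 32x on a 16K block, while batching eight updates into one 16K block write brings the ratio down to 4x, which is consistent with the better results at higher thread counts.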

For 4K FS blocksize and 16K atomic writes configs - supported via forcealign or RTvol - performance was generally good across the board. forcealign was a bit better.
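For reference, the general rules for an atomic write (power-of-2 length, within the advertised unit min/max, naturally aligned offset) can be sketched as a small check. The function names are made up for illustration, and the 4096/16384 limits in the demo are an assumption matching the 4K FS blocksize and 16K atomic write config above; in practice the limits would come from statx.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def can_write_atomically(offset: int, length: int,
                         unit_min: int, unit_max: int) -> bool:
    """Check the usual atomic write constraints for one write.

    unit_min and unit_max stand for the atomic write unit limits the
    file advertises (assumed values here, not queried from the kernel).
    """
    return (
        is_power_of_two(length)              # length must be a power of 2
        and unit_min <= length <= unit_max   # within the advertised range
        and offset >= 0
        and offset % length == 0             # naturally aligned
    )


if __name__ == "__main__":
    UNIT_MIN, UNIT_MAX = 4096, 16384  # assumed limits for illustration
    for off, length in ((0, 16384), (4096, 16384), (16384, 8192), (0, 32768)):
        print(off, length, can_write_atomically(off, length, UNIT_MIN, UNIT_MAX))
```

A 16K page written at a 16K-aligned offset passes, while the same page at a 4K offset does not, which is why the data file layout has to keep pages naturally aligned.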

We also tried a hybrid solution with 2x partitions - 1x partition with 16K FS block size for data and 1x partition with 4K FS block size for REDO. Performance here was also good. Unfortunately, this config is not fit for production, because we have a requirement to do FS snapshots and that is not possible across 2x FS instances. We did also consider block device snapshots, but there is reluctance to try that as well.

> Did that also include supporting atomic
> writes in the sector size <= write size <= FS block size range, which
> aren't currently supported, but very useful?

I have no use for that so far.

Thanks,
John