On Tue, Jul 23, 2024 at 04:01:41PM +0100, John Garry wrote:
> On 23/07/2024 15:42, Christoph Hellwig wrote:
> > On Tue, Jul 23, 2024 at 11:11:28AM +0100, John Garry wrote:
> > > I am looking at something like this to implement read-only for
> > > those inodes:
> >
> > Yikes. Treating individual inodes in a file system as read-only
> > is about the most confusing and harmful behavior we could do.
>
> That was the suggestion which I was given earlier in this thread.

Well, Christoph and I suggested failing the mount /earlier/ in this
thread. ;)

> > Just treat it as any other rocompat feature please and mount the
> > entire file system read-only if not supported.
> >
> > Or even better let this wait a little, and work with Darrick to work
> > on the rextsize > 1 reflink patches and just make the thing work.
>
> I'll let Darrick comment on this.

COW with alloc_unit > fsblock is not currently possible, whether it's
forcealign or rtreflink, because COW must happen at allocation unit
granularity. Pure overwrites don't need all these twists and turns.

1. For COW to work, each write/page_mkwrite must mark dirty every
fsblock in the entire alloc unit. Those fsblocks could be cached by
multiple folios, which means (in iomap terms) dirtying each block in
potentially multiple iomap_folio_state structures, as well as their
folios.

2. Similarly, writeback must then be able to issue IO in quantities that
are aligned to allocation units. IOWs, for every dirty region in the
file, we'd have to find the folios for a given allocation unit, mark
them all for writeback, and issue bios for however much we managed to
do. If it's not possible to grab a folio, then the entire allocation
unit can't be written out, which implies that writeback can fail to
fully clean folios.

3. Alternately I suppose we could track the number of folios undergoing
writeback for each allocation unit, issue the writeback ios whenever
we're ready, and only remap the allocation unit when the number of
folios undergoing writeback for that allocation unit reaches zero.

If we could get the mapping_set_folio_order patch merged, then we could
at least get partial support for power-of-two alloc_unit > fsblock
configurations by setting the minimum folio order to log2(alloc_unit).
For atomic writes this is probably a hard requirement because we must be
able to submit one bio with one memory region.

For everyone else this sucks because cranking up the min folio order
reduces the flexibility that the page cache can have in finding cache
memory... but until someone figures out how to make the batching work,
there's not much progress to be made.

For non power-of-two alloc_unit we can't just crank up the min folio
order because there will always be misalignments somewhere; we need a
full writeback batching implementation that can handle multiple folios
per alloc unit and partial folio writeback.

djwong-dev implements 1. It partially handles 2 by enlarging the wbc
range to be aligned to allocation units, but it doesn't guarantee that
all the folios actually got tagged for the batch. It can't do 3, which
means that it's probably broken if you press it hard enough.

Alternately we could disallow non power-of-two everywhere, which would
make the accounting simpler but that's a regression against ye olde xfs
which supports non power-of-two allocation units.

rtreflink is nowhere near ready to go -- it's still in djwong-wtf behind
metadata directories, rtgroups, realtime rmap, and (probably) hch's zns
patches.

> > > > So what about forcealign and RT?
> > >
> > > Any opinion on this?
> >
> > What about forcealign and RT?
>
> In this series version I was mounting the whole FS as RO if
> XFS_FEAT_FORCEALIGN and XFS_FEAT_REFLINK were found in the SB. And so
> very different to how I was going to individually treat inodes which
> happen to be forcealign and reflink, above.
>
> So I was asking guidance on whether that approach (for RT and
> forcealign) is sound.

I reiterate: don't allow mounting of (forcealign && reflink) or
(forcealign && rtextsize > 1) filesystems, and then you and I can work
on figuring out the rest.

--D
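
To make the alignment arithmetic above concrete, here is a minimal
userspace sketch in C. It is not kernel code and not what djwong-dev
does: every name in it (byte_range, round_to_alloc_unit,
min_folio_order, fs_features, mount_allowed) is hypothetical. It assumes
the allocation unit is given in bytes for the range rounding and in
pages for the folio order calculation. It shows widening a dirty range
to allocation unit boundaries (the kind of enlargement item 2 needs),
log2(alloc_unit) as a minimum folio order for power-of-two units only,
and the mount-time rejection rule stated at the end of the mail.

```c
/* Illustrative userspace sketch only; none of these names are XFS or iomap APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte range; end is exclusive. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Widen [start, end) outward so both edges land on allocation unit
 * boundaries. Division is used instead of masking so that
 * non-power-of-two allocation units are handled too.
 */
static struct byte_range
round_to_alloc_unit(struct byte_range r, uint64_t au_bytes)
{
	struct byte_range out;

	assert(au_bytes > 0 && r.start <= r.end);
	out.start = (r.start / au_bytes) * au_bytes;
	out.end = ((r.end + au_bytes - 1) / au_bytes) * au_bytes;
	return out;
}

static bool
is_power_of_two(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Smallest folio order whose folio covers one whole allocation unit,
 * with the unit counted in pages (an assumption of this sketch).
 * Returns -1 for non-power-of-two units, where no single order fits.
 */
static int
min_folio_order(uint64_t au_pages)
{
	int order = 0;

	if (!is_power_of_two(au_pages))
		return -1;
	while (au_pages > 1) {
		au_pages >>= 1;
		order++;
	}
	return order;
}

/* Hypothetical summary of the superblock features relevant here. */
struct fs_features {
	bool forcealign;
	bool reflink;
	uint32_t rtextsize;	/* realtime extent size, in fsblocks */
};

/* Refuse (forcealign && reflink) and (forcealign && rtextsize > 1). */
static bool
mount_allowed(const struct fs_features *f)
{
	if (f->forcealign && f->reflink)
		return false;
	if (f->forcealign && f->rtextsize > 1)
		return false;
	return true;
}
```

For example, with a 12288-byte allocation unit (three 4k blocks), a
dirty range of [5000, 9000) widens to [0, 12288), while
min_folio_order(3) returns -1 because no single folio order matches a
three-page unit.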