Thanks Dave for replying. I will think about it.

Wengang

> On May 10, 2024, at 6:56 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote:
>> Hi Dave, this is more a question than a patch.
>>
>> We are currently disallowing the change of extsize on files/dirs if the file/dir
>> has blocks allocated. That's not that friendly to users. Say somehow the
>> extsize was set very huge (1GiB), in the following cases, it's not that
>
> The first problem is ensuring that "say somehow extsize was set very
> huge" doesn't happen in the first place. Then all the other problems
> just don't happen.
>
>> convenient:
>> case 1: the file now extends very little. -- 1GiB extsize leads to a waste of
>> almost 1GiB.
>> case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the
>> IO pattern, so the huge preallocating and then reclaiming is not necessary
>> and that costs extra time especially when the system is fragmented.
>>
>> In above cases, changing extsize smaller is needed.
>>
>> In theory, the exthint is a hint for future allocation,
>
> It's not that simple because future allocation is influenced by past
> allocation. e.g. What happens if the new extent size hint is not
> aligned with the old one and we now have two different extent
> alignments in the file?
>
> What happens if an admin sees this when trying to triage some
> other problem and doesn't know that the extent size hint has been
> changed? They'll think there is a bug in the filesystem allocator
> and report it.
>
> What do we do with that report now? Do we waste hours trying to
> reproduce it and fail, maybe never learning that an extent
> size hint change caused the issue? i.e. how do we determine that the
> issue is a real allocation alignment bug versus it simply being a
> result of "application did something whacky with extent size hints"?
>
> Hence allowing extent size hints to change dynamically basically
> makes it impossible to trust that the current extent size hint
> defines the alignment for all the extents in the file. And at that
> point, we completely lose the ability to triage allocation alignment
> issues without an exact reproducer from the reporter...
>
> Now, just disabling extent size hints avoids this issue (i.e. allow
> return to zero if extents already exist) because there's now no
> alignment restriction at all and nobody is going to care. However,
> this creates new issues.
>
> e.g. it opens up the possibility that applications will scan existing
> files for extent size hints set on them and be able to -override the
> admin set alignment hints- used to create the data set.
>
> The admin may have set inheritable extent size hints to ensure
> allocation alignment to underlying storage because the applications
> don't know about optimal storage alignments (e.g. for PMD alignment
> on DAX storage). We don't want applications to be able to disable
> these hints because the precise reason they are set is to optimise
> storage alignment for better application performance....
>
> IOWs, there are good reasons for not allowing extent size hints to
> be overridden by applications just by clearing/changing the inode
> extent size field...
>
>> I can't connect it
>> to the blocks which are already allocated to the file/dir.
>> So the only reason why we disallow that is that there might be some problems if
>> we allow it. Well, can we fix the real problem(s) rather than disallowing
>> extsize changing?
>
> The only reliable way to change extent size hints so allocation
> alignment always matches the new extent size hint is to physically
> realign the data in the file to the new extent size hint. i.e. do it
> through xfs_fsr to "defrag" the file according to the new extent
> size hint. Then when we swap the old and new data extents, we also
> set the new extent size hint that matches the new data extents.
>
> This extent size hint change is then enabled through a completely
> different interface which is not one applications will use in
> general operation. Hence it becomes an explicit admin operation,
> enabling users to rectify the rare problems you document above
> without compromising the existing behaviour of extent size hints for
> everyone else.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
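
For illustration only, two points from the thread can be reduced to
arithmetic: the space consumed in "case 1", and why extents allocated
under an old hint no longer look aligned once the hint changes. This is
a simplified model, not XFS code. It assumes each allocation is rounded
to a multiple of the hint in file-offset space, and the helper names
(allocated_size, misaligned_extents) and the sample extent list are
hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def allocated_size(file_size: int, extsize: int) -> int:
    """Round file_size up to a multiple of extsize (simplified model).

    An extsize of 0 means no hint is set, so nothing is rounded.
    """
    if extsize == 0:
        return file_size
    return -(-file_size // extsize) * extsize  # ceiling division


def misaligned_extents(extents, extsize):
    """Return the (offset, length) extents that are not multiples of extsize.

    With no hint (extsize == 0) there is no alignment to violate.
    """
    if extsize == 0:
        return []
    return [(off, length) for off, length in extents
            if off % extsize or length % extsize]


# Case 1: a 4 KiB file under a 1 GiB hint occupies a full 1 GiB.
wasted = allocated_size(4 * KIB, GIB) - 4 * KIB

# Hypothetical extents allocated while the hint was 1 MiB.
old_extents = [(0, MIB), (3 * MIB, MIB)]

# Against the hint they were created with, nothing is flagged.
flagged_old_hint = misaligned_extents(old_extents, MIB)

# After changing the hint to 768 KiB, the same extents look misaligned,
# even though the allocator did nothing wrong.
flagged_new_hint = misaligned_extents(old_extents, 768 * KIB)
```

In this model, someone looking only at the current hint (768 KiB) and
the extent list cannot tell a real alignment bug from a past hint
change, which is the triage problem Dave describes.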