On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote:
> Hi Dave, this is more a question than a patch.
>
> We are current disallowing the change of extsize on files/dirs if the file/dir
> have blocks allocated. That's not that friendly to users. Say somehow the
> extsize was set very huge (1GiB), in the following cases, it's not that

The first problem is ensuring that "say somehow extsize was set very
huge" doesn't happen in the first place. Then all the other problems
just don't happen.

> convenient:
> case 1: the file now extends very little. -- 1GiB extsize leads a waste of
> almost 1GiB.
> case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the
> IO pattern, so the huge preallocting and then reclaiming is not necessary
> and that cost extra time especially when the system if fragmented.
>
> In above cases, changing extsize smaller is needed.
>
> In theory, the exthint is a hint for future allocation,

It's not that simple because future allocation is influenced by past
allocation. e.g. What happens if the new extent size hint is not
aligned with the old one and we now have two different extent
alignments in the file?

What happens if an admin sees this when trying to triage some other
problem and doesn't know that the extent size hint has been changed?
They'll think there is a bug in the filesystem allocator and report
it. What do we do with that report now? Do we waste hours trying to
reproduce it and fail, maybe never learning that an extent size hint
change caused the issue?

i.e. how do we determine that the issue is a real allocation
alignment bug versus it simply being a result of "application did
something whacky with extent size hints"?

Hence allowing extent size hints to change dynamically basically
makes it impossible to trust that the current extent size hint
defines the alignment for all the extents in the file.
And at that point, we completely lose the ability to triage
allocation alignment issues without an exact reproducer from the
reporter...

Now, just disabling extent size hints avoids this issue (i.e. allow
return to zero if extents already exist) because there's now no
alignment restriction at all and nobody is going to care.

However, this creates new issues. e.g. it opens up the possibility
that applications will scan existing files for extent size hints set
on them and be able to -override the admin set alignment hints- used
to create the data set.

The admin may have set inheritable extent size hints to ensure
allocation alignment to underlying storage because the applications
don't know about optimal storage alignments (e.g. for PMD alignment
on DAX storage). We don't want applications to be able to disable
these hints because the precise reason they are set is to optimise
storage alignment for better application performance....

IOWs, there are good reasons for not allowing extent size hints to
be overridden by applications just by clearing/changing the inode
extent size field...

> I can't connect it
> to the blocks which are already allocated to the file/dir.
> So the only reason why we disallow that is that there might be some problems if
> we allow it. Well, can we fix the real problem(s) rather than disallowing
> extsize changing?

The only reliable way to change extent size hints so allocation
alignment always matches the new extent size hint is to physically
realign the data in the file to the new extent size hint. i.e. do it
through xfs_fsr to "defrag" the file according to the new extent
size hint.

Then when we swap the old and new data extents, we also set the new
extent size hint that matches the new data extents. This extent size
hint change is then enabled through a completely different interface
which is not one applications will use in general operation.
Hence it becomes an explicit admin operation, enabling users to
rectify the rare problems you document above without compromising
the existing behaviour of extent size hints for everyone else.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
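
The mixed-alignment problem described above can be illustrated with a
small sketch. This is a toy model, not XFS allocator code: the
function name misaligned_extents, the (file offset, length) extent
tuples, and the hint values of 16 and 12 blocks are all assumptions
made up for the example. It only shows why, once a hint has been
changed on a file that already has extents, no single hint value
explains every extent in that file.

```python
# Illustrative only: a toy model of extent alignment, not XFS code.
# Extents are (file_offset, length) pairs in filesystem blocks.
# All numbers are hypothetical and chosen only for the example.

def misaligned_extents(extents, extsize_blocks):
    """Return extents whose offset or length is not a multiple of the hint."""
    return [
        (offset, length)
        for offset, length in extents
        if offset % extsize_blocks or length % extsize_blocks
    ]

# Extents allocated while the (hypothetical) hint was 16 blocks.
old_extents = [(0, 16), (16, 32)]
# Extents allocated after the hint was changed to 12 blocks.
new_extents = [(48, 12), (60, 24)]

file_extents = old_extents + new_extents

# Checked against the current hint, the old extents look like allocator bugs.
print(misaligned_extents(file_extents, 12))
# Checked against the old hint, the new extents look like allocator bugs.
print(misaligned_extents(file_extents, 16))
```

Whichever hint the person triaging assumes, some extents appear
misaligned, and nothing in the file records that the hint changed in
between. Rewriting the data to match the new hint before setting it,
as in the xfs_fsr approach above, is what removes that ambiguity.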