On Thu, Jul 18, 2024 at 07:03:40PM +0000, Wengang Wang wrote:
>
>
> > On Jul 16, 2024, at 9:11 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jul 16, 2024 at 08:23:35PM +0000, Wengang Wang wrote:
> >>> Ok, so this is a linear iteration of all extents in the file that
> >>> filters extents for the specific "segment" that is going to be
> >>> processed. I still have no idea why fixed length segments are
> >>> important, but "linear extent scan for filtering" seems somewhat
> >>> expensive.
> >>
> >> Hm, not actually fixed length segments, but the segment size can't
> >> exceed a limit. So segment.ds_length <= LIMIT.
> >
> > Which is effectively fixed length segments....
> >
> >> Larger segments take longer (with the file locked) to defrag. The
> >> segment size limit is a way to balance the defrag against the
> >> parallel IO latency.
> >
> > Yes, I know why you've done it. These were the same arguments made a
> > while back for a new way of cloning files on XFS. We solved those
> > problems with just a small change to the locking, and didn't need
> > new ioctls or lots of new code to solve the "clone blocks
> > concurrent IO" problem.
>
> I didn't check the code history, but I am thinking you solved the
> problem by allowing reads to proceed while cloning is in progress?
> Correct me if I'm wrong. The problem we hit is a (heartbeat) write
> timeout.

The reason this worked (allowing shared reads through and not writes)
was that the VM infrastructure this was being done for uses a sidecar
write channel to redirect writes while a clone is being done. i.e.
writes are not blocked by the clone in progress because they are being
done to a different file. When the clone completes, those writes are
folded back into the original image file. e.g. see

  qemu-img commit -b <backing file> <file with delta writes>

which will fold the writes in a sidecar write file back into the
original backing file that was just cloned....
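As an aside for anyone unfamiliar with the sidecar pattern, a minimal
sketch of the workflow looks like the script below. The file names
(base.img, delta.qcow2) and the 64M size are made up for illustration,
and the `-F` backing-format flag assumes a reasonably recent qemu-img;
the script skips cleanly if qemu-img is not installed.

```shell
#!/bin/sh
# Sketch of the sidecar-write workflow: writes land in an overlay
# while the base image is being cloned/defragged, then get folded
# back with "qemu-img commit". Names and sizes are illustrative.
set -e
command -v qemu-img >/dev/null 2>&1 || { echo "qemu-img not installed, skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 1. The original image that is about to be cloned/defragmented.
qemu-img create -f qcow2 base.img 64M

# 2. Sidecar write file: new writes land here while the clone (or,
#    per the suggestion above, the defrag) runs against base.img.
qemu-img create -f qcow2 -b base.img -F qcow2 delta.qcow2 64M

# 3. ... clone/defrag of base.img happens here ...

# 4. Fold the delta writes back into the backing file.
qemu-img commit -b base.img delta.qcow2
echo "committed"
```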
What I'm suggesting is that when you run a backing file defragmentation,
you use the same sidecar write setup as cloning whilst the defrag is
done. Reads go straight through to the backing file, and writes get
written to a delta write file. When the defrag is done, the delta write
file gets folded back into the backing file.

But for this to work, UNSHARE needs to use shared read locking so that
read IO can be directed through the file at the same time as the
UNSHARE is running. If this works for CLONE to avoid read and write
blocking whilst the operation is in progress, the same mechanism should
be usable for UNSHARE, too.

At this point, defrag using CLONE+UNSHARE shouldn't ever block read IO,
and shouldn't block write IO for any significant period of time,
either...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx