On 11.03.2020 22:26, Andreas Dilger wrote:
> On Mar 3, 2020, at 2:57 AM, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>>
>> On 02.03.2020 19:56, Theodore Y. Ts'o wrote:
>>> Kirill,
>>>
>>> In a couple of your comments on this patch series, you mentioned
>>> "defragmentation".  Is that because you're trying to use this as part
>>> of e4defrag, or at least, using EXT4_IOC_MOVE_EXT?
>>>
>>> If that's the case, you should note that input parameter for that
>>> ioctl is:
>>>
>>> struct move_extent {
>>>         __u32 reserved;         /* should be zero */
>>>         __u32 donor_fd;         /* donor file descriptor */
>>>         __u64 orig_start;       /* logical start offset in block for orig */
>>>         __u64 donor_start;      /* logical start offset in block for donor */
>>>         __u64 len;              /* block length to be moved */
>>>         __u64 moved_len;        /* moved block length */
>>> };
>>>
>>> Note that the donor_start is separate from the start of the file that
>>> is being defragged.  So you could have the userspace application
>>> fallocate a large chunk of space for that donor file, and then use
>>> that donor file to defrag multiple files if you want to close pack
>>> them.
>>
>> Practice shows it's not so. Your suggestion was the first thing we tried,
>> but it works badly and just doubles/triples IO.
>>
>> Suppose we have two files of 512Kb, and they are placed in separate 1Mb
>> clusters:
>>
>> [[512Kb file][512Kb free]][[512Kb file][512Kb free]]
>>
>> We want to pack both files into the same 1Mb cluster. Packed together on the
>> block device, they will be on the same server of the underlying distributed
>> storage file system. This gives a big performance improvement, and this is
>> the prize I am aiming for.
>>
>> If I fallocate a large hunk for both of them, I have to move them
>> both to this new hunk. So, instead of moving 512Kb of data, we will have to
>> move 1Mb of data, i.e. double the size, which is counterproductive.
>>
>> Imagine another situation, when we have
>> [[1020Kb file][4Kb free]][[4Kb file][1020Kb free]]
>>
>> Here we may just move [4Kb file] into [4Kb free]. But your suggestion again
>> forces us to move 1Mb instead of 4Kb, which makes IO 256 times worse! This is
>> terrible! And this is the thing I try to prevent by finding a new interface.
>
> One idea I had, which may work for your use case, is to run fallocate() on
> the *1MB-4KB file* to allocate the last 4KB in that hunk, then use that block
> as the donor file for the 1MB+4KB file.  The ext4 allocation algorithms should
> always give you that 4KB chunk if it is free, and that avoids the need to try
> and force the allocator to select that block through some other method.

Do you mean the following:

1) fallocate() 4KB at the end of the first (*1MB-4KB*) file
   (==> this increases the file length).
2) EXT4_IOC_MOVE_EXT the *4KB* of the second file into that new hunk.
3) truncate 4KB from the end of the first file.

?

If so, this can't be an online defrag, since some process may want to extend
the *1MB-4KB* file in between. That would just lead to data corruption.

Another problem is that a power loss between steps 1 and 3 will leave the
file length at *1MB* instead of *1MB-4KB*.

So, we still need some kernel support to implement this.
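
To make the three steps concrete, here is a rough userspace sketch of that sequence in C. It is an illustration only, not tested on a real ext4 file system. The function name pack_tail_sketch() and its parameters are made up for this mail. The struct and ioctl number are defined locally, the way e4defrag does it, on the assumption that the installed headers do not export them. It also assumes the first file's size is block-aligned and that the allocator really hands out the adjacent free block.

```c
#define _GNU_SOURCE
#include <assert.h>     /* static_assert */
#include <fcntl.h>      /* fallocate() */
#include <stdint.h>
#include <sys/ioctl.h>  /* ioctl(), _IOWR */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate() */

/* Local copy of the structure quoted earlier in this thread. */
struct move_extent {
	uint32_t reserved;      /* should be zero */
	uint32_t donor_fd;      /* donor file descriptor */
	uint64_t orig_start;    /* logical start offset in block for orig */
	uint64_t donor_start;   /* logical start offset in block for donor */
	uint64_t len;           /* block length to be moved */
	uint64_t moved_len;     /* moved block length */
};
static_assert(sizeof(struct move_extent) == 40, "unexpected struct layout");

#ifndef EXT4_IOC_MOVE_EXT
#define EXT4_IOC_MOVE_EXT _IOWR('f', 15, struct move_extent)
#endif

/*
 * Hypothetical helper: move the single block of the file behind second_fd
 * into the free block right after the file behind first_fd.
 * first_size is the current byte length of the first file, blksz the
 * file system block size. Returns 0 on success, -1 on any failure.
 */
int pack_tail_sketch(int first_fd, int second_fd, off_t first_size, off_t blksz)
{
	struct move_extent me = { 0 };
	int rc = 0;

	if (blksz <= 0 || first_size < 0 || first_size % blksz != 0)
		return -1;

	/* Step 1: grow the first file by one block (i_size increases). */
	if (fallocate(first_fd, 0, first_size, blksz) < 0)
		return -1;

	/*
	 * From here until step 3 the first file is visibly one block too
	 * long. A concurrent append, or a power loss, inside this window
	 * is exactly the problem described above.
	 */

	/* Step 2: swap block 0 of the second file with the new tail block. */
	me.donor_fd = (uint32_t)first_fd;
	me.orig_start = 0;
	me.donor_start = (uint64_t)(first_size / blksz);
	me.len = 1;
	if (ioctl(second_fd, EXT4_IOC_MOVE_EXT, &me) < 0 || me.moved_len != me.len)
		rc = -1;

	/*
	 * Step 3: cut the first file back to its old length. If another
	 * process appended in the meantime, this throws its data away.
	 */
	if (ftruncate(first_fd, first_size) < 0)
		rc = -1;

	return rc;
}
```

For the example above the call would be something like pack_tail_sketch(fd_1020k, fd_4k, 1020 * 1024, 4096). Nothing in userspace can make steps 1-3 atomic with respect to other writers or a crash, which is why the sketch cannot be made safe without kernel help.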