On Fri, Feb 05, 2021 at 05:44:03PM -0700, Chris Murphy wrote:
> On Fri, Feb 5, 2021 at 3:55 PM Lennart Poettering
> <lennart@xxxxxxxxxxxxxx> wrote:
> >
> > On Fr, 05.02.21 20:58, Maksim Fomin (maxim@xxxxxxxxx) wrote:
> >
> > > > You know, we issue the btrfs ioctl, under the assumption that if the
> > > > file is already perfectly defragmented it's a NOP. Are you suggesting
> > > > it isn't a NOP in that case?
> > >
> > > So, what is the reason for defragmenting journal if BTRFS is
> > > detected? This does not happen at other filesystems. I have read
> > > this thread but have not found a clear answer to this question.
> >
> > btrfs like any file system fragments files with nocow a bit. Without
> > nocow (i.e. with cow) it fragments files horribly, given our write
> > pattern (which is: append something to the end, and update a few
> > pointers in the beginning). By upstream default we set nocow, some
> > downstreams/users undo that however. (this is done via tmpfiles,
> > i.e. journald doesn't actually set nocow ever).
>
> I don't see why it's upstream's problem to solve downstream decisions.
> If they want to (re)enable datacow, then they can also setup some kind
> of service to defragment /var/log/journal/ on a schedule, or they can
> use autodefrag.
>

It seems cooperative to me that applications advise the filesystem on
appropriate optimization opportunities.

Taking a step back and looking at what journald is doing, and how and
when these journal files are accessed, it doesn't strike me as
illogical to tell the fs, at archive time, that it's a good time to
defragment the file.

> > When we archive a journal file (i.e. stop writing to it) we know it
> > will never receive any further writes. It's a good time to undo the
> > fragmentation (we make no distinction whether heavily fragmented,
> > little fragmented or not at all fragmented on this) and thus for the
> > future make access behaviour better, given that we'll still access the
> > file regularly (because archiving in journald doesn't mean we stop
> > reading it, it just means we stop writing it; journalctl always
> > operates on the full data set). defragmentation happens in the bg once
> > triggered, it's a simple ioctl you can invoke on a file. if the file
> > is not fragmented it shouldn't do anything.
>
> ioctl(3, BTRFS_IOC_DEFRAG_RANGE, {start=0, len=16777216, flags=0,
> extent_thresh=33554432, compress_type=BTRFS_COMPRESS_NONE}) = 0
>
> What 'len' value does journald use?
>

journald uses BTRFS_IOC_DEFRAG, not BTRFS_IOC_DEFRAG_RANGE; there is no
range argument, so the request covers the whole file.

I'm inclined to agree with Lennart that this looks more like a btrfs
issue than a journald issue, based on your claims. journald is arguably
Doing The Right Thing by advising btrfs of a defrag opportunity.

If btrfs can't usefully improve on the file's current layout, it should
treat the ioctl as a no-op. If it's producing more fragmented files
post-defrag, how is that not a btrfs bug?

Some things I didn't see considered in your comparisons are filesystem
free space, age, and concurrent use. If your comparisons are on fresh
filesystems, fragmentation tends to be much lower, as the business of
finding contiguous blocks of free space is trivial. Once the filesystem
has aged enough to churn through the available space, fragmentation
increases substantially. When journald is the only writer on an
otherwise idle filesystem, it's less likely to have its allocations
interrupted by allocations to other writers.

To make meaningful measurements of fragmentation, and of the necessity
of telling the fs "hey, now's a good time to defrag this file I'm no
longer going to write to", you need to look at worst-case scenarios,
not best-case ones.
On a different note, I feel like there's an unnecessarily combative
tone to this discussion. Maybe it's just me, but it deterred me from
participating up until this point.

Regards,
Vito Caputo
_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel