On Mon, Feb 8, 2021 at 8:20 AM Phillip Susi <phill@xxxxxxxxxxxx> wrote:
>
> Chris Murphy writes:
>
> > I showed that the archived journals have way more fragmentation than
> > active journals. And the fragments in active journals are
> > insignificant, and can even be reduced by fully allocating the journal
>
> Then clearly this is a problem with btrfs: it absolutely should not be
> making the files more fragmented when asked to defrag them.

I've asked. We'll see.

> > file to final size rather than appending - which has a good chance of
> > fragmenting the file on any file system, not just Btrfs.
>
> And yet, you just said the active journal had minimal fragmentation.

Yes, the extents are consistently 8MB in the nodatacow case, on old
and new file systems alike. Same as ext4 and XFS.

> That seems to mean that the 8mb fallocates that journald does is working
> well. Sure, you could probably get fewer fragments by fallocating the
> whole 128 mb at once, but there are tradeoffs to that that are not worth
> it. One fragment per 8 mb isn't a big deal. Ideally a filesystem will
> manage to do better than that (didn't btrfs have a persistent
> reservation system for this purpose?), but it certainly should not
> commonly do worse.

I don't think any of the file systems guarantee a contiguous block
range upon fallocate; they only guarantee that writes to fallocated
space will succeed, i.e. it's a space reservation. But yes, in
practice 8MB is small enough that chances are you'll get a single 8MB
extent. And I agree 8MB isn't a big deal.

Does anyone complain about journal fragmentation on ext4 or XFS? If
not, then we come full circle to my second email in the thread: don't
defragment when nodatacow, only defragment when datacow. Or use
BTRFS_IOC_DEFRAG_RANGE and specify an 8MB length. That does seem to
consistently no-op on nodatacow journals, which have 8MB extents.
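
Concretely, I'm imagining something like the following. This is an
untested sketch, not journald code: the function name is made up, and
I'm assuming extent_thresh is the right knob for "leave existing 8MB
extents alone" and that FS_NOCOW_FL is how you detect nodatacow from
userspace.

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/btrfs.h>

/* Sketch of the rotation-time policy suggested above: skip nodatacow
 * journals entirely; for datacow journals, defragment the whole file
 * but tell the kernel to skip extents that are already >= 8 MiB. */
static int maybe_defrag_journal(int fd) {
        unsigned attrs = 0;

        if (ioctl(fd, FS_IOC_GETFLAGS, &attrs) < 0)
                return -errno;
        if (attrs & FS_NOCOW_FL)
                return 0;  /* nodatacow: already 8MB extents, leave it alone */

        struct btrfs_ioctl_defrag_range_args args = {
                .start = 0,
                .len = (__u64) -1,                    /* whole file */
                .extent_thresh = 8U * 1024U * 1024U,  /* skip extents >= 8 MiB */
        };

        return ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &args);
}

Either way, filefrag -v on an archived journal is an easy way to see
what any of these calls actually did to the extent layout.
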
> > Further, even *despite* this worse fragmentation of the archived
> > journals, bcc-tools fileslower shows no meaningful latency as a
> > result. I wrote this in the previous email. I don't understand what
> > you want me to show you.
>
> *Of course* it showed no meaningful latency because you did the test on
> an SSD, which has no meaningful latency penalty due to fragmentation.
> The question is how bad is it on HDD.

The reason I'm dismissive is that the nodatacow fragmentation is the
same as on ext4 and XFS, while the datacow fragmentation is both
spectacular and non-deterministic. Where those random 4KiB journal
writes end up on an HDD depends on the workload; I've seen journals
with hundreds to thousands of extents. I'm not sure what we'd learn
from me doing a single isolated test on an HDD.

And also, only defragmenting on rotation strikes me as leaving
performance on the table, right? If there is concern about fragmented
archived journals, then isn't there concern about fragmented active
journals?

But it sounds to me like you want to learn what the performance is of
journals defragmented with BTRFS_IOC_DEFRAG specifically? I don't
think that's interesting, because you're still better off leaving
nodatacow journals alone, and something still has to be done in the
datacow case. Those are the two extremes. The measured number doesn't
matter; it's not going to tell you anything you can't already infer
from the two layouts.

> > And since journald offers no ability to disable the defragment on
> > Btrfs, I can't really do a longer-term A/B comparison, can I?
>
> You proposed a patch to disable it. Test before and after the patch.

Is there a test mode for journald to just dump a bunch of random stuff
into the journal to age it? I don't want to wait weeks to get a dozen
journal files.

> > I did provide data. That you don't like what the data shows (archived
> > journals have more fragments than active journals) is not my fault.
> > The existing "optimization" is making things worse, in addition to
> > adding a pile of unnecessary writes upon journal rotation.
>
> If it is making things worse, that is definitely a bug in btrfs. It
> might be nice to avoid the writes on SSD, though, since there is no
> benefit there.

Agreed.

--
Chris Murphy