Re: [RFC PATCH v4 0/4] ext4/jbd2: data=journal: write-protect pages on transaction commit

Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx> · Wed, 30 Sep 2020 19:59:44 -0300

On Tue, Sep 29, 2020 at 8:37 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Mon 28-09-20 16:40:59, Mauricio Faria de Oliveira wrote:
> > Hey Jan,
> >
> > This series implements your suggestions for the RFC PATCH v3 set [1].
> >
> > That addressed the issue you confirmed with block_page_mkwrite() [2].
> > There's no "JBD2: Spotted dirty metadata buffer" message in over 72h
> > runs across 3 VMs (it used to happen within a few hours.) *Thanks!*
> >
> > I added Reviewed-by: tags for the patches/changes you mentioned.
> > The only changes from v3 are patch 3 which is new, and contains
> > the 2 fixes to ext4_page_mkwrite(); and patch 4 changed 'struct
> > writeback_control.nr_to_write' from ~0ULL to LONG_MAX, since it
> > is signed long (~0ULL overflows to -1; kudos, kernel test robot!)
> >
> > It looks almost good on fstests: zero regressions on data=ordered,
> > and two apparently flaky tests data=journal (generic/347 _passed_
> > 1/6 times with the patch, and generic/547 failed 1/6 times.)
>
> Cool. Neither of these tests has anything to do with mmap. The first test
> checks what happens when thin provisioned storage runs out of space (and
> that fs remains consistent), the second tests that fsync flushed properly
> all data and that it can be seen after a crash. So I'm reasonably confident
> that it isn't caused by your patches. It still might be a bug in
> data=journal implementation though but that would be something for another
> patch series :).
>

Hey Jan,

That's good to hear! Now that the patchset seems to be in good shape,
I worked on testing it these last 2 days. Good and mixed-feelings news. :-)

1) For ext4 first, I have put 2 VMs to run fstests in a loop overnight,
(original and patched kernels, ~20 runs each). It looks like the patched VM
has more variance of failed/flaky tests, but the "average failure set" holds.

I think some of the failures were flaky or timing related, because when I ran
some tests, e.g. generic/371 a hundred times (./check -i 100 generic/371)
then it only failed 6 out of 100 times. So I didn't look much more into it, but
should you feel like recommending a more careful look, I'll be happy to do it.

For documentation purposes, the results on v5.9-rc7 and next-20200930,
showing no "permanent" regressions. Good news :)

    data=ordered:
    Failures: ext4/045 generic/044 generic/045 generic/046 generic/051
generic/223 generic/388 generic/465 generic/475 generic/553
generic/554 generic/555 generic/565 generic/611

    data=journal:
    Failures: ext4/045 generic/051 generic/223 generic/347 generic/388
generic/441 generic/475 generic/553 generic/554 generic/555
generic/565 generic/611

2) For OCFS2, I just had to change where we set the callbacks in (patch 2.)
(I'll include that in the next, hopefully non-RFC patchset, with
Andreas suggestions.)

Then a local mount also has no regressions on "stress-ng --class filesystem,io".
Good news too :)  For reference, the steps:

    # mkfs.ocfs2 --mount local $DEV
    # mount $DEV $MNT
    # cd $MNT
    # stress-ng --sequential 0 --class filesystem,io

3) Now, the mixed-feelings news.

The synthetic test-case/patches I had written clearly show that the
patchset works:
- In the original kernel, userspace can write to buffers during
commit; and it moves on.
- In the patched kernel, userspace cannot write to buffers during
commit; it blocks.

However, the heavy-hammer testing with 'stress-ng --mmap 4xNCPUs --mmap-file'
then crashing the kernel via sysrq-trigger, and trying to mount the
filesystem again,
sometimes still can find invalid checksums, thus journal recovery/mount fails.

    [   98.194809] JBD2: Invalid checksum recovering data block 109704 in log
    [   98.201853] JBD2: Invalid checksum recovering data block 69959 in log
    [   98.339859] JBD2: recovery failed
    [   98.340581] EXT4-fs (vdc): error loading journal

So, despite the test exercising mmap() and the patchset being for mmap(),
apparently there is more happening that also needs changes. (Weird; but
I will try to debug that test-case behavior deeper, to find what's going on.)

This patchset does address a problem, so should we move on with this one,
and as you mentioned, "that would be something for another patch series :)" ?

Thank you,
Mauricio

> I'll have a look at the remaining patches.
>
>                                                                 Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR

-- 
Mauricio Faria de Oliveira