Re: ext4 fix for interaction between i_size, fallocate, and delalloc after a crash

Ashlie Martinez <ashmrtn@xxxxxxxxxx> · Tue, 28 Nov 2017 07:04:54 -0600

On Mon, Nov 27, 2017 at 10:11 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Mon, Nov 27, 2017 at 08:31:07AM -0600, Ashlie Martinez wrote:
>> Ted,
>>
>> Thank you very much for taking the time to lay all of this out for me
>> (and throwing in some humor and YouTube links to boot), despite how busy
>> you were (I hope everything is alright!). I see now why the fix works
>> and what was going wrong. It appears I was confused about the order of
>> operations being performed in the test based on what I read in another
>> email. I believe in another email somewhere I read that the fallocate
>> was before a delayed write so I was thinking something like fallocate
>> then write. I see now that it is write with delayed allocation
>> (resolved after fallocate) and then fallocate. With that piece of
>> information everything else about the test, delayed allocation, and
>> the fix make sense.
>
> Sorry, "before" was misleading.  When I used the word "before", I was
> speaking of the order that the operations hit the disk.  The confusion
> comes from the fact that the delayed allocation write was *issued*
> before the fallocate, but in terms of when they are committed to disk,
> the fallocate commits *first*, and then 25-30 seconds later, the
> delayed allocation write is resolved and then committed to disk.

No biggie, part of the reason this was so hard for me to wrap my head
around is I don't have a physical machine that I can reproduce this on
(and I never got around to getting a GCE instance to test on). Not
being able to poke around a reproducing system makes it a little bit
harder for me to reason about :)

>
> It's the difference between the order that the operations are issued
> and when they are committed to disk which is what caused the bug; and
> the problem reproduction relies on crashing/aborting the file system
> between the time that the two operations would have been committed.
>
> Hopefully this will be helpful in terms of finding a way to create
> automated file system testing systems that can detect bugs similar to
> this one.  I can imagine that if you ever want to extend this to
> database testing, a similar technique might be used to detect
> transactions which close in a different order than how they were
> issued, or dealing with transactions which end up getting rolled back.
>

Vijay and I are hopeful that we can find some reliable way to
reproduce this in CrashMonkey. It has also shown us a class of timing
bugs that we can't find with the current iteration of CrashMonkey, but
we hope we can expand what we have to find them in the future.
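
To make the issue-order versus commit-order distinction from above
concrete, here is a toy Python model. It is only a sketch: it is not
CrashMonkey code and does not reflect how ext4 tracks anything
internally. The names Op, durable_after_crash, and reordering_windows
are made up for illustration, and the timestamps are assumed values
chosen to mirror the "fallocate commits first, delalloc write lands
25-30 seconds later" description in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One file system operation in a toy timeline (hypothetical type)."""
    name: str
    issued_at: float     # seconds: when the application issued the call
    committed_at: float  # seconds: when the change is assumed to reach disk


def durable_after_crash(ops, crash_time):
    """Return names of ops already on disk at crash_time, in issue order."""
    in_issue_order = sorted(ops, key=lambda op: op.issued_at)
    return [op.name for op in in_issue_order if op.committed_at <= crash_time]


def reordering_windows(ops):
    """Find crash windows where a later-issued op is durable but an
    earlier-issued op is not. Each entry is
    (window_start, window_end, later_issued_name, earlier_issued_name)."""
    windows = []
    for earlier in ops:
        for later in ops:
            issued_first = earlier.issued_at < later.issued_at
            committed_second = later.committed_at < earlier.committed_at
            if issued_first and committed_second:
                windows.append((later.committed_at, earlier.committed_at,
                                later.name, earlier.name))
    return windows


# Assumed timeline: the delayed allocation write is issued first, but the
# fallocate is committed first. The exact numbers are illustrative only.
ops = [
    Op("delalloc_write", issued_at=0.0, committed_at=30.0),
    Op("fallocate", issued_at=1.0, committed_at=5.0),
]

if __name__ == "__main__":
    # A crash at t=10 falls between the two commits.
    print(durable_after_crash(ops, crash_time=10.0))
    print(reordering_windows(ops))
```

In this model a crash anywhere inside a reported window leaves the
later-issued operation on disk without the earlier one, which is the
kind of state a test harness would need to generate to hit bugs like
this one.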

>                                                 - Ted
>
> P.S.  I see you have some Google internships under your belt, so I'm
> sure you know the drill, but I hope you'll consider us for another
> future internship experience.   :-)

Haha it's always been nice to be a little bit spoiled while interning
there for a summer. I hope I can make my way back there for another
internship etc. eventually :)