Re: [PATCH v4 08/18] tools/testing/nvdimm: add 'bio_delay' mechanism

Jeff Moyer <jmoyer@xxxxxxxxxx> · Wed, 03 Jan 2018 15:37:50 -0500

Jan Kara <jack@xxxxxxx> writes:

> On Tue 02-01-18 13:51:49, Dan Williams wrote:
>> On Tue, Jan 2, 2018 at 1:44 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote:
>> >> In support of testing truncate colliding with dma add a mechanism that
>> >> delays the completion of block I/O requests by a programmable number of
>> >> seconds. This allows a truncate operation to be issued while page
>> >> references are held for direct-I/O.
>> >>
>> >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> >
>> > Why not put this in the generic bio layer code and then write a
>> > generic fstest to exercise this truncate vs direct IO completion
>> > race condition on all types of storage and filesystems?
>> >
>> > i.e. if it sits in a nvdimm test suite, it's never going to be run
>> > by filesystem developers....
>>
>> I do want to get it into xfstests eventually. I picked the nvdimm
>> infrastructure for expediency of getting the fix developed. Also, I
>> consider the collision in the non-dax case a solved problem since the
>> core mm will keep the page out of circulation indefinitely.
>
> Yes, but there are different races that could happen even for regular page
> cache pages. So I also think it would be worthwhile to have this inside the
> block layer possibly as part of the generic fault-injection framework which
> is already there for fail_make_request. That already supports various
> filtering, frequency, and other options that could be useful.

Or consider extending the dm-delay target (which delays the queuing of
bios) to support delaying the completions.  I'm not sure I'm a fan of
sticking all sorts of debug code into the generic I/O submission path.

Cheers,
Jeff