On Fri, 2016-12-02 at 15:10 +0100, Hannes Reinecke wrote:
> On 12/02/2016 02:29 PM, Ewan D. Milne wrote:
> > On Fri, 2016-12-02 at 04:21 -0800, Christoph Hellwig wrote:
> >> On Thu, Dec 01, 2016 at 08:40:31AM -0500, Martin K. Petersen wrote:
> >>> Specifically, the problem appears to be caused by the removal of
> >>> the setting of bio->bi_bdev, which would previously be set to NULL.
> >>> If I add:
> >>
> >> Very odd. For one I would expect it to be NULL anyway, second
> >> I don't see why the behavior changed. But given that this reverts
> >> to the original assignment and makes things work I'll happily hack it
> >> to get things working again:
> >>
> >> Acked-by: Christoph Hellwig <hch@xxxxxx>
> >
> > Yeah, I'm not sure I understand this either, apart from the change
> > adjusting the code to effectively do what it used to and making the
> > test case work. I'm reluctant to cc: stable yet, let me look at this
> > a bit more and I'll post the actual patch soon.
> >
> Plus we found that this is basically a timing issue; we've found that
> supposedly fixed bugs will crop up after ~4k iterations.
> (Johannes did a _lot_ of testing here :-)
> So just because the bug failed to materialize can also mean that you
> simply didn't test long enough.
>

Yes, and following the code paths it isn't completely clear how this
leads to the single zero-byte corruption. I am continuing to investigate.
There may very well be more than one problem.

On the kernel versions I tested where I got a failure, it was a solid
failure: it never worked no matter how many times I tried. But I did not
exhaustively test the apparently successful kernel versions. Not thousands
of times, anyway.

-Ewan