Re: [PATCH] fs: ratelimit __find_get_block_slow() failure message.

Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> · Wed, 16 Jan 2019 17:28:13 +0100

On Wed, Jan 16, 2019 at 12:48:41PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 16, 2019 at 12:03 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 16, 2019 at 11:43 AM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Wed 16-01-19 10:47:56, Dmitry Vyukov wrote:
> > > > On Fri, Jan 11, 2019 at 1:46 PM Tetsuo Handa
> > > > <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On 2019/01/11 19:48, Dmitry Vyukov wrote:
> > > > > > >> How did you arrive at the conclusion that it is harmless?
> > > > > >> There is only one relevant standard covering this, which is the C
> > > > > >> language standard, and it is very clear on this -- this has Undefined
> > > > > >> Behavior, that is the same as, for example, reading/writing random
> > > > > >> pointers.
> > > > > >>
> > > > > >> Check out this on how any race that you might think is benign can be
> > > > > >> badly miscompiled and lead to arbitrary program behavior:
> > > > > >> https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
> > > > > >
> > > > > > Also there is no other practical definition of data race for automatic
> > > > > > data race detectors than: two conflicting non-atomic concurrent
> > > > > > accesses. Which this code is. Which means that if we continue writing
> > > > > > such code we are not getting data race detection and don't detect
> > > > > > thousands of races in kernel code that one may consider more harmful
> > > > > > > than this one the easy way. And instead will spend large amounts of
> > > > > > > time to fix some of them the hard way, and leave the rest as just too
> > > > > > hard to debug so let the kernel continue crashing from time to time (I
> > > > > > believe a portion of currently open syzbot bugs that developers just
> > > > > > left as "I don't see how this can happen" are due to such races).
> > > > > >
> > > > >
> > > > > > I still don't follow. Read/write of sizeof(long) bytes at naturally
> > > > > aligned address is atomic, isn't it?
> > > >
> > > > Nobody guarantees this. According to C non-atomic conflicting
> > > > reads/writes of sizeof(long) cause undefined behavior of the whole
> > > > program.
> > >
> > > Yes, but to be fair the kernel has always relied on long accesses to be
> > > atomic pretty heavily so that it is now de-facto standard for the kernel
> > > AFAICT. I understand this makes life for static checkers hard but such is
> > > reality.
> >
> > Yes, but nobody ever defined what "a long access" means. And if you
> > see a function that accepts a long argument and stores it into a long
> > field, no, it does not qualify. I bet this will come as a surprise to
> > lots of developers.
> > Check out this fix and try to extrapolate how this "function stores
> > long into a long leads to a serious security bug" can actually be
> > applied to whole lot of places after inlining (or when somebody just
> > slightly shuffles code in a way that looks totally safe) that also
> > kinda look safe and atomic:
> > https://lore.kernel.org/patchwork/patch/599779/
> > So where is the boundary between "a long access" that is atomic and
> > the one that is not necessarily atomic?
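
A minimal sketch of the distinction argued above, written in C11 rather
than kernel C. The names (struct shared, set_plain, set_atomic,
get_atomic) are hypothetical and appear nowhere in the thread. A plain
store to a long leaves the compiler free to tear, fuse, or repeat the
access, so concurrent conflicting use is a data race under the C
standard; the _Atomic variant with relaxed ordering is well defined.

```c
#include <stdatomic.h>

/* Hypothetical shared state, for illustration only. */
struct shared {
	long plain_flags;          /* plain long: racy if accessed concurrently */
	_Atomic long atomic_flags; /* atomic long: concurrent access is defined */
};

/*
 * Looks like "a long access", but the C standard gives no atomicity
 * guarantee here; after inlining the compiler may split or re-read it.
 */
static inline void set_plain(struct shared *s, long v)
{
	s->plain_flags = v;
}

/* Marked store: a single indivisible access, no ordering implied. */
static inline void set_atomic(struct shared *s, long v)
{
	atomic_store_explicit(&s->atomic_flags, v, memory_order_relaxed);
}

/* Marked load: a single indivisible access, no ordering implied. */
static inline long get_atomic(struct shared *s)
{
	return atomic_load_explicit(&s->atomic_flags, memory_order_relaxed);
}
```

Kernel code would normally express the same intent with
READ_ONCE()/WRITE_ONCE() or the atomic_long_t helpers instead of
<stdatomic.h>; the point is only that the access is marked, which is
what lets a race detector tell intentional concurrency from a bug.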
> 
> 
> +Linus, Greg, Kees
> 
> I wanted to provide a hash/link to this commit but, wait, you want to
> say that this patch for a security bug was mailed, recorded by
> patchwork, acked by subsystem developer and then dropped on the floor
> for 3+ years? Doh!
> 
> https://lore.kernel.org/patchwork/patch/599779/
> 
> There are known ways how to make this not a thing at all. Like open
> pull requests on github:
> https://github.com/google/syzkaller/pulls
> or, some projects even run their own dashboard for this:
> https://dev.golang.org/reviews
> because this is important. Especially for new contributors, drive-by
> improvements, good samaritan fixes, etc.
> 
> Another example: a bug-fixing patch was lost for 2 years:
> "Two years ago ;) I don't understand why there were ignored"
> https://www.spinics.net/lists/linux-mm/msg161351.html
> 
> Another example: a patch is applied to a subsystem tree and then lost
> for 6 months:
> https://patchwork.kernel.org/patch/10339089/

I don't understand the issue here.  Are you saying that sometimes
patches that have been submitted get dropped?  Yes, that's known, it is
up to the submitter to verify and ensure that the patch is applied.
Given our rate of change and the large workload that some maintainers
have, this is the best that we can do at the moment.

Putting it all in a github dashboard would not scale in the least (other
projects smaller than us have tried it and ended up abandoning it because
it fails horribly).

Yes, we can always do better, but remember that the submitter needs to
take the time to ensure that their patches are applied.  Heck, I have
patches submitted months ago that I know the maintainers ignored, and I
need to remember to send them again.  We put the burden of development
on the thing that scales, the developer themselves, not the maintainer
here.

It's the best that we know how to do at the moment, and we are always
trying to do better.  Examples of this are where some subsystems are now
getting multiple maintainers to handle the workload, and that's helping
a lot.  That doesn't work for all subsystems as not all subsystems can
even find more than one maintainer who is willing to look at the
patches.

Please resubmit your mount patch, that's a crazy bug :)

thanks,

greg k-h