Re: [PATCH v4.4.y] drivers: core: Remove glue dirs from sysfs earlier

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 29, 2019 at 06:35:48AM -0800, Guenter Roeck wrote:
> On Mon, Jan 28, 2019 at 10:58 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Jan 28, 2019 at 09:31:30AM -0800, Zubin Mithra wrote:
> > > From: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> > >
> > > commit 726e41097920a73e4c7c33385dcc0debb1281e18 upstream
> > >
> > > For devices with a class, we create a "glue" directory between
> > > the parent device and the new device with the class name.
> > >
> > > This directory is never "explicitely" removed when empty however,
> > > this is left to the implicit sysfs removal done by kobject_release()
> > > when the object loses its last reference via kobject_put().
> > >
> > > This is problematic because as long as it's not been removed from
> > > sysfs, it is still present in the class kset and in sysfs directory
> > > structure.
> > >
> > > The presence in the class kset exposes a use after free bug fixed
> > > by the previous patch, but the presence in sysfs means that until
> > > the kobject is released, which can take a while (especially with
> > > kobject debugging), any attempt at re-creating such as binding a
> > > new device for that class/parent pair, will result in a sysfs
> > > duplicate file name error.
> > >
> > > This fixes it by instead doing an explicit kobject_del() when
> > > the glue dir is empty, by keeping track of the number of
> > > child devices of the gluedir.
> > >
> > > This is made easy by the fact that all glue dir operations are
> > > done with a global mutex, and there's already a function
> > > (cleanup_glue_dir) called in all the right places taking that
> > > mutex that can be enhanced for this. It appears that this was
> > > in fact the intent of the function, but the implementation was
> > > wrong.
> > >
> > > Backport Note: kref_read() is not present in 4.4. Hence,
> > > use atomic_read(&kref.refcount) instead of kref_read(&kref).
> > >
> > > Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> > > Acked-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Zubin Mithra <zsm@xxxxxxxxxxxx>
> > > ---
> > >  drivers/base/core.c     |  2 ++
> > >  include/linux/kobject.h | 17 +++++++++++++++++
> > >  2 files changed, 19 insertions(+)
> >
> > Wait, why is this needed?
> >
> 
> We have syzcaller/syzbot hits, and there is a reproducer. Is this sufficient ?

Nice, where?

> > And why only for 4.4?  What about 4.9 and 4.14?  Do you want to upgrade
> > and suddenly hit the same "bug" that you fixed before?
> >
> 
> Good point. Sorry, that got lost. We are, after all, oly human. I'd
> ask Zubin to provide backports, or do it myself, but we'll have to
> resolve the issue you bring up below first.
> 
> > There was a reason that I did not backport this to the stable tree when
> > it was submitted, and that was because this was an odd race to ever hit.
> > Are you hitting this in the real world without kobject deferred
> > release enabled?  And if so, are you hitting the WARN_ON that is added
> > here?
> >
> I think we may need updated rules for stable. Many bug fixes are
> backported to stable releases without having been seen in the "real
> world" (whatever that means). At the same time we do see many races in
> the real world, many of them not fully understood. Our policy so far
> is to fix as many problems as possible if they are understood, in the
> hope that it fixes at least some of those problems seen in the field.

I don't have a problem with backporting this, but it went in as a "fixes
a theoritical issue, and let's WARN_ON if it really ever is hit".

So I would like to see where we are hitting that WARN_ON, as it sounds
like you found an easy-to-reproduce way to do it.

> If there is a new rule that problems have to have been observed in the
> real world (ie without debugging options enabled, and without specific
> reproducer) before a patch is applied to stable, can we have that as
> generic rule that is not only selectively applied ?

It's not a specific rule, it's just that this specific patch went
through a 30+ email chain of us arguing about it, so I really want to
see why this is needed, and why Ben was right and I was wrong :)

I'm not trying to be extra hard here, it's just that I have a lot of
history with this patch...

So if you all have a reproducer, I would love to see it.

Oh, and the backports to the other kernel versions as well, you don't
want to have to fix this again in a year when you all upgrade to a new
release.

thanks,

greg k-h



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux