Re: Possible race in dev_coredumpm()-del_timer() path

Mukesh Ojha <quic_mojha@xxxxxxxxxxx> · Wed, 13 Apr 2022 19:48:12 +0530

On 4/13/2022 4:28 PM, Greg KH wrote:
On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
Hi All,

We are hitting a race that leaves try_to_grab_pending() stuck.

What kernel version are you using?

5.10

5.10.0 was released a very long time ago.  Please use a more modern
kernel release :)

It would not be feasible for us to switch to the latest kernel, and I think
this issue could be present in recent kernels as well.

Sorry for the formatting mess.

In the following scenario, while (p1) dev_coredumpm() is running, the devcd
device is added to the framework and a uevent notification is sent to
userspace. That results in a call to (p2) devcd_data_write(), which
eventually tries to delete the queued timer; in the racy scenario, the timer
is not queued yet. So debug objects reports a warning, and in the meantime
the timer is initialized and queued from the p1 path. Then, from the p2 path,
it gets overwritten again (timer->entry.pprev=NULL) and try_to_grab_pending()
gets stuck.

    p1                                       p2 (X)

    dev_coredumpm()
                   uevent sent to userspace
    device_add()  =========================> userspace process X reads the uevent,
                                             writes to the devcd fd, which
                                             results in a call to

                                             devcd_data_write()
                                               mod_delayed_work()
                                                 try_to_grab_pending()
                                                   del_timer()
                                                     debug_assert_init()
    INIT_DELAYED_WORK()
    schedule_delayed_work()
                                                       debug_object_fixup()
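
The ordering in the diagram above can be written down as a small userspace model. This is only a sketch, not kernel code: `struct model_devcd`, `model_data_write()`, `model_coredump()` and the `locked` flag are hypothetical names invented for illustration. The lock stands for one possible way to keep the writer out of the window between device_add() and schedule_delayed_work(); nothing in this thread says that is how the problem is or should be solved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a devcd entry; not kernel code. */
struct model_devcd {
    bool visible;      /* device_add() done: userspace can reach the device */
    bool work_ready;   /* INIT_DELAYED_WORK() + schedule_delayed_work() done */
    bool locked;       /* assumed per-device lock held by the setup path */
};

enum write_result {
    WRITE_NO_DEVICE,   /* nothing visible to write to yet */
    WRITE_BLOCKED,     /* writer has to wait for the setup path */
    WRITE_UNINIT_WORK, /* writer touched the work before it was set up */
    WRITE_OK           /* writer found initialized, queued work */
};

/* p2: models devcd_data_write() reaching mod_delayed_work(). */
static enum write_result model_data_write(const struct model_devcd *d,
                                          bool use_lock)
{
    if (!d->visible)
        return WRITE_NO_DEVICE;
    if (use_lock && d->locked)
        return WRITE_BLOCKED;
    return d->work_ready ? WRITE_OK : WRITE_UNINIT_WORK;
}

/*
 * p1: models dev_coredumpm(), with p2 attempting a write in the window
 * right after the device becomes visible. Returns what the writer sees
 * inside that window; *after receives what it sees once p1 has finished.
 */
static enum write_result model_coredump(bool use_lock,
                                        enum write_result *after)
{
    struct model_devcd d = { false, false, false };
    enum write_result in_window;

    assert(after != NULL);

    if (use_lock)
        d.locked = true;                    /* assumed: taken before device_add() */
    d.visible = true;                       /* device_add(), uevent sent */
    in_window = model_data_write(&d, use_lock);   /* p2 runs here */
    d.work_ready = true;                    /* INIT_DELAYED_WORK, schedule_delayed_work */
    d.locked = false;                       /* assumed: released after scheduling */
    *after = model_data_write(&d, use_lock);
    return in_window;
}
```

With `use_lock` set to false, the write inside the window returns `WRITE_UNINIT_WORK`, which corresponds to del_timer() running on a timer that has not been initialized yet. With `use_lock` set to true, the same write returns `WRITE_BLOCKED` and only succeeds after the setup path has finished.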

Why do you have object debugging enabled?

We have enabled object debugging to catch more issues across the kernel.

That's going to take a LONG
time, and will find bugs in your code.  Perhaps like this one?

What type of device is this?  What bus?  What driver?

A remoteproc client device driver calls dev_coredumpm(), and the devcd
device gets added as part of that call.

And if you turn object debugging off, what happens?

We have not observed the issue after turning object debugging off.

Regards,
Mukesh

thanks,

greg k-h