hid-sony kernel crash

Hi Benjamin and Jiri,

Last week, Valve notified us of a kernel crash in hid-sony when
disconnecting a controller (DS4 or DS3) while rumble is ongoing. We
think we understand the failure mechanism, but there are a few gaps in
our understanding. We would like to confirm our understanding before
we prepare a patch. The issue is quite urgent for us as it affects not
just desktop Linux, but in particular Android.

Valve originally reported the issue on Ubuntu 18.04 using 4.15 kernel.
They can produce it about 1 in 3 times during Borderlands 2. We
managed to reproduce it ourselves on this kernel, but not on 4.19 or
newer kernels. We believe it got fixed or "hidden" in newer kernels
(more on that below). We suspect fftest can also trigger the issue,
but we haven't tried that yet.

The hid-sony bug is a crash due to a NULL pointer dereference in
"dualshock4_send_output_report", which accesses output_report_dmabuf.
The cause is likely a race condition between
"dualshock4_send_output_report" (or "sixaxis_send_output_report") and
"sony_remove". The output_report call is used to queue work to the
controller e.g. for rumble or LEDs. It can be called in parallel with
"sony_remove" and, it now seems, even AFTER "sony_remove" finishes (yikes).

The "sony_remove" call cleans up most of the device state, which is in
"struct sony_sc". However the "struct sony_sc" (allocated using
devm_kzalloc) will be around until the device object is finally
removed. We suspect the evdev nodes also stay around for some time
after "sony_remove" returns. We are not sure how long this takes,
but we suspect it can take sufficiently long for a new
"dualshock4_send_output_report" call to be triggered on a mostly
cleaned up device. Does this sound like a good explanation?
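
To make the suspected sequence concrete, here is a rough user-space
model of it. This is only a sketch: "struct model_sc" and all model_*
functions are made up for illustration and are not the hid-sony code.
The model assumes that remove releases and clears the report buffer
while the surrounding struct stays allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_REPORT_SIZE 32

/* Hypothetical stand-in for struct sony_sc; not the real layout. */
struct model_sc {
	unsigned char *output_report_dmabuf; /* released by remove in this model */
	int removed;
};

/* Models probe: allocate the device state and the report buffer. */
struct model_sc *model_probe(void)
{
	struct model_sc *sc = calloc(1, sizeof(*sc));

	if (!sc)
		return NULL;
	sc->output_report_dmabuf = calloc(1, MODEL_REPORT_SIZE);
	if (!sc->output_report_dmabuf) {
		free(sc);
		return NULL;
	}
	return sc;
}

/*
 * Models remove: the buffer goes away, but the struct itself stays
 * allocated, the way a devm-managed struct outlives remove.
 */
void model_remove(struct model_sc *sc)
{
	assert(sc != NULL);
	free(sc->output_report_dmabuf);
	sc->output_report_dmabuf = NULL;
	sc->removed = 1;
}

/*
 * Models an output report. Returns 0 on success and -1 when the
 * buffer is already gone. The NULL check marks the spot where we
 * suspect the crash happens; it is part of the model only.
 */
int model_send_output_report(struct model_sc *sc, unsigned char rumble)
{
	if (!sc->output_report_dmabuf)
		return -1;
	memset(sc->output_report_dmabuf, 0, MODEL_REPORT_SIZE);
	sc->output_report_dmabuf[4] = rumble; /* byte offset is arbitrary here */
	return 0;
}

/* Models final device destruction, when the struct is freed. */
void model_release(struct model_sc *sc)
{
	free(sc);
}
```

Calling model_send_output_report() after model_remove() but before
model_release() returns -1 in the model, which corresponds to the
late report hitting a mostly cleaned up device.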

4.19 and newer kernels are not affected by the crash as Hanno moved
allocation of "output_report_dmabuf" to leverage the devm_kmalloc API.
The buffer is around until device destruction, so until after the
evdev nodes are gone.

We have two potential fixes in mind, but are not sure which is best.

One option is to prevent "sony_schedule_work" from scheduling new
output reports. There are some existing variables for that e.g.
state_worker_initialized, which should be set to "0" e.g. by
"sony_cancel_work_sync". This might be something nice for us to do
anyway, but still leaves a "window".
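
As a rough illustration of this first option, the gate could look
something like the user-space model below. It is a sketch under
assumptions: "struct gate_sc" and the gate_* functions are hypothetical,
only the flag name is borrowed from the driver, and C11 atomics stand
in for whatever locking the driver would really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant bits of the driver state. */
struct gate_sc {
	atomic_bool state_worker_initialized; /* flag name borrowed from hid-sony */
	atomic_int queued;                    /* work items accepted so far */
};

void gate_init(struct gate_sc *sc)
{
	assert(sc != NULL);
	atomic_init(&sc->state_worker_initialized, true);
	atomic_init(&sc->queued, 0);
}

/*
 * Models the scheduling path: refuse new work once the flag is
 * cleared. The check and the queue step are two separate operations
 * here, so this model does not close every window by itself.
 */
bool gate_schedule_work(struct gate_sc *sc)
{
	if (!atomic_load(&sc->state_worker_initialized))
		return false;
	atomic_fetch_add(&sc->queued, 1);
	return true;
}

/*
 * Models the cancel path: clear the flag so later requests are
 * dropped. A real driver would also wait for pending work here.
 */
void gate_cancel_work_sync(struct gate_sc *sc)
{
	atomic_store(&sc->state_worker_initialized, false);
}

int gate_queued(struct gate_sc *sc)
{
	return atomic_load(&sc->queued);
}
```

In this model a request made after gate_cancel_work_sync() is simply
dropped, while a request that already passed the check is still
counted as queued.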

The second option is to call "input_ff_destroy" from "sony_remove" to
remove the FF capability. I know the input framework does it for
us as well, but apparently it does it "too late". On a sidenote, other
drivers might need to do the same if they are sensitive to this "time
window" race condition. Thoughts?

Thanks,
Roderick


