On Tue, 18 Dec 2018 at 03:21, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, 17 Dec 2018, Andrew Worsley wrote: > > > A sysfs driven USB authorisation change can trigger a usb_set_configuration > > while a hub_event worker thread is running. This can result in a USB device > > being disabled just after it was configured and bringing down all the > > devices and impacting hardware and user processes that were established on > > top of this these interfaces. In some cases the USB disable never completed > > and the whole system hung. > > Can you be more specific about this? Disabling a USB device shouldn't > cause these kinds of problems, regardless of whether or not the device > was just configured. I can't say too much about the hardware - but there was an SPI bus and an i2c bus built on top of USB devices on a bus. It appears the SPI bus shutdown was the item that was prone to problems when being torn down. I understand it would be better if the dependant systems shutdown cleanly, which it did about 75% of the time but then it was rather messy sequence. So the current system is far from ideal. The crux of the issue is the usb_disable_device() (line 1874 of drivers/usb/core/message.c) call in usb_set_configuration() is applied to a configured device and triggers the tear down of all the devices built on that USB device and is very disruptive. > > At my work I had an occasional hang due to this race condition. Roughly 1 > > in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang. > > This patch fixed the problem and I had no problems (spurious disables > > or hangs) in 750+ boots. > > usb_authorize_device, usb_deauthorize_device, and hub_event all acquire > the device mutex. Why should adding another mutex make any difference? I believe the new usb_authorize_mutex prevents the cascade of USB device bus scans, discoveries and probes from colliding with those caused a change in USB bus authorisation. The individual device / hub locks are not across the whole system, only an individual component hub/device so any logic that gives an orderly USB device scanning and probing is undermined. The idea of the mutex is to keep the USB device discovery work when a device is added/removed/powered up is separated from those that are caused by authorisation changes. I don't pretend to understand all the logic and sequencing of the current code which works works faultlessly when these threads are serialised by the mutex (750+ times) in boot ups which build up the system and shutdowns which bring down the system. Likewise if I endlessly run the authorisation off / on by itself it is faultless over hundreds of iterations. > In fact there's an actual disadvantage: Making hub_event acquire a > global mutex will prevent us from handling multiple hubs concurrently. > Although we don't do this now, we might want to in the future. > > Alan Stern This is true - I am not trying to serialise the hub_event() threads - but to prevent the authorisation changes from happening during the hub_event threads. So perhaps there would be a way of allowing multiple concurrent hub_events from happening but only one authorisation thread at a time. Assuming multiple hub_event()s are deemed ok. Something like a lock which allows multiple readers but only a single exclusive writer lock? Andrew