dwc3_set_mode vs. __dwc3_set_mode race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Thinh,

I've run into a race between dwc3_set_mode and __dwc3_set_mode accessing
dwc->desired_dr_role: It's an incredibly tight race that's hard to hit since
role switch events need to come in just after each other. It's reproducible
with an Apple M1 connected to a device that very quickly switches
roles when shutting down (which happens to be another M1). This sometimes
triggers a device->host->device switch sequence which is fast enough to hit this
race:

CPU A
	dwc3_set_mode(DWC3_GCTL_PRTCAP_HOST) // first role switch event
		spin_lock_irqsave(&dwc->lock, flags);
		dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_HOST
		spin_unlock_irqrestore(&dwc->lock, flags);
		queue_work(system_freezable_wq, &dwc->drd_work); // true, schedules __dwc3_set_mode

CPU B
	__dwc3_set_mode
		// ....
		spin_lock_irqsave(&dwc->lock, flags);
		dwc3_set_prtcap(dwc, dwc->desired_dr_role); // DWC3_GCTL_PRTCAP_HOST
		spin_unlock_irqrestore(&dwc->lock, flags);

CPU A
	dwc3_set_mode(DWC3_GCTL_PRTCAP_DEVICE) // second role switch event
		spin_lock_irqsave(&dwc->lock, flags);
		dwc->desired_dr_role = mode; // DWC3_GCTL_PRTCAP_DEVICE
		spin_unlock_irqrestore(&dwc->lock, flags);

CPU B (continues running __dwc3_set_mode)
	switch (dwc->desired_dr_role) { // DWC3_GCTL_PRTCAP_DEVICE
	case DWC3_GCTL_PRTCAP_HOST:
		// not executed since desired_dr_role is DWC3_GCTL_PRTCAP_DEVICE now
		break;

CPU A (continues running dwc3_set_mode)
	queue_work(system_freezable_wq, &dwc->drd_work); // __dwc3_set_mode is still running

CPU B (continues running __dwc3_set_mode)
	case DWC3_GCTL_PRTCAP_DEVICE:
		// ....
		ret = dwc3_gadget_init(dwc);


We then have DWC3_GCTL.DWC3_GCTL_PRTCAPDIR = DWC3_GCTL_PRTCAP_HOST and
dwc->current_dr_role = DWC3_GCTL_PRTCAP_HOST but initialized the controller in
device mode when calling dwc3_gadget_init. This obviously doesn't work and is
not easy to recover from.

Unfortunately we can't just lock dwc3->mutex since dwc3_set_mode may be called
from an extcon interrupt in atomic context (which is probably the reason for
deferring the mode switch to a workqueue).

Otherwise I can only think of creating a single-threaded work queue and
allocating a new work_struct together with desired_role inside dwc3_set_mode
and putting that onto the queue, i.e. something like:

struct dwc3_set_mode_work {
	struct dwc3 *dwc;
	u32 desired_dr_role;
	struct work_struct work;
};

void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
{
	struct dwc3_set_mode_work *work = kzalloc(sizeof(*work), GFP_ATOMIC);

	INIT_WORK(&work->work, __dwc3_set_mode);
	work->dwc = dwc;
	work->desired_dr_role = mode;
	queue_work(dwc->drd_work_queue, &work->work);
}

That way all role switch events will be executed in order and we can't race
desired_dr_role anymore.
I'm not very happy with that solution but can't think of anything else.

Any thoughts or ideas? I can otherwise prepare a patch.


Thanks,


Sven



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux