On 20-11-11 14:47:10, Alan Stern wrote: > On Tue, Nov 10, 2020 at 10:56:50AM -0500, Alan Stern wrote: > > Felipe: > > > > On Mon, Nov 09, 2020 at 02:29:42PM -0500, Kyungtae Kim wrote: > > > We report a bug (in linux-5.8.13) found by FuzzUSB (a modified version > > > of syzkaller). > > > > > > (corrected analysis) > > > This bug happens while continuing a delayed setup message in mass > > > storage gadget. > > > To be specific, composite_setup() sets FSG_STATE_CONFIG_CHANGE via > > > fsg_set_alt() (line 1793), > > > and followed by cdev->delayed_status++ (line 1798). > > > Meanwile, the mass gadget tries check cdev->delayed_status == 0 > > > through handle_exception() (line 2428), > > > which occurs in between the two operations above. > > > Such a race causes invalid operations eventually. > > > > Do you know who maintains composite.c (or the composite framework) these > > days? This is a real race, and it needs to be fixed. > > > > Part of the problem seems to be that cdev->delayed_status is sometimes > > accessed without the protection of cdev->lock. But I don't know when it > > is safe to take that lock, so I can't tell what changes to make. > > > > Another part of the problem is that cdev->delayed_status doesn't count > > things properly. That is, its value is incremented each time a function > > driver asks for a delayed status and decremented each time a function > > driver calls usb_composite_setup_continue(), and the delayed status > > response is sent when the value reaches 0. But there's nothing to stop > > this from happening (imagine a gadget with two functions A and B): > > > > Function driver A asks for delayed status; > > Function driver A calls setup_continue(): Now the value > > of the counter is 0 so a status message is queued > > too early; > > Function driver B asks for delayed status; > > Function driver B calls setup_continue(): Now a second > > status message is queued. > > > > I'm willing to help fix these issues, but I need assistance from someone > > who fully understands the composite framework. > > Okay, so as Peter pointed out, these comments were wrong. The cdev lock > is held and the scenario described above can't occur, because the lock > isn't released until after both functions have asked for delayed status. > > This means that Kyungtae's analysis of the FuzzUSB report is wrong, and > we still don't know what really happened. Here's my guess... > > There's still a possible race, although it's a different one: The > gadget's delayed status reply can race with the host timing out and > sending a new SETUP packet: > > Host sends SETUP packet A > > Function receives A and decides > to send a delayed status reply > > Function thread starts to > process packet A > > Host times out waiting for A status > > Host sends new SETUP packet B > > Composite core receives packet B > and resets cdev->delayed_status resets cdev->delayed_status? Where to do that? Even the host re-try the control transfer, it should send the same control request, eg, SET_CONFIGURATION, and increase cdev->delayed_status. There is a description for possible host sends next control request before receiving status packet at USB 2.0 Spec, CH 5.5.5 Control Transfer Data Sequences If a Setup transaction is received by an endpoint before a previously initiated control transfer is completed, the device must abort the current transfer/operation and handle the new control Setup transaction. A Setup transaction should not normally be sent before the completion of a previous control transfer. However, if a transfer is aborted, for example, due to errors on the bus, the host can send the next Setup transaction prematurely from the endpoint’s perspective. > > Function thread finishes and calls > usb_composite_setup_continue() > > The composite core sends a status > reply for packet A, not packet B > > Host receives status for A but thinks > it is the status for B! > > Function thread processes packet B > > Function thread finishes and calls > usb_composite_setup_continue() > > The composite core sees > cdev->delayed_status == 0 and WARNs. > > At the moment I don't see how to prevent this sort of race from > happening. We may need to change the API, giving the composite core a > way to match up calls to usb_composite_setup_continue() with the > corresponding call to composite_setup(). But even that wouldn't fix > the entire problem. > Hi Alan, I more think a possible reset or disconnect occurrence between them, and composite_disconnect is called. Kyungtae, would you please help test below code? Besides, would you tell me the test environments during FuzzUSB test? A really host or a host emulated host? Thanks. diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c index c6d455f2bb92..4a8cc1277206 100644 --- a/drivers/usb/gadget/composite.c +++ b/drivers/usb/gadget/composite.c @@ -2454,7 +2454,8 @@ void usb_composite_setup_continue(struct usb_composite_dev *cdev) spin_lock_irqsave(&cdev->lock, flags); if (cdev->delayed_status == 0) { - WARN(cdev, "%s: Unexpected call\n", __func__); + WARN(cdev, "%s: Unexpected call, %s\n", __func__, + cdev->config ? "connecting state" : "already disconnected"); } else if (--cdev->delayed_status == 0) { DBG(cdev, "%s: Completing delayed status\n", __func__); -- Thanks, Peter Chen