Oliver, Alan,
I understand that there's a disagreement in what is allowed or not with
the anchor API. Effectively that means that I have to assume that driver
programmers will go either way. I have to admit that my view was that a
driver must proactively make sure it doesn't submit further URBs to an
anchor as long as usb_kill_anchored_urbs() runs, through completers or
otherwise. I formed the current patch accordingly.
To make things trickier, a driver might rely (correctly or not) on
usb_kill_urb() making sure that a resubmission of a URB by the completer,
while usb_kill_urb() is killing it, will fail. Or at least so says the
description of this function.
And once again, the resubmitted URB will remain untouched if the said
race condition occurs. So a driver's programmer who relied on
usb_kill_urb() to prevent the resubmission might get the impression,
when testing the driver, that everything was done correctly, but then
the kernel panic will happen rarely and far from the eye.
Writing an additional API without this problem is beyond the scope of
this discussion. I'm focused on resolving the problem of the current
one. The existing API must be safe to use, even if it's planned to be
phased out.
Given the discussion so far, I realized that the resubmission-by-completer
case must be handled properly as well. So I suggest modifying the patch
to something like:
    do {
        spin_lock_irq(&anchor->lock);
        while (!list_empty(&anchor->urb_list)) {
            /* URB kill loop */
        }
        spin_unlock_irq(&anchor->lock);
    } while (unlikely(!usb_anchor_check_wakeup(anchor)));
The do-while loop will almost never make any difference. But it will
loop like a waiting spinlock in the rare event of the said race
condition, while the completer callback executes.
And if the completer submitted a URB, it will be removed as well this
way. Recall that this loops only in the event of a race condition, so it
will NOT play cat-and-mouse with the completer callback, but will
finish up quickly.
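To illustrate why the outer loop catches the resubmitted URB, here is a
minimal user-space sketch in C. It is not kernel code: struct model_anchor
and the model_* functions are hypothetical names made up for this
illustration, the completer running on another CPU is stood in for by a
plain function call, and model_check_wakeup() encodes the assumption that
usb_anchor_check_wakeup() reports the anchor as idle only when no completer
is in flight and the list is empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, NOT the kernel's struct usb_anchor. */
struct model_anchor {
    int anchored;        /* URBs currently on the anchor's list */
    int in_completer;    /* URBs already unanchored whose completer still runs */
    int resubmits_left;  /* resubmissions the completers will still attempt */
};

/* Assumed behavior of usb_anchor_check_wakeup(): idle only if no
 * completer is in flight and the list is empty. */
static bool model_check_wakeup(const struct model_anchor *a)
{
    return a->in_completer == 0 && a->anchored == 0;
}

/* Stand-in for another CPU: one running completer finishes and, if it
 * still wants to, re-anchors its URB. */
static void model_other_cpu_progress(struct model_anchor *a)
{
    if (a->in_completer == 0)
        return;
    if (a->resubmits_left > 0) {
        a->resubmits_left--;
        a->anchored++;
    }
    a->in_completer--;
}

/* Models the suggested do-while; returns the number of outer passes. */
static int model_kill_anchored_urbs(struct model_anchor *a)
{
    int passes = 0;

    do {
        while (a->anchored > 0)
            a->anchored--;            /* stands in for the URB kill loop */
        model_other_cpu_progress(a);  /* the other CPU runs between passes */
        passes++;
        assert(a->anchored >= 0 && a->in_completer >= 0);
    } while (!model_check_wakeup(a));

    return passes;
}
```

In this model, with two anchored URBs and one completer in flight that
resubmits once, the first pass empties the list, the resubmitted URB then
shows up on the anchor, and the second pass removes it. Without the race,
the loop body runs exactly once.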
And I've dropped the WARN(): If some people consider resubmission of a
URB to be OK, even while usb_kill_anchored_urbs() is called, no noise
should be made if that causes a rare but tricky situation.
And since I'm at it, I'll make the same change to
usb_poison_anchored_urbs(), which suffers from exactly the same problem.
What do you think?
Thanks,
Eli
On 28/07/20 00:29, Oliver Neukum wrote:
On Monday, 27.07.2020, 10:43 -0400, Alan Stern wrote:
On Mon, Jul 27, 2020 at 03:58:05PM +0200, Oliver Neukum wrote:
On Monday, 27.07.2020, 14:27 +0300, Eli Billauer wrote:
Hello, Oliver.
On 27/07/20 13:14, Oliver Neukum wrote:
That however is really a kludge we cannot have in usbcore.
I am afraid that, as is, the patch should _not_ be applied.
Could you please explain further why the suggested patch is unsuitable?
Hi,
certainly.
1. timeouts are generally a bad idea, especially if the timeout does
not come out of a spec.
2. That involves quoting you:
Alternatively, if the driver submits URBs to the same anchor while
usb_kill_anchored_urbs() is called, this timeout might be reached. This
That would be a bug in the driver, though. In such a situation, a WARN
is worth having.
Well, it is an inherent race, certainly. You can do it, though. It is
debatable whether it would ever make sense. Yet it is not a bug in the
sense of, for example, writing beyond the end of a buffer or submitting
an URB twofold.
could happen, for example, if the completer function that ran in the
racy situation resubmits the URB. If that situation isn't cleared within
1000ms, it means that there's a URB in the system that the driver isn't
aware of. Maybe that situation is worth more than a WARN.
That is an entirely valid use case. And a bulk URB may take a potentially
unbounded time to complete.
It is _not_ a valid use case. Since usb_kill_anchored_urbs() doesn't
specify whether it will kill URBs that are added to the anchor after it
is called (and before it returns), a driver that anchors URBs at such a
time is buggy.
Yes, if you depend on it. Here we are getting into technicalities.
The thing is that we are getting into areas where we should not need to
go if the API were optimal.
What drivers really want is a way to say, kill this group of URBs and
make sure they stay dead no matter what.
Maybe this should be mentioned in the kerneldoc for the routine: Drivers
must not add URBs to the anchor while the routine is running.
True, yet this defeats one of the aims of the API.
My failure in this case is simply overengineering.
If this line:
usb_unanchor_urb(urb);
in __usb_hcd_giveback_urb(struct urb *urb) weren't there, the issue
would not exist. I misdesigned the API in automatically unanchoring
a completing URB.
Simply removing it now is no longer possible, so we need to come up with
a more complex solution.
Given that this timeout-based API is already present and being used in a
separate context, I don't see anything wrong with using it here as well.
It is unnecessary and results in a much less useful API.
The true error in its design is that it unconditionally unanchors the
URBs it gives back. Stop doing that and it becomes much better.
Regards
Oliver