On Tue, Aug 03, 2010 at 10:14:35AM -0500, Johan Hovold wrote: > I would like to ask you Greg not to apply this patch yet as I have some > doubts regarding it's correctness. I'm glad to get your input. > > On Mon, Aug 02, 2010 at 10:46:00AM -0700, David VomLehn wrote: > > Fix USB console hang due to non-atomic URB allocation > > > > This is intended to fix a problem seen with the Amtel sam-ba tool by > > Alexander Stein (alexander.stein@xxxxxxxxxxxxxxxxxxxxx). It appears > > when using an A91 controller. He bisected it to commit: > > > > 8e8dce065088833fc418bfa5fbf035cb0726c04c: USB: use kfifo to buffer usb-generic serial writes > > Firstly, the description is somewhat inaccurate (as David points out > below) as the bisected commit is completely unrelated to the locking > issue David intends to solve (in fact, write handling has been completely > rewritten in 2.6.35, after the commit in question). > > The reason for Sam-ba not working anymore is quite likely to the fact > that the above mentioned commit introduces merges of write requests > (the fifo). I've asked for more info from Alexander, but there are two possibilities I'm investigating. One is timing related, in that the introduction of a FIFO may eliminate pauses between data. Second, the specific symptom that led me to make this fix is that the trailing carriage return on lines was being discarded because the single write URB was busy when it tried to write it. It is possible that Sam-ba, in some way, depends on one or both of these behaviors. I hope I'm wrong because it would mean that I broke something in user space when I corrected a bug. > If the Sam-ba firmware is not happy with that, it should be fixed or a > specific driver which does not use a fifo needs to be written. I have > access to such an Atmel board and could have a look at it soon. I would be very interested to know what you find out. > > This fix was developed because, after Alexander's email, I took a look > > at my USB console and found that it, too, was experiencing a hang. I > > assumed that this hang was the same as the one Alexander is seeing and > > developed this fix. It's a good fix for *a* hang, but after thinking > > about this a bit, I'm not sure this is a fix for *Alexander's* hang. > > diff --git a/drivers/usb/serial/generic.c b/drivers/usb/serial/generic.c > > index a817ced..7085809 100644 > > --- a/drivers/usb/serial/generic.c > > +++ b/drivers/usb/serial/generic.c > > @@ -199,6 +199,9 @@ retry: > > } > > i = (int)find_first_bit(&port->write_urbs_free, > > ARRAY_SIZE(port->write_urbs)); > > + if (i == ARRAY_SIZE(port->write_urbs)) > > + return 0; > > + clear_bit(i, &port->write_urbs_free); > > Secondly, I doubt that this is a real fix for the problem David is > having. AFAIK, the locking is indeed correct as the whole write code is > protected by the USB_SERIAL_WRITE_BUSY flag. The USB_SERIAL_WRITE_BUSY flag protects against re-entry to usb_serial_generic_write_start(), but does not protect any data structures used outside the function. Specifically, it doesn't protect write_urbs_free. Take the following sequence. 1. Usb_serial_generic_write_start() is called to write some data. 2. An URB is allocated. The bit in write_urbs_free is not cleared yet. 3. The URB is queued by calling usb_submit_urb(). 4. Usb_serial_read_bulk_callback() is invoked for the URB before usb_submit_urb() returns. 5. The bit for the URB is set in write_urbs_free. 6. Usb_serial_read_bulk_callback() returns 7. The bit for the same URB is cleared. Since the bit in write_urbs_free indicates that it is in use but since usb_serial_read_write_callback() is only called once per URB, the URB is lost. If this is a possible sequence, my change will fix it. Some history: when I was trying to debug this problem, I saw usb_submit_urb() returning -EINVAL because the hcpriv element of the URB was not zero. This sounded like a double-allocation problem, so I went looking for a synchronization issue. When I found one and my problem went away after it was fixed, I didn't look further but I agree, this doesn't look like a double allocation problem. If this looks like one to you, we have some more work ahead of us. > Thus, after checking that > write_urbs_free is indeed non-zero, no bit can be cleared before write > returns. This means that find_first_bit will never return > ARRAY_SIZE(port->write_urbs). I agree, the return will never happen and this code should go away. > Thirdly, this very piece of code also introduces a locking imbalance as > the port lock is not unlocked before returning. True, though fortunately it would never happen. > David, could you let me know in which set-up you're experiencing the > lock-up so I can try to reproduce it here? Do you have any logs from > running with debugging enabled on 2.6.35? I have a USB 2.0 hub with a CP210x and an RTL8150 attached. I am using the CP210x as my console, which may be a factor. I boot my system, and my USB hangs before it can complete all the initscripts. > Thanks, > Johan -- David VL -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html