On Tue, 2009-10-20 at 03:09 +0900, Linus Torvalds wrote: > > On Mon, 19 Oct 2009, Michael Ellerman wrote: > > > > As Benh said it's not really bisectable on our kernel. But I got Mikey > > to bisect it on upstream using a different simulator model, and he > > couldn't tie it down. It becomes easier to hit in more recent kernels > > (since 27), but he could hit in 25 too. > > Ok, thanks to the verification. > > And I think I see why it got easier to hit lately, and to some degree I > think we can at least partially avoid it: > > > * hvc_console reads all our input and passes it to the tty code > > via tty_insert_flip_char() > > * flush_to_ldisc() runs calling n_tty_receive_buf(), which fills > > 4K of tty->read_buf > > * Once read_buf is full, tty->receive_room becomes 0 and > > flush_to_ldisc() reschedules itself to run again in 1 jiffy. > > * Bash reads 1 character, causing receive_room to become 1. > > * flush_to_ldisc() runs again and inserts 1 more char because > > receive_room is now 1. > > .. ok, I agree that our behavior in the "buffer full" case is likely not > wonderful. And that's especially true in the 'icanon' case.. > > > * (repeat the last two steps a few times) > > * Bash sets tty->icanon = 1 via n_tty_set_termios(), which calls > > n_tty_set_room(). Because icanon is enabled, n_tty_set_room() > > lies and says we have space for 1 character even though we > > don't. > > So just to clarify - icanon wasn't set before? It wasn't set before. > > * flush_to_ldisc() runs, sees that receive_room is 1 and calls > > n_tty_receive_buf() > > * n_tty_receive_buf() calls n_tty_receive_char() which drops the > > character because there's no room (~ line 1132). > > * We keep dropping characters until we see a newline, which > > increments tty->canon_data, causing n_tty_set_room() to report 0 > > space left, and so flush_to_ldisc() reschedules again. > > Did you have newlines in the big bulk dump? Yes. It's not really a bulk dump, it's just a bit over 4K of shell script that we're writing into the console to run a test. So lots of newlines. > If there really aren't any newlines, I don't think we can do a lot. icanon > handling kind of fundamentally doesn't work if there is no newline at all, > since it is all about line buffering, and we obviously have to limit the > lines _somewhere_. Right that makes sense. But shouldn't be the case here. > But I also thihk that we only update that whole 'canon_data' thing if we > _received_ the newlines while we were in icanon mode (but I didn't really > check very closely), so if we actually switch from -icanon to +icanon, I > think canon_data can be confused, and we thus handle the buffer-full case > worse than we _could_ have. Yeah I think that's right. n_tty_set_termios() always sets canon_data to 0 when there is a canon change, in either direction. And I don't see anything else will fix it. > > It's a bit unfortunate that n_tty_set_room() lies about the available > > space when icanon = 1, but it makes sense in order to handle erase. It > > would be nice if n_tty_receive_buf() returned a status to > > flush_to_disc() to say "actually I could only fit X chars after all, > > please take them back" - but I don't grok how that would play with all > > the other logic in there. > > Yeah, I don't think that is even worth it. The thing is, we do have to > start dropping characters at some point, so trying to extend the non-drop > case just moves it somewhere else. If you are in canon mode, and the line > is longer than <some-number-that-happens-to-be-4kB>, you're basically > screwed. Agreed. > But what we _might_ do is to handle the "turn on canon mode" case better, > in case the old buffer had newlines. IOW, instead of always setting > 'canon_data' to 0 when icanon changes (and setting canon_head to the tail > of the data), maybe we should simply _count_ the number of newlines (and > set canon_head to the last one in the buffer). > > IOW, if you do have newlines in your bulk data before icanon got set, we > could probably make n_tty handle that case better. Yeah I think that is the root cause of my bug. There are plenty of lines in my buffer, so n_tty_set_room() should not be accepting more data. > That said, as far as I know, the tty layer behavior in this area has > basically always been the way it is. The fact that you can see it more > easily is almost certainly just related to just how the buffers that lead > up to flush_to_ldisc have grown, and are now allowed to fill up further. > So it probably got much easier to trigger, but it is likely not in any way > a really new case, and goes back not just to 2.6.25, but probably > forever... Yeah, we only stopped at 25 because our toolchain was too new :} > I wonder a bit what in your particular environment makes this problem show > up, but I assume that your simulation environment ends up just blindly > stuffing the tty buffers through the virtual console, so you basically > have "simulated typing" that was (a) started long before the reader was > interested in accepting it and (b) was a huge dump rather than what any > real load would be. Yes and no. (a) doesn't seem to apply, no amount of delay before starting the input makes a difference. And (b) only sort of. I don't think a bit more than 4K is a ridiculous amount to stuff down a console. Though I guess we're "typing" at a ridiculous speed. > But if you change n_tty_set_termios() to count newlines when it sets > icanon, you might get the behaviour you want, and it would seem to me to > be an improvement over what we have now, so I wouldn't object to it, even > if I suspect nobody else has ever cared, and would ever care in the > future. Clearly no one has ever cared before :) I'll have a look if I can come up with a neat fix. cheers
Attachment:
signature.asc
Description: This is a digitally signed message part