Re: Is this a bug in the current serial8250_rx_chars()?

Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> · Mon, 09 Mar 2015 07:10:24 -0400

On 03/08/2015 10:27 PM, George Spelvin wrote:
> I've been trying to get Prasad's "don't block Rx during emergency Tx"
> feature into a form that Peter would be happy with, and the big
> challenge is finding somewhere to put the received characters without
> taking any locks.
> 
> This has led me into figuring out the whole tty_buffer system,
> and unless someone objects, I'm going to submit a comment patch
> for that because the locking rules are not entirely obvious.
> 
> It's also not clear that "used" and "read" have to be in the tty_buffer
> at all.  The only instances that are ever used are bufhead->tail->used,
> and bufhead->head->read, so they could be moved to the bufhead structure.
> 
> And the sentinel is a bit strange and probably unnecessary.

The tty_buffer is a lockless design because there is only ever a
single producer (the uart driver) and a single consumer (the
flush_to_ldisc() input worker).

The uart driver produces filled (or partially filled) buffers at the
tail and the input worker consumes buffers at the head.

The lockless design requires one buffer in the list at all times,
even if it's empty and consumed; the sentinel guarantees that condition
when the tty is first created.

I suppose you're right about "tail->used" and "head->read" but please
analyze each and every use if you plan to optimize those.

> Anyway, the steady state will typically be two 512-byte (size = 256)
> buffers that are ping-ponged.  So there should always be at least
> 256 bytes of buffer space available without needing to allocate anything.

The flip buffer design was ditched years ago for good reason; what
happens when the input worker has not even run yet on the previous input?

> I'll have to add some features to the tty_buffer interface to support a
> no-allocation interface that reports if space is available and uses it
> if it is.  A bit invasive, but it seems like the only clean way.

You should be able to use the tty_buffer interface without modification
from console write; the only other producer is the irq handler which is
excluded by the console write with the port->lock.

> A more immediate and specific concern I've been having is the temporary
> unlock
> 
>         spin_unlock(&port->lock);
>         tty_flip_buffer_push(&port->state->port);
>         spin_lock(&port->lock);
> 
> at the end of serial8250_rx_chars().  I think there's actually a bug
> there.

Yep, I realize since I initially wrote about that the back in October,
it's been lost in the sands of time (and you weren't in the conversation
at the time), but I thought lock dropping should be removed.

> The unlock/lock itself is not the bug.  I think it's unnecessary and
> could be deleted, but that wouldn't affect the bug: the failure to set
> lsr_saved_flags if --max_count expires.

Yep.

> At that point in the code, up->lsr_saved_flags has been set to zero,
> but if the character on top of the FIFO has a parity or framing error
> when max_count expires, the error will be lost and the character received
> as normal.
> 
> serial8250_rx_chars does return the LSR bits, but serial8250_handle_irq
> doesn't do anything with the returned LSR except decide if it should
> call serial8250_tx_chars.  The other LSR bits aren't saved.  So if the
> max_count expires leaving a character with a parity or framing error on
> top of the FIFO, the error will get lost.
> 
> 
> This could be fixed fairly easily by having serial8250_rx_chars save the
> unprocessed LSR bits before unlocking.
> 
> Another possible fix would be to test --max_count *before*
> re-reading the LSR, so the errors would stay in the FIFO.

This would be my vote.

> (Looking for other users, fsl8250_handle_irq sets lsr_saved_flags based
> on the orig_lsr before serial8250_rx_chars was called.... and I think
> it's wrong.)
> 
> 
> 
> Regarding the need for the unlock in the first place, I'm really
> not seeing it.  tty_flip_buffer_push() is just an alias for
> tty_schedule_flip(), which just fills in port->buf.tail->commit and
> calls schedule_work(port->buf.work).

The unlock is a remnant of the low_latency steering which was
buggy and has been removed. I left it there because the -RT patches
don't schedule the input worker but run it directly, and I didn't
want to break that when there was no harm in leaving it.

> So not only is nothing done immediately, but even the eventual
> call to flush_to_ldisc() doesn't take port->lock.
> 
> 
> Although it wastes time, I don't think the unlock/lock
> introduces any bugs.
> 
> A serial8250_console_write could sneak in during the unlock, but
> only the transmit bits matter, and the consle_write code is careful
> to leave them both set, which the hardware is allowed to do
> asynchronously anyway, so worst case is the interrupt handler
> will be immediately re-invoked after it returns.

Well, dropping the unlock allows normal tx to occur so that could
break up the console output.

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html