Re: [PATCH v4 8/8] serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 24, 2024 at 02:15:07PM -0700, Doug Anderson wrote:
> On Mon, Jun 24, 2024 at 5:43 AM Johan Hovold <johan@xxxxxxxxxx> wrote:

> > As I mentioned last week, the slowdown from this is quite noticeable
> > (e.g. 25% slowdown at @115200), but this may be the price we need to pay
> > for correctness, at least temporarily.
> >
> > An alternative might be to switch to using a 16 byte fifo. This should
> > reduce console latency even further, and may be able avoid the idling
> > UART penalty by continuing to use the watermark interrupt for refilling
> > the FIFO.
> 
> I'm a bit confused. Right now we're using (effectively) a 64-byte
> FIFO. The FIFO is 16-words deep and we have 4 bytes per word. ...so
> I'm not sure what you mean by switching to a 16-byte FIFO. Do you mean
> to make less use of the FIFO, or something else?

I meant switching to using one-byte words so that we end up with a
16-byte FIFO where we don't have the issue of adding more data when the
last word is not a full four-byte one.

> Overall the big problem I found in all my testing was that I needed to
> wait for a "command done" before kicking off a new command. When the
> "command done" arrives then the UART has stopped transmitting and
> you've got to suffer an interrupt latency before you can start
> transferring again. Essentially:
> 
> 1. Pick a transfer size.
> 2. You can keep sending bytes / using the FIFO efficiently as long as
> there are still bytes left in the transfer.
> 3. When you get to the end of the transfer, you have to wait for the
> UART to stop, report that it's done, and then suffer an interrupt
> latency to start a new transfer.
> 
> So to be efficient you want to pick a big transfer size but if there's
> any chance that you might not need to transfer that many bytes then
> you need to figure out what to do. If you can handle that properly
> then that's great. If not then we have to make sure we never kick off
> a transfer that we might not finish.

Right. But with a 16 1-byte word FIFO, we may be able to kick of a
really long transfer and just keep it running until it needs to be
kicked again (cf. enabling TX). The console code can easily insert
characters in the FIFO while the transfer is running (and would only
have to wait for 16 characters to drain in the worst case).

Effectively, most of the identified issues would just go away, as
there's basically never any need to cancel anything except at port
shutdown.

> I'd also mention that, as talked about in my response to your other
> patch [1], I'm not seeing a 25% slowdown. I tested both with my simple
> proposal and with this whole series applied and my slowdown is less
> than 2%. I guess there must be something different with your setup?
> Trying to think about what kind of slowdown would be reasonable for my
> patch series at 115200:
> 
> a) We send 64 bytes efficiently, which takes 5.6ms (64 * 1000 / 11520)
> 
> b) We stop transferring and wait for an interrupt.
> 
> c) We start transferring 64 bytes again.
> 
> Let's say that your interrupt latency is 1 ms, which would be really
> terrible. In that case you'll essentially transfer 64 bytes in 6.6ms
> instead of 5.6 ms, right? That would be an 18% hit. Let's imagine
> something more sensible and say that most of the time you can handle
> an interrupt in 100 ms. That would be about a 1.7% slowdown, which
> actually matches what I was seeing. For reference, even an old arm32
> rk3288-veyron device I worked with years ago could usually handle
> interrupts in ~100-200 ms since dwc2 needs you to handle at least one
> (sometimes more) interrupt per USB uFrame (250ms).
> 
> ...so I'm confused about where your 25% number is coming from...

I didn't do an in-depth analysis of the slowdown, but I did rerun the
tests now and I'm still seeing a 22-24% slowdown on x1e80100 with rc5.
This is a new platform so I compared with sc8280xp, which shows similar
numbers even if it's slightly faster to begin with:

					sc8280xp	x1e80100

	rc5 full series			61 s		67 s
	rc5 last patch reverted		50 s		54 s

I have a getty running and cat a 10x dmesg file of 543950 bytes to
/dev/ttyMSM0 from an ssh session (just catting in a serial console gives
similar numbers). 

Johan




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux