Re: [PATCH v1 tty] 8250: microchip: pci1xxxx: Refactor TX Burst code to use pre-existing APIs

<Rengarajan.S@xxxxxxxxxxxxx> · Wed, 6 Mar 2024 06:55:44 +0000

Hi Jiri,

On Tue, 2024-03-05 at 08:19 +0100, Jiri Slaby wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> On 05. 03. 24, 5:15, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > Hi Jiri,
> > 
> > On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
> > > On 04. 03. 24, 5:37, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > Hi Jiri,
> > > > 
> > > > On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
> > > > > On 23. 02. 24, 10:21, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > > > On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
> > > > > > > On 22. 02. 24, 14:49, Rengarajan S wrote:
> > > > > > > > Updated the TX Burst implementation by changing the circular
> > > > > > > > buffer processing with the pre-existing APIs in kernel. Also
> > > > > > > > updated conditional statements and alignment issues for better
> > > > > > > > readability.
> > > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > so why are you keeping the nested double loop?
> > > > > > > 
> > > > > > 
> > > > > > Hi, to differentiate burst mode handling from byte mode, we had
> > > > > > separate loops for both. Since a single while loop also does not
> > > > > > align with the RX implementation (where we have separate handling
> > > > > > for burst and byte), we have retained the double loop.
> > > > > 
> > > > > So obviously, align RX to a single loop if possible. The current TX
> > > > > code is very hard to follow and sort of unmaintainable (and buggy).
> > > > > And IMO it's unnecessary as I proposed [1]. And even if RX cannot be
> > > > > one loop, you still can make TX easy to read as the two need not be
> > > > > the same.
> > > > > 
> > > > > [1]
> > > > > https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@xxxxxxxxxx/
> > > > 
> > > > 
> > > > while (data_empty_count) {
> > > >      cnt = CIRC_CNT_TO_END();
> > > >      if (!cnt)
> > > >        break;
> > > >      if (cnt < UART_BURST_SIZE || (tail & 3)) { // is_unaligned()
> > > >        writeb();
> > > >        cnt = 1;
> > > >      } else {
> > > >        writel();
> > > >        cnt = UART_BURST_SIZE;
> > > >      }
> > > >      uart_xmit_advance(cnt);
> > > >      data_empty_count -= cnt;
> > > > }
> > > > 
> > > > With the above implementation we are observing a performance drop of
> > > > 2 Mbps at a baud rate of 4 Mbps. The reason is that on each iteration
> > > > we check whether the data needs to be processed via DWORDs or bytes.
> > > > The condition check on each iteration is causing the drop in
> > > > performance.
> > > 
> > > Hi,
> > > 
> > > the check is several orders of magnitude faster than the I/O proper.
> > > So I don't think that's the root cause.
> > > 
> > > > With the previous implementation (with nested loops) the performance
> > > > is found to be around 4 Mbps at a baud rate of 4 Mbps. In that
> > > > implementation we handle sending DWORDs continuously until the
> > > > transfer size < 4. Can you let us know any other alternatives for the
> > > > above performance drop?
> > > 
> > > Could you attach the patch you are testing?
> > 
> > Please find the updated pci1xxxx_process_write_data
> > 
> >       u32 xfer_cnt;
> > 
> >          while (*valid_byte_count) {
> >                  xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
> >                                             UART_XMIT_SIZE);
> > 
> >                  if (!xfer_cnt)
> >                          break;
> > 
> >                  if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {
> 
> Hi,
> 
> OK, is it different if you remove the alignment checking (which should
> be the correct™ thing to do, but may/will slow down things on platforms
> which don't care)?

After removing the alignment check, the performance improves marginally:
transferred 10 MB at 2759999 CPS. That is still lower than the previous
implementation (3900000 CPS).
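
For reference, a minimal userspace sketch of the two loop shapes being
compared. It is only a model: struct tx_model, XMIT_SIZE, BURST_SIZE and
the counters are hypothetical stand-ins, the FIFO writes are replaced by
counters, and the nested variant is an assumed reading of "send DWORDs
continuously until the transfer size < 4", not the exact driver code. It
counts operations only and says nothing about MMIO latency.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XMIT_SIZE  64u	/* power of two, stands in for UART_XMIT_SIZE */
#define BURST_SIZE 4u	/* stands in for UART_BURST_SIZE */

struct tx_model {
	uint8_t buf[XMIT_SIZE];
	unsigned int head, tail;
	unsigned int byte_writes;	/* mock writeb() calls */
	unsigned int dword_writes;	/* mock writel() calls */
	unsigned int checks;		/* times the buffer state was re-evaluated */
};

/* Same arithmetic as CIRC_CNT_TO_END(): bytes available up to the wrap. */
static unsigned int cnt_to_end(const struct tx_model *m)
{
	unsigned int end = XMIT_SIZE - m->tail;
	unsigned int n = (m->head + end) & (XMIT_SIZE - 1);

	return n < end ? n : end;
}

static void advance(struct tx_model *m, unsigned int n)
{
	m->tail = (m->tail + n) & (XMIT_SIZE - 1);
}

/* Reset the model with len pending bytes starting at tail. */
static void fill(struct tx_model *m, unsigned int tail, unsigned int len)
{
	memset(m, 0, sizeof(*m));
	m->tail = tail;
	m->head = (tail + len) & (XMIT_SIZE - 1);
}

/* Single loop: choose byte or DWORD on every iteration. */
static void tx_single_loop(struct tx_model *m, unsigned int space)
{
	while (space) {
		unsigned int cnt = cnt_to_end(m);

		if (!cnt)
			break;
		m->checks++;
		/* "space < BURST_SIZE" is an addition of this model so the
		 * unsigned FIFO space counter cannot underflow. */
		if (cnt < BURST_SIZE || space < BURST_SIZE) {
			m->byte_writes++;
			cnt = 1;
		} else {
			m->dword_writes++;
			cnt = BURST_SIZE;
		}
		advance(m, cnt);
		space -= cnt;
	}
}

/* Nested loops: drain DWORDs first, then the remaining bytes. */
static void tx_nested_loop(struct tx_model *m, unsigned int space)
{
	while (space) {
		unsigned int cnt = cnt_to_end(m);

		if (!cnt)
			break;
		if (cnt > space)
			cnt = space;
		m->checks++;
		while (cnt >= BURST_SIZE) {
			m->dword_writes++;
			advance(m, BURST_SIZE);
			cnt -= BURST_SIZE;
			space -= BURST_SIZE;
		}
		while (cnt) {
			m->byte_writes++;
			advance(m, 1);
			cnt--;
			space--;
		}
	}
}
```

In this model a 22-byte run produces 5 DWORD writes and 2 byte writes
with either variant; they differ only in how often the buffer state is
re-evaluated (7 times for the single loop, once for the nested one).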

> 
> >                          writeb(xmit->buf[xmit->tail],
> >                                 port->membase + UART_TX_BYTE_FIFO);
> >                          xfer_cnt = UART_BYTE_SIZE;
> >                  } else {
> >                          writel(*(u32 *)&xmit->buf[xmit->tail],
> 
> If you remove the "tail & 3" check, you can use get_unaligned() here and
> need not care about unaligned accesses after all...

With get_unaligned((u32 *) xmit), the performance drops further:
transferred 10 MB at 1959999 CPS.
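
For reference, a minimal userspace sketch of an unaligned 32-bit load
from the transmit buffer. It assumes the load is meant to start at
xmit->buf[xmit->tail] rather than at the start of the structure.
load_u32_unaligned() and next_burst() are hypothetical names, and the
memcpy()-based helper is only a stand-in for get_unaligned(), not the
kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for get_unaligned() on a 32-bit value:
 * memcpy() lets the compiler emit a load that is safe at any address. */
static uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Fetch the next 4 bytes to send, starting at the current tail.
 * Assumes the caller has already checked that 4 contiguous bytes are
 * available (in the driver, via CIRC_CNT_TO_END()). */
static uint32_t next_burst(const uint8_t *buf, unsigned int tail)
{
	return load_u32_unaligned(&buf[tail]);
}
```

The value returned by next_burst() has the same byte layout in memory as
the 4 buffer bytes starting at tail, whether or not tail is a multiple
of 4.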

> 
> >                                 port->membase + UART_TX_BURST_FIFO);
> >                          xfer_cnt = UART_BURST_SIZE;
> >                  }
> > 
> >                  uart_xmit_advance(port, xfer_cnt);
> >                  *data_empty_count -= xfer_cnt;
> >                  *valid_byte_count -= xfer_cnt;
> >          }
> > 
> > Testing is done via minicom by transferring a 10 MB file at 4 Mbps.
> > 
> > After the minicom transfer with a single instance:
> > 
> > Previous implementation (nested while loops):
> > Transferred 10 MB at 3900000 CPS
> > 
> > Current implementation:
> > Transferred 10 MB at 2459999 CPS
> 
> 
> 
> --
> js
> suse labs
>