Re: [PATCH v1 tty] 8250: microchip: pci1xxxx: Refactor TX Burst code to use pre-existing APIs

<Rengarajan.S@xxxxxxxxxxxxx> · Tue, 5 Mar 2024 04:15:51 +0000

Hi Jiri,

On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
> 
> On 04. 03. 24, 5:37, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > Hi Jiri,
> > 
> > On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
> > > 
> > > On 23. 02. 24, 10:21, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
> > > > > 
> > > > > On 22. 02. 24, 14:49, Rengarajan S wrote:
> > > > > > Updated the TX Burst implementation to process the circular
> > > > > > buffer with the pre-existing kernel APIs. Also updated the
> > > > > > conditional statements and fixed alignment issues for better
> > > > > > readability.
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > so why are you keeping the nested double loop?
> > > > > 
> > > > 
> > > > Hi, in order to distinguish burst-mode handling from byte-mode
> > > > handling, we kept a separate loop for each. Since a single while
> > > > loop would also not align with the RX implementation (where we
> > > > have separate handling for burst and byte), we have retained the
> > > > double loop.
> > > 
> > > So obviously, align RX to a single loop if possible. The current
> > > TX code is very hard to follow and sort of unmaintainable (and
> > > buggy). And IMO it's unnecessary, as I proposed [1]. And even if
> > > RX cannot be one loop, you can still make TX easy to read, as the
> > > two need not be the same.
> > > 
> > > [1]
> > > https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@xxxxxxxxxx/
> > 
> > 
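> > while (data_empty_count) {
> >     cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
> >     if (!cnt)
> >         break;
> >     if (cnt < UART_BURST_SIZE || (xmit->tail & 3)) { // is_unaligned()
> >         writeb(xmit->buf[xmit->tail], port->membase + UART_TX_BYTE_FIFO);
> >         cnt = 1;
> >     } else {
> >         writel(*(u32 *)&xmit->buf[xmit->tail], port->membase + UART_TX_BURST_FIFO);
> >         cnt = UART_BURST_SIZE;
> >     }
> >     uart_xmit_advance(port, cnt);
> >     data_empty_count -= cnt;
> > }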
> > 
> > With the above implementation we are observing a performance drop
> > of 2 Mbps at a baud rate of 4 Mbps. The reason is that in each
> > iteration we check whether the data needs to be processed as DWORDs
> > or as bytes. This per-iteration condition check is causing the drop
> > in performance.
> 
> Hi,
> 
> the check is faster than the I/O proper by several orders of
> magnitude. So I don't think that's the root cause.
> 
> > With the previous implementation (nested loops) the throughput is
> > around 4 Mbps at a baud rate of 4 Mbps. In that implementation we
> > keep sending DWORDs continuously until the transfer size drops
> > below 4. Can you suggest any alternatives to avoid the above
> > performance drop?
> 
> Could you attach the patch you are testing?
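The previous nested-loop code is not included in this thread. As a rough illustration of the approach described above (DWORDs sent back to back until fewer than four bytes remain, then single bytes), here is a user-space sketch under stated assumptions: writeb() and writel() are replaced by hypothetical sim_writeb()/sim_writel() helpers that record the bytes and count accesses, uart_xmit_advance() by a hypothetical sim_advance(), and a caller-supplied byte budget stands in for valid_byte_count. It is a model of the description, not the driver's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the driver's constants. */
#define UART_XMIT_SIZE  4096	/* power of two, as in the kernel */
#define UART_BURST_SIZE 4

/* Same arithmetic as CIRC_CNT_TO_END() in include/linux/circ_buf.h. */
static unsigned int circ_cnt_to_end(unsigned int head, unsigned int tail,
				    unsigned int size)
{
	unsigned int end = size - tail;
	unsigned int n = (head + end) & (size - 1);

	return n < end ? n : end;
}

/* Simulated port: xmit ring buffer plus a record of FIFO writes. */
struct sim_port {
	unsigned char buf[UART_XMIT_SIZE];
	unsigned int head, tail;
	unsigned char fifo[UART_XMIT_SIZE];
	unsigned int fifo_len;
	unsigned int io_ops;	/* number of simulated register accesses */
};

/* Hypothetical stand-in for writeb() to the byte FIFO. */
static void sim_writeb(struct sim_port *p, unsigned char c)
{
	assert(p->fifo_len < sizeof(p->fifo));
	p->fifo[p->fifo_len++] = c;
	p->io_ops++;
}

/* Hypothetical stand-in for one 4-byte writel() to the burst FIFO. */
static void sim_writel(struct sim_port *p, const unsigned char *src)
{
	assert(p->fifo_len + UART_BURST_SIZE <= sizeof(p->fifo));
	memcpy(&p->fifo[p->fifo_len], src, UART_BURST_SIZE);
	p->fifo_len += UART_BURST_SIZE;
	p->io_ops++;
}

/* Hypothetical stand-in for uart_xmit_advance(): move the tail. */
static void sim_advance(struct sim_port *p, unsigned int n)
{
	p->tail = (p->tail + n) & (UART_XMIT_SIZE - 1);
}

/* Nested loops: DWORDs back to back, then one byte at a time. */
static unsigned int tx_nested(struct sim_port *p, unsigned int budget)
{
	unsigned int sent = 0;

	while (budget) {
		unsigned int cnt = circ_cnt_to_end(p->head, p->tail,
						   UART_XMIT_SIZE);

		if (!cnt)
			break;
		/* Inner loop: send DWORDs while the tail is aligned and
		 * at least four contiguous bytes fit in the budget. */
		while (!(p->tail & 3) && cnt >= UART_BURST_SIZE &&
		       budget >= UART_BURST_SIZE) {
			sim_writel(p, &p->buf[p->tail]);
			sim_advance(p, UART_BURST_SIZE);
			cnt -= UART_BURST_SIZE;
			budget -= UART_BURST_SIZE;
			sent += UART_BURST_SIZE;
		}
		/* One byte, either to reach alignment or to drain a
		 * remainder shorter than a DWORD. */
		if (cnt && budget) {
			sim_writeb(p, p->buf[p->tail]);
			sim_advance(p, 1);
			budget--;
			sent++;
		}
	}
	return sent;
}
```

With the tail at offset 1 and ten bytes pending, this sketch issues three byte writes, one DWORD write and three more byte writes.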

Please find the updated pci1xxxx_process_write_data() below:

	u32 xfer_cnt;

	while (*valid_byte_count) {
		xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
					   UART_XMIT_SIZE);

		if (!xfer_cnt)
			break;

		if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {
			writeb(xmit->buf[xmit->tail], port->membase +
			       UART_TX_BYTE_FIFO);
			xfer_cnt = UART_BYTE_SIZE;
		} else {
			writel(*(u32 *)&xmit->buf[xmit->tail],
			       port->membase + UART_TX_BURST_FIFO);
			xfer_cnt = UART_BURST_SIZE;
		}

		uart_xmit_advance(port, xfer_cnt);
		*data_empty_count -= xfer_cnt;
		*valid_byte_count -= xfer_cnt;
	}
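A user-space sketch of the same single loop follows, with hypothetical sim_* stand-ins for writeb(), writel() and uart_xmit_advance(). One difference is deliberate: a DWORD is sent only if the remaining budget (standing in for *valid_byte_count) still covers all four bytes. Assuming valid_byte_count can be smaller than the contiguous data in the buffer (for example when the FIFO has little free space), the loop above could otherwise issue a 4-byte writel() when fewer bytes are allowed and decrement the counter past zero.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the driver's constants. */
#define UART_XMIT_SIZE  4096	/* power of two, as in the kernel */
#define UART_BURST_SIZE 4

/* Same arithmetic as CIRC_CNT_TO_END() in include/linux/circ_buf.h. */
static unsigned int circ_cnt_to_end(unsigned int head, unsigned int tail,
				    unsigned int size)
{
	unsigned int end = size - tail;
	unsigned int n = (head + end) & (size - 1);

	return n < end ? n : end;
}

/* Simulated port: xmit ring buffer plus a record of FIFO writes. */
struct sim_port {
	unsigned char buf[UART_XMIT_SIZE];
	unsigned int head, tail;
	unsigned char fifo[UART_XMIT_SIZE];
	unsigned int fifo_len;
	unsigned int io_ops;	/* number of simulated register accesses */
};

/* Hypothetical stand-in for writeb() to the byte FIFO. */
static void sim_writeb(struct sim_port *p, unsigned char c)
{
	assert(p->fifo_len < sizeof(p->fifo));
	p->fifo[p->fifo_len++] = c;
	p->io_ops++;
}

/* Hypothetical stand-in for one 4-byte writel() to the burst FIFO. */
static void sim_writel(struct sim_port *p, const unsigned char *src)
{
	assert(p->fifo_len + UART_BURST_SIZE <= sizeof(p->fifo));
	memcpy(&p->fifo[p->fifo_len], src, UART_BURST_SIZE);
	p->fifo_len += UART_BURST_SIZE;
	p->io_ops++;
}

/* Hypothetical stand-in for uart_xmit_advance(): move the tail. */
static void sim_advance(struct sim_port *p, unsigned int n)
{
	p->tail = (p->tail + n) & (UART_XMIT_SIZE - 1);
}

/* Single loop: one byte/DWORD decision per register access. */
static unsigned int tx_single(struct sim_port *p, unsigned int budget)
{
	unsigned int sent = 0;

	while (budget) {
		unsigned int cnt = circ_cnt_to_end(p->head, p->tail,
						   UART_XMIT_SIZE);

		if (!cnt)
			break;
		/* Fall back to a byte write if fewer than four bytes are
		 * contiguous, the budget is short, or the tail is unaligned. */
		if (cnt < UART_BURST_SIZE || budget < UART_BURST_SIZE ||
		    (p->tail & 3)) {
			sim_writeb(p, p->buf[p->tail]);
			cnt = 1;
		} else {
			sim_writel(p, &p->buf[p->tail]);
			cnt = UART_BURST_SIZE;
		}
		sim_advance(p, cnt);
		budget -= cnt;
		sent += cnt;
	}
	return sent;
}
```

In the driver, sim_writeb() and sim_writel() would correspond to writeb()/writel() on port->membase, and sim_advance() to uart_xmit_advance(), which also updates the TX statistics.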

Testing was done with minicom by transferring a 10 MB file at 4 Mbps.

Results of the minicom transfer with a single instance:

Previous implementation (nested while loops):
Transferred 10 MB at 3900000 CPS

Current implementation (single while loop):
Transferred 10 MB at 2459999 CPS
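Taking the two minicom figures as reported, the single-loop version reaches roughly 63% of the nested-loop rate, a drop of about 37%:

```latex
\frac{2459999}{3900000} \approx 0.631,
\qquad
1 - \frac{2459999}{3900000} = \frac{1440001}{3900000} \approx 0.369
```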

Thanks,
Rengarajan S

> 
> thanks,
> --
> js
> suse labs
>