Hi Jiri,

On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
> On 04. 03. 24, 5:37, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > Hi Jiri,
> >
> > On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
> > > On 23. 02. 24, 10:21, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
> > > > > On 22. 02. 24, 14:49, Rengarajan S wrote:
> > > > > > Updated the TX Burst implementation by switching the circular
> > > > > > buffer processing to the pre-existing kernel APIs. Also updated
> > > > > > conditional statements and fixed alignment issues for better
> > > > > > readability.
> > > > >
> > > > > Hi,
> > > > >
> > > > > so why are you keeping the nested double loop?
> > > > >
> > > >
> > > > Hi, in order to differentiate burst mode handling from byte mode
> > > > handling, we had separate loops for both. Since a single while loop
> > > > would also not align with the RX implementation (where we have
> > > > separate handling for burst and byte), we have retained the double
> > > > loop.
> > >
> > > So obviously, align RX to a single loop if possible. The current TX
> > > code is very hard to follow and sort of unmaintainable (and buggy).
> > > And IMO it's unnecessary, as I proposed [1]. And even if RX cannot
> > > be one loop, you can still make TX easy to read, as the two need not
> > > be the same.
> > >
> > > [1]
> > > https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@xxxxxxxxxx/
> > >
> >
> > while (data_empty_count) {
> > 	cnt = CIRC_CNT_TO_END();
> > 	if (!cnt)
> > 		break;
> > 	if (cnt < UART_BURST_SIZE || (tail & 3)) { // is_unaligned()
> > 		writeb();
> > 		cnt = 1;
> > 	} else {
> > 		writel();
> > 		cnt = UART_BURST_SIZE;
> > 	}
> > 	uart_xmit_advance(cnt);
> > 	data_empty_count -= cnt;
> > }
> >
> > With the above implementation we are observing a performance drop of
> > 2 Mbps at a baud rate of 4 Mbps. The reason for this is that in each
> > iteration we check whether the data needs to be processed via DWORDs
> > or bytes. This per-iteration condition check is causing the drop in
> > performance.
>
> Hi,
>
> the check is several orders of magnitude faster than the I/O proper.
> So I don't think that's the root cause.
>
> > With the previous implementation (with nested loops), the throughput
> > is around 4 Mbps at a baud rate of 4 Mbps. In that implementation we
> > send DWORDs continuously until the remaining transfer size is < 4.
> > Can you let us know of any alternatives that avoid the above
> > performance drop?
>
> Could you attach the patch you are testing?
Please find the updated body of pci1xxxx_process_write_data() below:

	u32 xfer_cnt;

	while (*valid_byte_count) {
		xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
					   UART_XMIT_SIZE);
		if (!xfer_cnt)
			break;

		if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {
			writeb(xmit->buf[xmit->tail],
			       port->membase + UART_TX_BYTE_FIFO);
			xfer_cnt = UART_BYTE_SIZE;
		} else {
			writel(*(u32 *)&xmit->buf[xmit->tail],
			       port->membase + UART_TX_BURST_FIFO);
			xfer_cnt = UART_BURST_SIZE;
		}

		uart_xmit_advance(port, xfer_cnt);
		*data_empty_count -= xfer_cnt;
		*valid_byte_count -= xfer_cnt;
	}

Testing was done via minicom by transferring a 10 MB file at 4 Mbps.
Results of the minicom transfer with a single instance:

Previous implementation (nested while loops): transferred 10 MB at 3900000 CPS
Current implementation (single loop):          transferred 10 MB at 2459999 CPS

Thanks,
Rengarajan S

>
> thanks,
> --
> js
> suse labs
>