Hi Jiri,

On Tue, 2024-03-05 at 08:19 +0100, Jiri Slaby wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
>
> On 05. 03. 24, 5:15, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > Hi Jiri,
> >
> > On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
> > > On 04. 03. 24, 5:37, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > Hi Jiri,
> > > >
> > > > On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
> > > > > On 23. 02. 24, 10:21, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > > > On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
> > > > > > > On 22. 02. 24, 14:49, Rengarajan S wrote:
> > > > > > > > Updated the TX Burst implementation by changing the
> > > > > > > > circular buffer processing with the pre-existing APIs
> > > > > > > > in kernel. Also updated conditional statements and
> > > > > > > > alignment issues for better readability.
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > so why are you keeping the nested double loop?
> > > > > >
> > > > > > Hi, in order to differentiate burst mode handling from byte
> > > > > > mode, we had separate loops for both.
> > > > > > Since having a single while loop also does not align with
> > > > > > the RX implementation (where we have separate handling for
> > > > > > burst and byte), we have retained the double loop.
> > > > >
> > > > > So obviously, align RX to a single loop if possible. The
> > > > > current TX code is very hard to follow and sort of
> > > > > unmaintainable (and buggy). And IMO it's unnecessary as I
> > > > > proposed [1]. And even if RX cannot be one loop, you still
> > > > > can make TX easy to read as the two need not be the same.
> > > > >
> > > > > [1]
> > > > > https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@xxxxxxxxxx/
> > > >
> > > > while (data_empty_count) {
> > > >     cnt = CIRC_CNT_TO_END();
> > > >     if (!cnt)
> > > >         break;
> > > >     if (cnt < UART_BURST_SIZE || (tail & 3)) { // is_unaligned()
> > > >         writeb();
> > > >         cnt = 1;
> > > >     } else {
> > > >         writel();
> > > >         cnt = UART_BURST_SIZE;
> > > >     }
> > > >     uart_xmit_advance(cnt);
> > > >     data_empty_count -= cnt;
> > > > }
> > > >
> > > > With the above implementation we are observing a performance
> > > > drop of 2 Mbps at a baud rate of 4 Mbps. The reason for this is
> > > > the fact that for each iteration we are checking if the data
> > > > needs to be processed via DWORDs or bytes. The condition check
> > > > for each iteration is causing the drop in performance.
> > >
> > > Hi,
> > >
> > > the check is by several orders of magnitude faster than the I/O
> > > proper. So I don't think that's the root cause.
> > >
> > > > With the previous implementation (with nested loops) the
> > > > performance is found to be around 4 Mbps at a baud rate of
> > > > 4 Mbps.
> > > > In that implementation we handle sending DWORDs continuously
> > > > until the transfer size < 4. Can you let us know any other
> > > > alternatives for the above performance drop.
> > >
> > > Could you attach the patch you are testing?
> >
> > Please find the updated pci1xxxx_process_write_data
> >
> > u32 xfer_cnt;
> >
> > while (*valid_byte_count) {
> >     xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
> >                                UART_XMIT_SIZE);
> >
> >     if (!xfer_cnt)
> >         break;
> >
> >     if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {
>
> Hi,
>
> OK, is it different if you remove the alignment checking (which
> should be correct™ thing to do, but may/will slow down things on
> platforms which don't care)?

After removing the alignment check, the performance increases
marginally:
Transferred 10 MB at 2759999 CPS
It is still lower than with the previous implementation.

> >         writeb(xmit->buf[xmit->tail],
> >                port->membase + UART_TX_BYTE_FIFO);
> >         xfer_cnt = UART_BYTE_SIZE;
> >     } else {
> >         writel(*(u32 *)&xmit->buf[xmit->tail],
>
> If you remove the "tail & 3" check, you can use get_unaligned() here
> and need not care about unaligned accesses after all...

Using get_unaligned((u32 *) xmit), the performance drops to:
Transferred 10 MB at 1959999 CPS

> >                port->membase + UART_TX_BURST_FIFO);
> >         xfer_cnt = UART_BURST_SIZE;
> >     }
> >
> >     uart_xmit_advance(port, xfer_cnt);
> >     *data_empty_count -= xfer_cnt;
> >     *valid_byte_count -= xfer_cnt;
> > }
> >
> > Testing is done via minicom by transferring a 10 MB file at 4 Mbps.
> >
> > After the minicom transfer with single instance:
> >
> > Previous implementation (nested while loops):
> > Transferred 10 MB at 3900000 CPS
> >
> > Current implementation:
> > Transferred 10 MB at 2459999 CPS
>
> --
> js
> suse labs
>
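
For anyone following the thread, the control flow of the single-loop TX
drain discussed above can be modelled in userspace C. This is only a
sketch under assumptions, not the pci1xxxx driver code: struct sim,
sim_drain() and circ_cnt_to_end() are hypothetical names, the hardware
FIFO is replaced by an output array plus write counters, memcpy() stands
in for the kernel's get_unaligned(), and the "fifo_space < BURST_SIZE"
guard is an addition that the snippets in the thread do not have. It
does not model MMIO latency, so it cannot explain the CPS figures
reported above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XMIT_SIZE  4096u  /* power of two, in the spirit of UART_XMIT_SIZE */
#define BURST_SIZE 4u     /* bytes moved by one DWORD write */

/* Hypothetical userspace model of the xmit ring plus a fake TX FIFO. */
struct sim {
    uint8_t buf[XMIT_SIZE];   /* circular transmit buffer */
    unsigned int head, tail;  /* producer / consumer indices */
    uint8_t out[XMIT_SIZE];   /* bytes "written to the FIFO", in order */
    unsigned int out_len;
    unsigned int byte_writes, dword_writes;
};

/* Same arithmetic as the kernel's CIRC_CNT_TO_END(): bytes readable
 * from tail without wrapping past the end of the buffer. */
static unsigned int circ_cnt_to_end(unsigned int head, unsigned int tail,
                                    unsigned int size)
{
    unsigned int end = size - tail;
    unsigned int n = (head + end) & (size - 1);

    return n < end ? n : end;
}

/* Single loop: one burst-or-byte decision per iteration, no alignment
 * test, because the DWORD is loaded through memcpy(). */
static void sim_drain(struct sim *s, unsigned int fifo_space)
{
    while (fifo_space) {
        unsigned int cnt = circ_cnt_to_end(s->head, s->tail, XMIT_SIZE);

        if (!cnt)
            break;

        if (cnt < BURST_SIZE || fifo_space < BURST_SIZE) {
            assert(s->out_len + 1 <= sizeof(s->out));
            s->out[s->out_len++] = s->buf[s->tail];      /* models writeb() */
            s->byte_writes++;
            cnt = 1;
        } else {
            uint32_t v;

            assert(s->out_len + BURST_SIZE <= sizeof(s->out));
            memcpy(&v, &s->buf[s->tail], sizeof(v));     /* unaligned-safe load */
            memcpy(&s->out[s->out_len], &v, sizeof(v));  /* models writel() */
            s->out_len += BURST_SIZE;
            s->dword_writes++;
            cnt = BURST_SIZE;
        }

        s->tail = (s->tail + cnt) & (XMIT_SIZE - 1);     /* like uart_xmit_advance() */
        fifo_space -= cnt;
    }
}
```

With 10 bytes queued (head = 10, tail = 0), sim_drain(&s, 256) performs
two DWORD writes followed by two byte writes and leaves tail at 10.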