On 3/21/24 1:45 AM, Niklas Cassel wrote: > Hello Kuppuswamy, > > On Wed, Mar 20, 2024 at 08:53:12AM -0700, Kuppuswamy Sathyanarayanan wrote: >> Hi, >> >> On 3/20/24 2:01 AM, Niklas Cassel wrote: >>> The current code uses writel()/readl(), which has an implicit memory >>> barrier for every single readl()/writel(). >>> >>> Additionally, reading 4 bytes at a time over the PCI bus is not really >>> optimal, considering that this code is running in an ioctl handler. >>> >>> Use memcpy_toio()/memcpy_fromio() for BAR tests. >>> >>> Before patch with a 4MB BAR: >>> $ time /usr/bin/pcitest -b 1 >>> BAR1: OKAY >>> real 0m 1.56s >>> >>> After patch with a 4MB BAR: >>> $ time /usr/bin/pcitest -b 1 >>> BAR1: OKAY >>> real 0m 0.54s >>> >>> Signed-off-by: Niklas Cassel <cassel@xxxxxxxxxx> >>> --- >>> Changes since v2: >>> -Actually free the allocated memory... (thank you Kuppuswamy) >>> >>> drivers/misc/pci_endpoint_test.c | 68 ++++++++++++++++++++++++++------ >>> 1 file changed, 55 insertions(+), 13 deletions(-) >>> >>> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c >>> index 705029ad8eb5..1d361589fb61 100644 >>> --- a/drivers/misc/pci_endpoint_test.c >>> +++ b/drivers/misc/pci_endpoint_test.c >>> @@ -272,33 +272,75 @@ static const u32 bar_test_pattern[] = { >>> 0xA5A5A5A5, >>> }; >>> >>> +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test, >>> + enum pci_barno barno, int offset, >>> + void *write_buf, void *read_buf, >>> + int size) >>> +{ >>> + memset(write_buf, bar_test_pattern[barno], size); >>> + memcpy_toio(test->bar[barno] + offset, write_buf, size); >>> + >>> + memcpy_fromio(read_buf, test->bar[barno] + offset, size); >>> + >>> + return memcmp(write_buf, read_buf, size); >>> +} >>> + >>> static bool pci_endpoint_test_bar(struct pci_endpoint_test *test, >>> enum pci_barno barno) >>> { >>> - int j; >>> - u32 val; >>> - int size; >>> + int j, bar_size, buf_size, iters, remain; >>> + void *write_buf; >>> + void *read_buf; >>> struct pci_dev *pdev = test->pdev; >>> + bool ret; >>> >>> if (!test->bar[barno]) >>> return false; >>> >>> - size = pci_resource_len(pdev, barno); >>> + bar_size = pci_resource_len(pdev, barno); >>> >>> if (barno == test->test_reg_bar) >>> - size = 0x4; >>> + bar_size = 0x4; >>> >>> - for (j = 0; j < size; j += 4) >>> - pci_endpoint_test_bar_writel(test, barno, j, >>> - bar_test_pattern[barno]); >>> + buf_size = min(SZ_1M, bar_size); >> Why 1MB limit? > Could you please clarify your concern? Since you are trying to optimize the number of read/write calls, I was just wondering why you chose maximum limit of 1MB per read/write call. But your following explanation makes sense to me. I recommend adding some comments about it in commit log or code. Code wise, your change looks fine to me. Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> > > A BAR could be several GB, so it does not make sense to always kmalloc() > a buffer that is of the same size of the BAR. > (Therefore we copy in to a smaller buffer, iterating over the whole BAR.) > > So we have to chose a max limit that we think is likely to succeed even > when the memory is fragmented, and something that will work on embedded > systems, etc. > > The highest BAR size used by pci-epf-test is by default 1MB, so 1MB > seemed like a reasonable max limit. (Since we use min(), if the BAR is > smaller than 1MB, the buffer we allocate will also be smaller than 1MB. > > Since we allocate two buffers, we are in the worst case allocating 2x 1MB, > so I don't think that it is reasonable to have a higher max limit. > > If you are using a _very_ resource contained system as RC (and EP) to test > the pci-epf-test driver, you have probably reduced the default BAR sizes > defined in pci-epf-test to something smaller already, so 1MB seemed like > a reasonable max limit. > > > Kind regards, > Niklas > >>> >>> - for (j = 0; j < size; j += 4) { >>> - val = pci_endpoint_test_bar_readl(test, barno, j); >>> - if (val != bar_test_pattern[barno]) >>> - return false; >>> + write_buf = kmalloc(buf_size, GFP_KERNEL); >>> + if (!write_buf) >>> + return false; >>> + >>> + read_buf = kmalloc(buf_size, GFP_KERNEL); >>> + if (!read_buf) { >>> + ret = false; >>> + goto err; >>> } >>> >>> - return true; >>> + iters = bar_size / buf_size; >>> + for (j = 0; j < iters; j++) { >>> + if (pci_endpoint_test_bar_memcmp(test, barno, buf_size * j, >>> + write_buf, read_buf, >>> + buf_size)) { >>> + ret = false; >>> + goto err; >>> + } >>> + } >>> + >>> + remain = bar_size % buf_size; >>> + if (remain) { >>> + if (pci_endpoint_test_bar_memcmp(test, barno, buf_size * iters, >>> + write_buf, read_buf, >>> + remain)) { >>> + ret = false; >>> + goto err; >>> + } >>> + } >>> + >>> + ret = true; >>> + >>> +err: >>> + kfree(write_buf); >>> + kfree(read_buf); >>> + >>> + return ret; >>> } >>> >>> static bool pci_endpoint_test_intx_irq(struct pci_endpoint_test *test) >> -- >> Sathyanarayanan Kuppuswamy >> Linux Kernel Developer >> -- Sathyanarayanan Kuppuswamy Linux Kernel Developer