On Fri, Aug 28, 2020 at 6:04 PM Artem Bityutskiy <dedekind1@xxxxxxxxx> wrote:
>
> On Thu, 2020-08-27 at 22:25 +0530, Subhashini Rao Beerisetty wrote:
> > I have an application which finds the data rate over the PCIe
> > interface. I'm getting a lower data rate on one of my Linux x86
> > systems.
>
> Some more description, maybe? Do you have a PCIe device reading one
> RAM buffer and then writing to another RAM buffer? Or does it generate
> some data and write it to a RAM buffer? Presumably it uses DMA? How
> much is the CPU involved in the process? Are we talking about
> transferring a few kilobytes or gigabytes?

Thanks a lot for your help and reply.

Regarding the hardware setup: a Xilinx PCIe FPGA endpoint is connected
to the host CPU via the PCIe bus. The endpoint has a DMA_REF block,
which provides a mechanism to DMA data at the maximum rate between host
CPU memory and a FIFO in the DMA_REF block. The host software sets up
some data in its memory, transfers the data to the DMA_REF FIFO, and
then reads it back into a different location in host memory. This is
repeated in a loop. A register in the DMA_REF block gives an indication
of the transfer speed.

> > When I change the scaling_governor from "powersave" to "performance"
> > mode for each CPU, there is a slight improvement in the PCIe data
> > rate.
>
> Definitely this makes your CPU(s) run at max speed, but depending on
> platform and settings, this may also affect C-states. Are the CPU(s)
> generally idle while you measure, or busy (involved in the test)? You
> could run 'turbostat' while measuring the bandwidth, to get some CPU
> statistics (e.g., do C-states happen during the PCI test, how busy are
> the CPUs).
>
> > In parallel I started profiling the workload with perf. Whenever I
> > start running the profile command "perf stat -a -d -p <PID>",
> > surprisingly the application achieves an excellent data rate over
> > PCIe, but when I kill the perf command the PCIe data rate drops
> > again. I am really confused about this behavior. Any clues?
>
> Well, one possible reason that comes to mind - you get rid of C-states
> when you run perf, and this increases the PCI bandwidth. You can just
> try disabling C-states (there are sysfs knobs) and check it out.
> Turbostat could be useful to check for this: with and without perf, run
> 'turbostat sleep 10' or something like this (measure for 10 seconds in
> this example) while running your PCI test.

Disabling the C-states improved the throughput a lot; thanks a lot for
pointing this out. Could you please explain in more detail how
disabling C-states improves the throughput?

As you suggested, I collected turbostat logs with and without perf
while running the PCIe test; they are attached.

On my system, only "performance" and "powersave" are listed in
scaling_available_governors. The other governors ("userspace",
"ondemand", "schedutil") are not listed. What might be the reason for
this?

> But I am really just guessing here, I do not know enough about your
> test and the system (e.g., "a Linux x86" system can be so many things,
> like Intel or AMD server or a mobile device)…

It's an Intel Atom processor.
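For reference, the "sysfs knobs" for C-states are the per-CPU cpuidle
files under /sys/devices/system/cpu/cpuN/cpuidle/stateM/. Below is a
minimal Python sketch, assuming the standard cpuidle layout (name,
latency and disable files) and root privileges for the write. The
function names and the latency threshold are hypothetical, chosen only
for illustration; they do not come from this thread.

```python
from pathlib import Path

# Standard cpuidle sysfs location; passed as a parameter so the
# functions can also be pointed at a test directory.
CPU_ROOT = Path("/sys/devices/system/cpu")


def list_idle_states(cpu_root=CPU_ROOT):
    """Return (path, name, exit_latency_us, disabled) for every cpuidle state."""
    states = []
    for state in sorted(cpu_root.glob("cpu[0-9]*/cpuidle/state[0-9]*")):
        name = (state / "name").read_text().strip()
        latency = int((state / "latency").read_text())
        disabled = (state / "disable").read_text().strip() == "1"
        states.append((state, name, latency, disabled))
    return states


def disable_deep_states(max_latency_us, cpu_root=CPU_ROOT):
    """Disable every idle state whose exit latency exceeds max_latency_us.

    Writing to the 'disable' file needs root on a real system.
    Returns the names of the states that were disabled.
    """
    touched = []
    for state, name, latency, _ in list_idle_states(cpu_root):
        if latency > max_latency_us:
            (state / "disable").write_text("1")
            touched.append(name)
    return touched
```

Calling list_idle_states() first shows which states exist and their
exit latencies, so a threshold can be picked per system. Writing "0"
back to the same disable files re-enables the states.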
turbostat version 17.06.23 - Len Brown <lenb@xxxxxxxxxx>
CPUID(0): GenuineIntel 11 CPUID levels; family:model:stepping 0x6:37:9 (6:55:9)
CPUID(1): SSE3 MONITOR - EIST TM2 TSC MSR ACPI-TM TM
CPUID(6): APERF, No-TURBO, DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu2: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)
CPUID(7): No-SGX
SLM BCLK: 83.3 Mhz
cpu2: MSR_CC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-CC6-Demotion)
cpu2: MSR_MC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-MC6-Demotion)
RAPL: 4581 sec. Joule Counter Range, at 30 Watts
cpu2: MSR_PLATFORM_INFO: 0x60000001700
6 * 83.3 = 499.8 MHz max efficiency frequency
23 * 83.3 = 1915.9 MHz base frequency
cpu2: MSR_IA32_POWER_CTL: 0x00000000 (C1E auto-promotion: DISabled)
cpu2: MSR_ATOM_CORE_RATIOS: 0x00170602
2 * 83.3 = 166.6 MHz minimum operating frequency
6 * 83.3 = 499.8 MHz low frequency mode (LFM)
23 * 83.3 = 1915.9 MHz base frequency
cpu2: MSR_ATOM_CORE_TURBO_RATIOS: 0x17171717
23 * 83.3 = 1915.9 MHz max turbo 4 active cores
23 * 83.3 = 1915.9 MHz max turbo 3 active cores
23 * 83.3 = 1915.9 MHz max turbo 2 active cores
23 * 83.3 = 1915.9 MHz max turbo 1 active core
cpu2: MSR_PKG_CST_CONFIG_CONTROL: 0x0017000f (UNlocked: pkg-cstate-limit=15: pc7)
cpu2: POLL: CPUIDLE CORE POLL IDLE
cpu2: C1: MWAIT 0x00
cpu2: C6N: MWAIT 0x58
cpu2: C6S: MWAIT 0x52
cpu2: cpufreq driver: intel_pstate
cpu2: cpufreq governor: performance
cpufreq intel_pstate no_turbo: 1
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x00000505 (0.031250 Watts, 0.000032 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x003880fa (UNlocked)
cpu0: PKG Limit #1: ENabled (7.812500 Watts, 262144.000000 sec, clamp DISabled)
cpu0: PKG Limit #2: DISabled (0.000000 Watts, 0.000977* sec, clamp DISabled)
cpu0: MSR_PP0_POWER_LIMIT: 0x00020000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.001953 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x006e0000 (110 C)
40.004417 sec
Core  CPU  Avg_MHz  Busy%  Bzy_MHz  TSC_MHz  IRQ     SMI  C1       C6N    C6S     C1%    C6N%   C6S%   CPU%c1  CPU%c6  Mod%c6  CoreTmp  GFX%rc6  Pkg%pc6  PkgWatt  CorWatt
-     -    748      39.33  1901     1917     458560  0    1566280  1889   7775    43.07  0.29   18.19  42.23   17.85   13.56   29       100.00   11.80    1.14     0.95
0     0    643      33.89  1896     1917     33647   0    405339   492    1950    48.44  0.31   18.28  47.57   17.69   13.32   28       100.00   11.80    1.14     0.95
1     1    1064     55.78  1906     1917     358521  0    449875   25     1910    27.37  0.03   17.83  26.41   17.34   13.32   29
2     2    647      34.14  1895     1917     33821   0    359155   517    1940    48.21  0.29   18.20  47.43   17.86   13.81   28
3     3    638      33.52  1902     1917     32571   0    351911   855    1975    48.27  0.54   18.47  47.51   18.51   13.81   28
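The frequency lines in the dump above are each a ratio multiplied by
the 83.3 MHz bus clock (BCLK). A small Python sketch of that
arithmetic, assuming each ratio occupies one byte of the MSR value;
that reading matches the numbers turbostat printed here, but the exact
field widths are an assumption, and the helper names are hypothetical.

```python
BCLK_MHZ = 83.3  # from "SLM BCLK: 83.3 Mhz" in the dump


def ratio_bytes(msr_value, count):
    """Split the low `count` bytes of an MSR value into ratios, lowest byte first."""
    return [(msr_value >> (8 * i)) & 0xFF for i in range(count)]


def to_mhz(ratio, bclk=BCLK_MHZ):
    """Convert a ratio to MHz, rounded to one decimal as in the dump."""
    return round(ratio * bclk, 1)


# MSR_ATOM_CORE_RATIOS: 0x00170602 -> minimum, LFM and base ratios
minimum, lfm, base = ratio_bytes(0x00170602, 3)
for label, ratio in (("minimum", minimum), ("LFM", lfm), ("base", base)):
    print(f"{label}: {ratio} * {BCLK_MHZ} = {to_mhz(ratio)} MHz")

# MSR_ATOM_CORE_TURBO_RATIOS: 0x17171717 -> one ratio per active-core count
print(ratio_bytes(0x17171717, 4))
```

All four turbo ratios equal the base ratio, which agrees with the
"No-TURBO" flag and "no_turbo: 1" reported in the same dump.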
turbostat version 17.06.23 - Len Brown <lenb@xxxxxxxxxx>
CPUID(0): GenuineIntel 11 CPUID levels; family:model:stepping 0x6:37:9 (6:55:9)
CPUID(1): SSE3 MONITOR - EIST TM2 TSC MSR ACPI-TM TM
CPUID(6): APERF, No-TURBO, DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu2: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)
CPUID(7): No-SGX
SLM BCLK: 83.3 Mhz
cpu2: MSR_CC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-CC6-Demotion)
cpu2: MSR_MC6_DEMOTION_POLICY_CONFIG: 0x00000000 (DISable-MC6-Demotion)
RAPL: 4581 sec. Joule Counter Range, at 30 Watts
cpu2: MSR_PLATFORM_INFO: 0x60000001700
6 * 83.3 = 499.8 MHz max efficiency frequency
23 * 83.3 = 1915.9 MHz base frequency
cpu2: MSR_IA32_POWER_CTL: 0x00000000 (C1E auto-promotion: DISabled)
cpu2: MSR_ATOM_CORE_RATIOS: 0x00170602
2 * 83.3 = 166.6 MHz minimum operating frequency
6 * 83.3 = 499.8 MHz low frequency mode (LFM)
23 * 83.3 = 1915.9 MHz base frequency
cpu2: MSR_ATOM_CORE_TURBO_RATIOS: 0x17171717
23 * 83.3 = 1915.9 MHz max turbo 4 active cores
23 * 83.3 = 1915.9 MHz max turbo 3 active cores
23 * 83.3 = 1915.9 MHz max turbo 2 active cores
23 * 83.3 = 1915.9 MHz max turbo 1 active core
cpu2: MSR_PKG_CST_CONFIG_CONTROL: 0x0017000f (UNlocked: pkg-cstate-limit=15: pc7)
cpu2: POLL: CPUIDLE CORE POLL IDLE
cpu2: C1: MWAIT 0x00
cpu2: C6N: MWAIT 0x58
cpu2: C6S: MWAIT 0x52
cpu2: cpufreq driver: intel_pstate
cpu2: cpufreq governor: performance
cpufreq intel_pstate no_turbo: 1
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x00000505 (0.031250 Watts, 0.000032 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x003880fa (UNlocked)
cpu0: PKG Limit #1: ENabled (7.812500 Watts, 262144.000000 sec, clamp DISabled)
cpu0: PKG Limit #2: DISabled (0.000000 Watts, 0.000977* sec, clamp DISabled)
cpu0: MSR_PP0_POWER_LIMIT: 0x00020000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.001953 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x006e0000 (110 C)
40.003807 sec
Core  CPU  Avg_MHz  Busy%  Bzy_MHz  TSC_MHz  IRQ     SMI  C1       C6N    C6S     C1%    C6N%   C6S%   CPU%c1  CPU%c6  Mod%c6  CoreTmp  GFX%rc6  Pkg%pc6  PkgWatt  CorWatt
-     -    522      27.23  1915     1917     285074  0    784132   86336  168235  31.17  13.41  28.89  30.75   26.97   3.75    30       100.00   3.60     1.12     0.94
0     0    412      21.50  1915     1917     11229   0    155840   13229  69746   24.68  7.95   46.57  24.35   35.28   3.80    29       100.00   3.60     1.12     0.94
1     1    764      39.91  1914     1917     242649  0    269925   25179  32418   26.95  13.44  20.52  26.38   19.26   3.80    29
2     2    476      24.82  1915     1917     16549   0    224377   27839  1269    48.85  21.52  5.44   48.36   19.04   3.70    30
3     3    435      22.70  1915     1917     14647   0    133990   20089  64802   24.19  10.74  43.05  23.90   34.28   3.70    30
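To compare the two runs, the summary rows (the "- -" lines) of both
logs can be put side by side. A minimal Python sketch with the values
copied from the two logs above; the names run_1 and run_2 only follow
the order of the attachments, since the logs themselves do not say
which one was taken with perf running.

```python
# Columns of interest from the turbostat summary ("- -") rows.
COLUMNS = ["Busy%", "C1%", "C6N%", "C6S%", "CPU%c6", "Mod%c6", "Pkg%pc6"]

# Values copied from the first and second attached log, in that order.
run_1 = dict(zip(COLUMNS, [39.33, 43.07, 0.29, 18.19, 17.85, 13.56, 11.80]))
run_2 = dict(zip(COLUMNS, [27.23, 31.17, 13.41, 28.89, 26.97, 3.75, 3.60]))


def residency_delta(first, second):
    """Per-column difference (second - first), in percentage points."""
    return {col: round(second[col] - first[col], 2) for col in first}


print(f"{'column':8} {'run_1':>7} {'run_2':>7} {'delta':>7}")
for col, delta in residency_delta(run_1, run_2).items():
    print(f"{col:8} {run_1[col]:7.2f} {run_2[col]:7.2f} {delta:+7.2f}")
```

The C6N% and C6S% columns are the ones to watch when checking whether
deep C-states occur during the PCIe test.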
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies