On Wed, Nov 11, 2020 at 2:38 AM kernel test robot <lkp@xxxxxxxxx> wrote:
>
> Hi Magnus,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on bpf-next/master]
>
> url:    https://github.com/0day-ci/linux/commits/Magnus-Karlsson/xsk-i40e-Tx-performance-improvements/20201110-190310
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: powerpc64-randconfig-r025-20201110 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 4d81c8adb6ed9840257f6cb6b93f60856d422a15)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install powerpc64 cross compiling tool for clang build
>         # apt-get install binutils-powerpc64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/b016bbeac6692a93e61b28efa430d64645032b5e
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Magnus-Karlsson/xsk-i40e-Tx-performance-improvements/20201110-190310
>         git checkout b016bbeac6692a93e61b28efa430d64645032b5e
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@xxxxxxxxx>
>
> All warnings (new ones prefixed by >>):
>
> >> drivers/net/ethernet/intel/i40e/i40e_xsk.c:417:13: warning: unknown pragma ignored [-Wunknown-pragmas]
>    #pragma GCC unroll 4
>                ^
>    1 warning generated.

And I was hoping that unknown pragmas would be ignored, but that will
obviously not be the case with -Wunknown-pragmas added. The unrolling
of this inner loop where the code spends most of its time gives me
nearly 1 Mpps extra in performance which is substantial, so I would
like to get this unrolled in some way, but without the warning. Need
some advice please.
Here are some options that come to mind:

#1: Suppress unknown pragma warnings in this file only by adding
    CFLAGS_i40e_xsk.o += -Wno-unknown-pragmas
    (or whatever that option might be) in the Makefile.

#2: Force the compiler to unroll the loop with, for example, a switch
    statement with four cases that all fall through. This will make
    the code less readable.

#3: Manually unroll the loop. This will make the code even less
    readable than #2.

I prefer #1 as I like to keep the code readable, but you might have
other, better suggestions on how to tackle this.

Thanks: Magnus

> vim +417 drivers/net/ethernet/intel/i40e/i40e_xsk.c
>
>    408
>    409  static void i40e_xmit_pkt_batch(struct i40e_ring *xdp_ring, struct xdp_desc *desc,
>    410                                   unsigned int *total_bytes)
>    411  {
>    412          u16 ntu = xdp_ring->next_to_use;
>    413          struct i40e_tx_desc *tx_desc;
>    414          dma_addr_t dma;
>    415          u32 i;
>    416
>  > 417  #pragma GCC unroll 4
>    418          for (i = 0; i < PKTS_PER_BATCH; i++) {
>    419                  dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc[i].addr);
>    420                  xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc[i].len);
>    421
>    422                  tx_desc = I40E_TX_DESC(xdp_ring, ntu++);
>    423                  tx_desc->buffer_addr = cpu_to_le64(dma);
>    424                  tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC |
>    425                                                            I40E_TX_DESC_CMD_EOP,
>    426                                                            0, desc[i].len, 0);
>    427
>    428                  *total_bytes += desc[i].len;
>    429          }
>    430
>    431          xdp_ring->next_to_use = ntu;
>    432  }
>    433
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@xxxxxxxxxxxx
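
To make the readability trade-off concrete, options #2 and #3 might
look roughly like the sketch below. This is a simplified,
self-contained stand-in and not the driver code: struct demo_desc,
demo_xmit_one(), demo_xmit_batch_switch() and
demo_xmit_batch_manual() are hypothetical names, and the per-packet
work is reduced to a ring write plus a byte count. Only
PKTS_PER_BATCH is taken from the patch, and its value of 4 is assumed
from the unroll factor.

```c
#include <stdint.h>

#define PKTS_PER_BATCH 4	/* assumed to match the unroll factor */

/* Hypothetical stand-in for struct xdp_desc (only the fields used here). */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
};

/* Hypothetical stand-in for the per-packet body of the original loop. */
static inline void demo_xmit_one(const struct demo_desc *d, uint64_t *ring,
				 uint16_t *ntu, unsigned int *total_bytes)
{
	ring[(*ntu)++] = d->addr;
	*total_bytes += d->len;
}

/* Option #2: a switch with four cases that all fall through. */
static void demo_xmit_batch_switch(const struct demo_desc *desc,
				   uint64_t *ring, uint16_t *ntu,
				   unsigned int *total_bytes)
{
	switch (PKTS_PER_BATCH) {
	case 4:
		demo_xmit_one(&desc[0], ring, ntu, total_bytes);
		/* fall through */
	case 3:
		demo_xmit_one(&desc[1], ring, ntu, total_bytes);
		/* fall through */
	case 2:
		demo_xmit_one(&desc[2], ring, ntu, total_bytes);
		/* fall through */
	case 1:
		demo_xmit_one(&desc[3], ring, ntu, total_bytes);
	}
}

/* Option #3: the loop body written out four times by hand. */
static void demo_xmit_batch_manual(const struct demo_desc *desc,
				   uint64_t *ring, uint16_t *ntu,
				   unsigned int *total_bytes)
{
	demo_xmit_one(&desc[0], ring, ntu, total_bytes);
	demo_xmit_one(&desc[1], ring, ntu, total_bytes);
	demo_xmit_one(&desc[2], ring, ntu, total_bytes);
	demo_xmit_one(&desc[3], ring, ntu, total_bytes);
}
```

Both variants only stay this short because the per-packet work is
hidden behind a helper; with the real body inlined four times, either
one would be considerably longer than the pragma version.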