Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check.


On 2024/12/3 21:51, Uladzislau Rezki wrote:
On Tue, Dec 03, 2024 at 09:45:09PM +0800, Kefeng Wang wrote:


On 2024/12/3 21:39, Uladzislau Rezki wrote:
On Tue, Dec 03, 2024 at 09:30:09PM +0800, Kefeng Wang wrote:


On 2024/12/3 21:10, zuoze wrote:


On 2024/12/3 20:39, Uladzislau Rezki wrote:
On Tue, Dec 03, 2024 at 07:23:44PM +0800, zuoze wrote:
We have implemented host-guest communication based on the TUN device
using XSK[1]. The hardware is a Kunpeng 920 machine (ARM architecture),
and the operating system is based on the 6.6 LTS kernel. The call stack
collected for the hotspot is as follows:

-  100.00%     0.00%  vhost-12384  [unknown]      [k] 0000000000000000
      - ret_from_fork
         - 99.99% vhost_task_fn
            - 99.98% 0xffffdc59f619876c
               - 98.99% handle_rx_kick
                  - 98.94% handle_rx
                     - 94.92% tun_recvmsg
                        - 94.76% tun_do_read
                           - 94.62% tun_put_user_xdp_zc
                              - 63.53% __check_object_size
                                 - 63.49% __check_object_size.part.0
                                      find_vmap_area
                              - 30.02% _copy_to_iter
                                   __arch_copy_to_user
                     - 2.27% get_rx_bufs
                        - 2.12% vhost_get_vq_desc
                             1.49% __arch_copy_from_user
                     - 0.89% peek_head_len
                          0.54% xsk_tx_peek_desc
                     - 0.68% vhost_add_used_and_signal_n
                        - 0.53% eventfd_signal
                             eventfd_signal_mask
               - 0.94% handle_tx_kick
                  - 0.94% handle_tx
                     - handle_tx_copy
                        - 0.59% vhost_tx_batch.constprop.0
                             0.52% tun_sendmsg

It can be observed that most of the overhead is concentrated in the
find_vmap_area function.

...

Thank you. Then you have tons of copy_to_iter/copy_from_iter calls
during your test case. For each call you need to find the area, which
might be really heavy.

Exactly. There was no vmalloc check before 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns"), so older kernels had no find_vmap_area overhead.
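
To make the per-copy cost concrete, here is a minimal user-space sketch of
the pattern under discussion: every copy first looks up the area containing
the address, then checks that the length fits inside it. This is only a
model and not the kernel code. The names struct area, find_area, check_copy
and lookup_count are hypothetical, and a sorted array with binary search
stands in for whatever structure and locking the kernel really uses.

```c
/* User-space model only: all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct area {
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
};

/* Incremented on every lookup so the per-copy cost is visible. */
unsigned long lookup_count;

/* Binary search over areas sorted by start; NULL if addr is in no area. */
const struct area *find_area(const struct area *areas, size_t n,
                             uintptr_t addr)
{
    size_t lo = 0, hi = n;

    lookup_count++;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < areas[mid].start)
            hi = mid;
        else if (addr >= areas[mid].end)
            lo = mid + 1;
        else
            return &areas[mid];
    }
    return NULL;
}

/* Returns 1 if [addr, addr + len) lies inside a single area, else 0. */
int check_copy(const struct area *areas, size_t n,
               uintptr_t addr, size_t len)
{
    const struct area *a;

    assert(areas != NULL || n == 0);
    a = find_area(areas, n, addr);
    if (!a)
        return 0;
    /* addr is inside the area, so a->end - addr cannot underflow. */
    return len <= a->end - addr;
}
```

Each call to check_copy performs exactly one lookup, so a workload that
issues a copy per packet pays for one lookup per packet. In this model the
lookup is cheap; the trace above suggests that the real lookup is not.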


How many CPUs does your system have?


128 cores



