Re: Kernel Benchmarking

Hello everyone,

On 14/09/2020 19:47, Linus Torvalds wrote:
> Michael et al,
>   Ok, I redid my failed "hybrid mode" patch from scratch (original
> patch never sent out, I never got it to a working point).
>
> Having learnt from my mistake, this time instead of trying to mix the
> old and the new code, instead I just extended the new code, and wrote
> a _lot_ of comments about it.
>
> I also made it configurable, using a "page_lock_unfairness" knob,
> which this patch defaults to 1000 (which is basically infinite).
> That's just a value that says how many times we'll try the old unfair
> case, so "1000" means "we'll re-queue up to a thousand times before we
> say enough is enough" and zero is the fair mode that shows the
> performance problems.

Thank you for the new patch, and thanks to everybody for all the work around it!

Sorry to jump into this thread, but I wanted to share my issue, which is also linked to the same commit:

    2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")

I have a simple test environment[1] using Docker and virtme[2], with an almost default kernel config, which runs some tests for the MPTCP Upstream project[3]. Some of these tests use a modified version of packetdrill[4].

Recently, some of these packetdrill tests have been failing after 2 minutes (timeout) instead of completing in a few seconds (~6 seconds). No packets are even exchanged during these two minutes.

I did a git bisect and it also pointed me to 2a9127fcf229.

I can run the same test 10 times without any issue on the parent commit (v5.8 tag), but with 2a9127fcf229 I get a timeout most of the time.
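For anyone who wants to automate a similar bisect, here is a minimal sketch of a check function for `git bisect run`. It is not the procedure used for the results in this mail: `BUILD_CMD` and `TEST_CMD` are hypothetical placeholders for whatever builds the kernel and starts one packetdrill run, and the 120-second limit only mirrors the 2-minute timeout mentioned above.

```shell
#!/bin/sh
# Hypothetical check for `git bisect run`; both default commands are placeholders.
BUILD_CMD="${BUILD_CMD:-make -j4}"
TEST_CMD="${TEST_CMD:-./run_packetdrill_test.sh}"   # hypothetical script
LIMIT="${LIMIT:-120}"                               # seconds before a run counts as hung

bisect_step() {
    # Exit code 125 tells `git bisect run` to skip a revision that cannot be tested.
    sh -c "$BUILD_CMD" >/dev/null 2>&1 || return 125
    status=0
    timeout "$LIMIT" sh -c "$TEST_CMD" >/dev/null 2>&1 || status=$?
    # 0 marks the revision good; any failure, including a timeout (124), marks it bad.
    [ "$status" -eq 0 ] && return 0
    return 1
}
```

With `bisect_step` called on the last line of a script, `git bisect run ./that-script` would use its exit status to mark each revision, after a `git bisect start <bad> v5.8` where `<bad>` is whichever revision shows the timeout. Since the timeout appears "most of the time" rather than always, a single passing run can wrongly mark a bad revision as good, so repeating the test several times per revision would be safer.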

Of course, when I try to add some debug info on the userspace or kernelspace side, I can no longer reproduce the timeout issue. But without debug, it is easy for me to check whether the issue is there or not.

My issue doesn't seem to be linked to a small file that needs to be read multiple times on a FS. Only a few bytes should be transferred with packetdrill, but the timeout happens even before that: I don't see any transferred packets when the issue occurs. I don't think packetdrill does a lot of IO before transferring a few packets to a "tun" interface, but I didn't analyse further.

With your new patch and the default value, I no longer have the issue.

> I've only (lightly) tested those two extremes, I think the interesting
> range is likely in the 1-5 range.
>
> So you can do
>
>      echo 0 > /proc/sys/vm/page_lock_unfairness
>      .. run test ..
>
> and you should get the same numbers as without this patch (within
> noise, of course).

On my side, I have the issue with 0. So that looks good, because it is expected!

> Or do
>
>      echo 5 > /proc/sys/vm/page_lock_unfairness
>      .. run test ..
>
> and get numbers for "we accept some unfairness, but if we have to
> requeue more than five times, we force the fair mode".

Already with 1, it is fine on my side: no more timeouts! Same with 5. I am not measuring performance, only checking that I can run packetdrill without a timeout. With 1 and 5, tests finish in a normal time, which is really good. I didn't have any timeout in 10 runs, each of them started from a fresh VM. Patch tested successfully!
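The comparison across knob values can be scripted. This is only a sketch under stated assumptions, not the script behind the results above: `TEST_CMD` is a hypothetical placeholder for one packetdrill run, `KNOB` defaults to the sysctl file added by the patch (writing to it needs root and the patched kernel), and unlike the runs reported here it does not restart a fresh VM between runs.

```shell
#!/bin/sh
# Sketch: count timed-out runs for each page_lock_unfairness value.
KNOB="${KNOB:-/proc/sys/vm/page_lock_unfairness}"
TEST_CMD="${TEST_CMD:-./run_packetdrill_test.sh}"   # hypothetical script
LIMIT="${LIMIT:-120}"                               # seconds; failing runs hit a 2-minute timeout
RUNS="${RUNS:-10}"

sweep() {
    for value in "$@"; do
        echo "$value" > "$KNOB" || return 1
        timeouts=0
        i=0
        while [ "$i" -lt "$RUNS" ]; do
            rc=0
            timeout "$LIMIT" sh -c "$TEST_CMD" >/dev/null 2>&1 || rc=$?
            # timeout(1) exits with 124 when the command ran out of time.
            if [ "$rc" -eq 124 ]; then
                timeouts=$((timeouts + 1))
            fi
            i=$((i + 1))
        done
        echo "page_lock_unfairness=$value timeouts=$timeouts/$RUNS"
    done
}
```

Calling `sweep 0 1 5 1000` would print one summary line per value, which makes it easy to see from which setting the timeouts disappear.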

I would be glad to help by validating new modifications or providing new info. My setup is also easy to put in place: a Docker image is built with all the required tools to start the same VM as the one I use. All scripts are in a public repository[1].

Please tell me if I can help!

Cheers,
Matt

[1] https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/virtme.sh and https://github.com/multipath-tcp/mptcp_net-next/blob/scripts/ci/Dockerfile.virtme.sh
[2] https://git.kernel.org/pub/scm/utils/kernel/virtme/virtme.git
[3] https://github.com/multipath-tcp/mptcp_net-next/wiki
[4] https://github.com/multipath-tcp/packetdrill
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net


