Hi Petr,
On 1/16/2025 4:48 PM, Petr Mladek wrote:
On Thu 2025-01-16 13:03:16, laokz wrote:
Hi Petr,
Thanks for the quick reply.
On 1/15/2025 11:57 PM, Petr Mladek wrote:
On Wed 2025-01-15 08:32:12, laokz@xxxxxxxxxxx wrote:
During a livepatch transition, the kernel calls klp_try_complete_transition(), which in turn might call klp_send_signals(). klp_send_signals() has the code:
if (klp_signals_cnt == SIGNALS_TIMEOUT)
pr_notice("signaling remaining tasks\n");
Do we need to match or filter out this message in check_result?
And here klp_signals_cnt MUST be EQUAL to SIGNALS_TIMEOUT, right?
Oops, I misstated the 2nd question: (klp_signals_cnt % SIGNALS_TIMEOUT
== 0) does not always mean they are equal.
Good question. Have you seen this message when running the selftests, please?
I wonder which test could trigger it. I do not recall any livepatch
test where the transition might get blocked for too long.
There is the self test with a blocked transition ("busy target
module") but the waiting is stopped much earlier there.
The message might get printed when the selftests are
called on a huge and very busy system. But then we might run
into trouble with other timeouts as well. So it would be nice
to know more details about when this happens.
We're trying to port livepatch to RISC-V. In my qemu virt VM in a cloud
environment, all tests passed except test-syscall.sh. Mostly it complained
about the missing dmesg line "signaling remaining tasks". I want to confirm
with you experts whether the failure is expected in theory, or if we could
filter out this potential dmesg message completely.
The test-syscall.sh test spawns many processes which are calling the
SYS_getpid syscall in a busy loop. I could imagine that it might
cause problems when the virt VM emulates many more virtual CPUs than
the assigned real CPUs. It might be even worse when the RISC-V
processor is just emulated on another architecture.
Anyway, we have already limited the max number of processes because
they overflow the default log buffer size, see the commit
46edf5d7aed54380 ("selftests/livepatch: define max test-syscall
processes").
Does it help to reduce the MAXPROC limit from 128 to 64, 32, or 16?
IMHO, even 16 processes are good enough. We do not need to waste
that many resources on QA.
You might also review the setup of your VM and reduce the number
of emulated CPUs. If the VM is not able to reasonably handle
high load, then it might show false positives in many tests.
If nothing helps, feel free to send a patch for filtering the
"signaling remaining tasks" message. IMHO, it is perfectly fine
to hide this message. Just extend the already existing filter in
the "check_result" function.
With your help, I tried decreasing MAXPROC, which did not help; reducing
the VM from '-smp 8' to '-smp 4' did help, and all tests passed in all 5
runs (MAXPROC not modified). Yes, it is my emulation environment that
triggered the false positive. If we later face the same problem on a real
machine, we'll try patching the filter.
Thanks a lot
laokz