On Fri, Jul 07, 2023 at 03:24:58PM -0400, Joel Fernandes wrote:
>
>
> > On Jul 7, 2023, at 10:56 AM, Waiman Long <longman@xxxxxxxxxx> wrote:
> >
> > On 7/7/23 10:07, Davidlohr Bueso wrote:
> >>> On Thu, 06 Jul 2023, Waiman Long wrote:
> >>>
> >>> It was found that running the refscale test might sometimes crash the
> >>> kernel with the following error:
> >>>
> >>> [ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
> >>> [ 8569.952900] #PF: supervisor read access in kernel mode
> >>> [ 8569.952902] #PF: error_code(0x0000) - not-present page
> >>> [ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
> >>> [ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
> >>> [ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
> >>> [ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
> >>> :
> >>> [ 8569.952940] Call Trace:
> >>> [ 8569.952941] <TASK>
> >>> [ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
> >>> [ 8569.952959] kthread+0x10e/0x130
> >>> [ 8569.952966] ret_from_fork+0x1f/0x30
> >>> [ 8569.952973] </TASK>
> >>>
> >>> This is likely caused by the fact that init_waitqueue_head() is called
> >>> after the ref_scale_reader kthread is created. So the kthread may try
> >>> to use the waitqueue head before it is properly initialized. Fix this
> >>> by initializing the waitqueue head first before kthread creation.
> >>>
> >>> Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization")
> >>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> >>
> >> Strange this wasn't reported sooner.
> >
> > Red Hat does have a pretty large QE organization that run all sort of tests include this one pretty frequently. The race window is pretty small, but they did hit this once in a while.
>
> Having worked on this test initially, I am happy to hear that Redhat runs this test!
>
> Thanks for fixing this issue.
> Acked-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>

Applied, thank you!

							Thanx, Paul
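
For anyone following the thread, the ordering that the fix relies on can be
illustrated outside the kernel. What follows is only a minimal userspace
sketch using POSIX threads, not the refscale code: struct reader, reader_fn()
and run_readers() are hypothetical names, and a pthread condition variable
stands in for the kernel waitqueue head. The assumption being illustrated is
simply that every synchronization object a thread may touch is initialized
before that thread is created.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for one reader and its wait queue. */
struct reader {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t wq;	/* plays the role of the waitqueue head */
	bool start;		/* condition the reader sleeps on */
	bool woke;		/* set by the reader after it is woken */
};

/* Reader thread: may start waiting on r->wq as soon as it is created. */
static void *reader_fn(void *arg)
{
	struct reader *r = arg;

	pthread_mutex_lock(&r->lock);
	while (!r->start)
		pthread_cond_wait(&r->wq, &r->lock);
	r->woke = true;
	pthread_mutex_unlock(&r->lock);
	return NULL;
}

/* Returns the number of readers that were woken, or -1 on failure. */
int run_readers(int n)
{
	struct reader *rs;
	int i, created = 0, woken = 0;

	if (n <= 0)
		return -1;
	rs = calloc((size_t)n, sizeof(*rs));
	if (!rs)
		return -1;

	/* Step 1: initialize every wait object before any thread exists. */
	for (i = 0; i < n; i++) {
		int rc = pthread_mutex_init(&rs[i].lock, NULL);

		assert(rc == 0);
		rc = pthread_cond_init(&rs[i].wq, NULL);
		assert(rc == 0);
		(void)rc;
	}

	/* Step 2: only now create the threads that will use them. */
	for (i = 0; i < n; i++) {
		if (pthread_create(&rs[i].tid, NULL, reader_fn, &rs[i]) != 0)
			break;
		created++;
	}

	/* Wake each reader that was successfully created. */
	for (i = 0; i < created; i++) {
		pthread_mutex_lock(&rs[i].lock);
		rs[i].start = true;
		pthread_cond_signal(&rs[i].wq);
		pthread_mutex_unlock(&rs[i].lock);
	}

	/* Join the readers and count how many observed the wakeup. */
	for (i = 0; i < created; i++) {
		pthread_join(rs[i].tid, NULL);
		if (rs[i].woke)
			woken++;
	}

	for (i = 0; i < n; i++) {
		pthread_cond_destroy(&rs[i].wq);
		pthread_mutex_destroy(&rs[i].lock);
	}
	free(rs);
	return woken;
}
```

Swapping step 1 and step 2 would reproduce the shape of the reported race: a
reader could reach pthread_cond_wait() on an object that has not been
initialized yet, which is undefined behavior in POSIX terms and is the
userspace analogue of the page fault in prepare_to_wait_event() quoted above.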