On 2020-08-19 15:15:07 [+0200], peterz@xxxxxxxxxxxxx wrote: > > - if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) { > > + if (tsk->flags & PF_WQ_WORKER) { > > preempt_disable(); > > - if (tsk->flags & PF_WQ_WORKER) > > - wq_worker_sleeping(tsk); > > - else > > - io_wq_worker_sleeping(tsk); > > + wq_worker_sleeping(tsk); > > preempt_enable_no_resched(); > > } > > > > if (tsk_is_pi_blocked(tsk)) > > return; > > > > + if (tsk->flags & PF_IO_WORKER) > > + io_wq_worker_sleeping(tsk); > > + > > Urgh, so this adds a branch in what is normally considered a fairly hot > path. > > I'm thinking that the raw_spinlock_t option would permit leaving that > single: > > if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) > > branch intact? The compiler generates code to test for both flags at once. If none of both possible flags are set then there is one branch (get out and bring me to tst_is_pi…). And yes, with raw_spinlock_t we could keep that one branch. If you want to optimize further, we could move PF_IO_WORKER to an lower bit. x86 can test for both via (gcc-10) | testl $536870944, 44(%rbp) #, _11->flags | jne .L1635 #, (clang-9) | testl $536870944, 44(%rbx) # imm = 0x20000020 | je .LBB112_6 but ARM can't and does | ldr r1, [r5, #16] @ tsk_3->flags, tsk_3->flags | mov r2, #32 @ tmp157, | movt r2, 8192 @ tmp157, | tst r2, r1 @ tmp157, tsk_3->flags | beq .L998 @, same ARM64 | ldr w0, [x20, 60] //, _11->flags | and w0, w0, 1073741792 // tmp117, _11->flags, | and w0, w0, -536870849 // tmp117, tmp117, | cbnz w0, .L453 // tmp117, using 0x10 for PF_IO_WORKER instead will turn this into: | ldr w0, [x20, 60] //, _11->flags | tst w0, 48 // _11->flags, | bne .L453 //, ARM: | ldr r2, [r5, #16] @ tsk_3->flags, tsk_3->flags | tst r2, #48 @ tsk_3->flags, | beq .L998 @, Sebastian