On Wed, Aug 19, 2020 at 03:33:20PM +0200, Sebastian Andrzej Siewior wrote: > On 2020-08-19 15:15:07 [+0200], peterz@xxxxxxxxxxxxx wrote: > If you want to optimize further, we could move PF_IO_WORKER to an lower > bit. x86 can test for both via > (gcc-10) > | testl $536870944, 44(%rbp) #, _11->flags > | jne .L1635 #, > > (clang-9) > | testl $536870944, 44(%rbx) # imm = 0x20000020 > | je .LBB112_6 > > > but ARM can't and does > | ldr r1, [r5, #16] @ tsk_3->flags, tsk_3->flags > | mov r2, #32 @ tmp157, > | movt r2, 8192 @ tmp157, > | tst r2, r1 @ tmp157, tsk_3->flags > | beq .L998 @, > > same ARM64 > | ldr w0, [x20, 60] //, _11->flags > | and w0, w0, 1073741792 // tmp117, _11->flags, > | and w0, w0, -536870849 // tmp117, tmp117, > | cbnz w0, .L453 // tmp117, > > using 0x10 for PF_IO_WORKER instead will turn this into: > | ldr w0, [x20, 60] //, _11->flags > | tst w0, 48 // _11->flags, > | bne .L453 //, > > ARM: > | ldr r2, [r5, #16] @ tsk_3->flags, tsk_3->flags > | tst r2, #48 @ tsk_3->flags, > | beq .L998 @, Good point, AFAICT there's a number of low bits still open (and we can shuffle if we have to), so sure put a patch in to that effect while you're at it.