Re: [PATCH 1/2] sched: Bring the PF_IO_WORKER and PF_WQ_WORKER bits closer together

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 19, 2020 at 09:55:05PM +0200, Sebastian Andrzej Siewior wrote:
> The bits PF_IO_WORKER and PF_WQ_WORKER are tested together in
> sched_submit_work() which is considered to be a hot path.
> If the two bits cross the 8 or 16 bit boundary then most architecture
> require multiple load instructions in order to create the constant
> value. Also, such a value can not be encoded within the compare opcode.
> 
> By moving the bit definition within the same block, the compiler can
> create/use one immediate value.
> 
> For some reason gcc-10 on ARM64 requires both bits to be next to each
> other in order to issue "tst reg, val; bne label". Otherwise the result
> is "mov reg1, val; tst reg, reg1; bne label".
> 
> Move PF_VCPU out of the way so that PF_IO_WORKER can be next to
> PF_WQ_WORKER.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
> 
> Could someone from the ARM64 camp please verify if this a gcc "bug" or
> opcode/arch limitation? With PF_IO_WORKER as 1 (without the PF_VCPU
> swap) I get for ARM:
> 
> | tst     r2, #33 @ task_flags,
> | beq     .L998           @,
> 
> however ARM64 does here:
> | mov     w0, 33  // tmp117,
> | tst     w19, w0 // task_flags, tmp117
> | bne     .L453           //,
> 
> the extra mov operation. Moving PF_IO_WORKER next to PF_WQ_WORKER as
> this patch gives me:
> | tst     w19, 48 // task_flags,
> | bne     .L453           //,

Moving an immediate into a register really shouldn't be a performance
issue, so I don't think this is a problem. However, the reason GCC does
this is because of the slightly weird way in which immediates are encoded,
meaning that '33' can't be packed into the 'tst' alias. You can try to
decipher the "DecodeBitMasks()" pseudocode in the Arm ARM if you're
interested.

Will



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux