On Thu, Dec 13, 2018 at 08:14:57AM -0800, Sagi Grimberg wrote: >>>> + if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) >>>> + bio->bi_opf &= ~REQ_HIPRI; >>>> + >>> >>> Maybe we can simply check (q->queue_flags & (1 << QUEUE_FLAG_POLL)) and >>> avoid the extra atomic operation in the host path? >>> >>> Would it make sense? >> >> test_bit is not usually implemented as an atomic operation. >> >> Take a look at e.g. >> >> arch/x86/include/asm/bitops.h:constant_test_bit() > > Ah.. But its still read from volatile argument so still more expensive? I don't think the volatile should make a difference. I actually compiled both versions and the test_bit version generates a movq + testl insted of testb: - movq 120(%rbx), %rdx # MEM[(const long unsigned int *)q_38 + 120B], _135 - testl $524288, %edx #, _135 + testb $8, 122(%rbx) #, q_40->queue_flags But actually generates a larger object: 36966 9470 88 46524 b5bc blk-core.o-opencode 36956 9470 88 46514 b5b2 blk-core.o-test-bit No idea what is going there.