On Wed, May 06, 2020 at 08:28:03AM +0100, Will Deacon wrote:
> On Wed, May 06, 2020 at 09:24:25AM +0800, Ming Lei wrote:
> > On Tue, May 05, 2020 at 05:46:18PM +0200, Christoph Hellwig wrote:
> > > On Thu, Apr 30, 2020 at 10:02:54PM +0800, Ming Lei wrote:
> > > > BLK_MQ_S_INACTIVE is only set when the last cpu of this hctx is
> > > > becoming offline, and blk_mq_hctx_notify_offline() is called from the
> > > > cpu hotplug handler. So if there is any request of this hctx submitted
> > > > from somewhere, it has to be submitted from this last cpu. That is
> > > > done by blk-mq's queue mapping.
> > > >
> > > > In case of direct issue, basically blk_mq_get_driver_tag() is run
> > > > after the request is allocated, which is why I mentioned the chance
> > > > of migration is very small.
> > >
> > > "very small" does not cut it, it has to be zero.  And it seems the
> > > new version still has this hack.
> >
> > But smp_mb() is used for ordering the WRITE and READ, so it is correct.
> >
> > barrier() is enough when process migration doesn't happen.
>
> Without numbers I would just make the smp_mb() unconditional. Your
> questionable optimisation trades that for a load of the CPU ID and a
> conditional branch, which isn't obviously faster to me. It's also very

Reading the CPU ID is just a percpu READ, and unlikely() is used to
optimize the conditional branch. smp_mb(), on the other hand, can stall
the CPU (for example while the store buffer drains), so it should be
much slower than reading the CPU ID.

See the attached microbench [1]; the result shows that smp_mb() is 10+
times slower than smp_processor_id() plus one conditional branch:

[ 1.239951] test_foo: smp_mb 738701907 smp_id 62904315 result 0 overflow 5120

The microbench is run in a simple 8-core KVM guest whose CPU is
'Model name: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz'. The result is
pretty stable across my 5 runs of VM boot.

> difficult to explain to people and relies on a bunch of implicit behaviour
> (e.g. racing only with CPU-affine hotplug notifier).

It can be documented easily; see the pairing sketch appended as [2].

> If it turns out that the smp_mb() is worthwhile, then I'd suggest improving
> the comment, perhaps to include the litmus test I cooked previously.

I have added a big comment on this usage in V10 already; the
store-buffering shape of that litmus test is appended as [3].

[1] microbench

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 956106b01810..548eec11f922 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3836,8 +3836,47 @@ unsigned int blk_mq_rq_cpu(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_rq_cpu);
 
+static unsigned long test_smp_mb(unsigned long cnt)
+{
+	unsigned long start = local_clock();
+
+	while (cnt--)
+		smp_mb();
+
+	return local_clock() - start;
+}
+
+static unsigned long test_smp_id(unsigned long cnt, short *result, int *overflow)
+{
+	unsigned long start = local_clock();
+
+	while (cnt--) {
+		short cpu = smp_processor_id();
+		*result += cpu;
+		if (unlikely(*result == 0))
+			(*overflow)++;
+	}
+	return local_clock() - start;
+}
+
+static void test_foo(void)
+{
+	const unsigned long cnt = 10 << 24;
+	short result = 0;
+	int overflow = 0;
+	unsigned long v1, v2;
+
+	v1 = test_smp_mb(cnt);
+	v2 = test_smp_id(cnt, &result, &overflow);
+
+	printk("%s: smp_mb %lu smp_id %lu result %d overflow %d\n",
+			__func__, v1, v2, (int)result, overflow);
+}
+
 static int __init blk_mq_init(void)
 {
+	test_foo();
+
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
 	cpuhp_setup_state_multi(CPUHP_AP_BLK_MQ_ONLINE, "block/mq:online",

Thanks,
Ming
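
[2] A minimal sketch of the barrier pairing under discussion, for
reference only. hctx_deactivate(), get_driver_tag_sketch(), alloc_tag()
and free_tag() are hypothetical placeholder names, not the actual V10
code; the sketch only illustrates how the tag-allocation WRITE is
ordered against the BLK_MQ_S_INACTIVE READ, and why the full barrier is
conditional on migration.

/* hotplug side: the notifier runs on the last online CPU of this hctx */
static void hctx_deactivate(struct blk_mq_hw_ctx *hctx)
{
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	/* order the WRITE above before reading the tag bitmap below */
	smp_mb__after_atomic();
	/* ... walk the tag bitmap and drain any allocated requests ... */
}

/* submission side: called after the request has been allocated */
static bool get_driver_tag_sketch(struct blk_mq_hw_ctx *hctx,
				  struct request *rq)
{
	rq->tag = alloc_tag(hctx);	/* WRITE: tag becomes allocated */

	/*
	 * Without migration this code runs on the same CPU as the
	 * hotplug notifier, which executes after us in program order,
	 * so a compiler barrier is enough; only a task migrated off
	 * that CPU needs the full smp_mb().
	 */
	if (unlikely(rq->mq_ctx->cpu != raw_smp_processor_id()))
		smp_mb();
	else
		barrier();

	/* READ: did the hotplug path mark this hctx inactive meanwhile? */
	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &hctx->state))) {
		free_tag(hctx, rq->tag);	/* back out and fail */
		rq->tag = -1;
		return false;
	}
	return true;
}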
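
[3] The litmus test Will mentions presumably has the classic
store-buffering shape; this is a reconstruction in Linux-kernel memory
model (herd7) style, not necessarily his exact test. With smp_mb() on
both sides the "exists" outcome is forbidden: either the submission
path observes BLK_MQ_S_INACTIVE, or the hotplug path observes the
allocated tag, so no request can slip through unnoticed.

C blk-mq-inactive-sb

{}

P0(int *tag, int *inactive)
{
	int r0;

	WRITE_ONCE(*tag, 1);		/* allocate the driver tag */
	smp_mb();
	r0 = READ_ONCE(*inactive);	/* check BLK_MQ_S_INACTIVE */
}

P1(int *tag, int *inactive)
{
	int r1;

	WRITE_ONCE(*inactive, 1);	/* mark the hctx inactive */
	smp_mb();
	r1 = READ_ONCE(*tag);		/* look for allocated tags */
}

exists (0:r0=0 /\ 1:r1=0)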