On Mon, Apr 25, 2016 at 7:03 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Roman.
>
> On Mon, Apr 25, 2016 at 06:34:45PM +0200, Roman Penyaev wrote:
>> I can assure you that smp_mb() helps (at least running for 30 minutes
>> under IO).  That was my first variant, but I did not like it because I
>> could not explain myself why:
>>
>> 1. not smp_wmb()?  We need to do flush after an update.
>>    (I tried that also, and it does not help)
>
> Regardless of the success of queue_work(), the interface guarantees
> that there will be at least one execution instance which sees whatever
> updates the queuer has made prior to calling queue_work().  The
> PENDING bit is what synchronizes this operations.
>
>       A                               B
>
>                                       Make updates
>       clear PENDING                   test_and_set PENDING
>       start execution
>
> So, if B's test_and_set takes place before clearing of PENDING, what
> should be guaranteed is that A's execution must be able to see B's
> updates; however, as there's no barrier between "clear PENDING" and
> "start execution", memory loads of execution can be scheduled before
> clearing of PENDING which leads to a situation where B loses queueing
> but its updates are not seen by the prior instance's execution.  It's
> a classic "either a sees b (clear PENDING) or b sees a (prior
> updates)" interlocking situation.

Ok, that's clear now.  Thanks.  I was also confused by the spin lock,
which is released right after PENDING is cleared:

   set_work_pool_and_clear_pending(work, pool->id);
   spin_unlock_irq(&pool->lock);
   ...
   worker->current_func(work);

But it seems the memory operations of the execution can leak in and
appear before the PENDING bit is cleared and the spin lock is released
(according to Documentation/memory-barriers.txt, (6) RELEASE operations).

>> 2. what protects us from this situation?
>>
>>   CPU#0                         CPU#1
>>                                 set_work_data()
>>   test_and_set_bit()
>>                                 smp_mb()
>
> The above would be completely fine as CPU#1's execution would see all
> the changes CPU#0 has made upto that point.
>
>> And 2. question was crucial to me, because even tiny delay "fixes" the
>> problem, e.g. ndelay also "fixes" the bug:
>>
>>          smp_wmb();
>>          set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
>>  +       ndelay(40);
>>   }
>>
>> Why ndelay(40)?  Because on this machine smp_mb() takes 40 ns on average.
>
> Yeah, this is the CPU rescheduling loads for the execution ahead of
> clearing of PENDING and doing anything inbetween is likely to reduce
> the chance of it happening drastically, but smp_mb() inbetween is
> actually the right solution here.

Tejun, do you need an updated patch for that?  With a proper smp_mb()?

--
Roman
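
For reference, the PENDING interlock discussed above can be modelled in
user space with C11 atomics.  This is only a sketch under stated
assumptions, not the kernel code: struct work_model and the model_*()
helpers are hypothetical names, atomic_thread_fence(memory_order_seq_cst)
stands in for smp_mb(), and a fence followed by atomic_exchange stands in
for the fully ordered test_and_set_bit().  model_demo() walks through one
interleaving on a single thread, so it illustrates the protocol rather
than proving the ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a work item: a PENDING flag plus the data the
 * queuer updates before it tries to queue the work. */
struct work_model {
    atomic_bool pending;
    atomic_int data;
};

static void model_init(struct work_model *w)
{
    atomic_init(&w->pending, false);
    atomic_init(&w->data, 0);
}

/* Side B (queuer): make updates, then test_and_set PENDING.
 * Returns true if this call claimed PENDING, i.e. queued the work. */
static bool model_queue_work(struct work_model *w, int value)
{
    atomic_store_explicit(&w->data, value, memory_order_relaxed);
    /* Assumption: fence + exchange approximate the full ordering that
     * the kernel's test_and_set_bit() provides. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_exchange_explicit(&w->pending, true,
                                     memory_order_seq_cst);
}

/* Side A (worker): clear PENDING, full barrier, start execution.
 * Returns the data value the execution observes. */
static int model_execute(struct work_model *w)
{
    atomic_store_explicit(&w->pending, false, memory_order_relaxed);
    /* Stands in for smp_mb(): the load below may not be satisfied
     * before the PENDING clear above is ordered. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&w->data, memory_order_relaxed);
}

/* One interleaving, run sequentially: the second queuer loses the race
 * for PENDING, so the following execution has to observe its update. */
static int model_demo(void)
{
    struct work_model w;
    bool first, second;

    model_init(&w);
    first = model_queue_work(&w, 1);   /* claims PENDING */
    second = model_queue_work(&w, 2);  /* PENDING already set */
    assert(first && !second);
    return model_execute(&w);          /* sees the update of the loser */
}
```

Without the fence in model_execute(), the worker side is a store followed
by a load from a different location, which C11 does not require to stay
in order.  That is the reordering described above: the queuer loses
PENDING, yet the execution may still read stale data.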