On Sat, 2018-08-04 at 18:01 +-0800, Ming Lei wrote: +AD4- On Sat, Aug 4, 2018 at 8:03 AM, Bart Van Assche +ADw-bart.vanassche+AEA-wdc.com+AD4- wrote: +AD4- +AD4- +AD4- +AD4- diff --git a/block/blk-pm.c b/block/blk-pm.c +AD4- +AD4- index 2a4632d0be4b..0708185ead61 100644 +AD4- +AD4- --- a/block/blk-pm.c +AD4- +AD4- +-+-+- b/block/blk-pm.c +AD4- +AD4- +AEAAQA- -107,17 +-107,31 +AEAAQA- int blk+AF8-pre+AF8-runtime+AF8-suspend(struct request+AF8-queue +ACo-q) +AD4- +AD4- if (+ACE-q-+AD4-dev) +AD4- +AD4- return ret+ADs- +AD4- +AD4- +AD4- +AD4- +- WARN+AF8-ON+AF8-ONCE(q-+AD4-rpm+AF8-status +ACEAPQ- RPM+AF8-ACTIVE)+ADs- +AD4- +AD4- +- +AD4- +AD4- blk+AF8-pm+AF8-runtime+AF8-lock(q)+ADs- +AD4- +AD4- +- /+ACo- +AD4- +AD4- +- +ACo- Switch to preempt-only mode before calling synchronize+AF8-rcu() such +AD4- +AD4- +- +ACo- that later blk+AF8-queue+AF8-enter() calls see the preempt-only state. See +AD4- +AD4- +- +ACo- also http://lwn.net/Articles/573497/. +AD4- +AD4- +- +ACo-/ +AD4- +AD4- +- blk+AF8-set+AF8-pm+AF8-only(q)+ADs- +AD4- +AD4- +- ret +AD0- -EBUSY+ADs- +AD4- +AD4- +- if (percpu+AF8-ref+AF8-read(+ACY-q-+AD4-q+AF8-usage+AF8-counter) +AD0APQ- 1) +AHs- +AD4- +AD4- +- synchronize+AF8-rcu()+ADs- +AD4- +AD4- +- if (percpu+AF8-ref+AF8-read(+ACY-q-+AD4-q+AF8-usage+AF8-counter) +AD0APQ- 1) +AD4- +AD4- +- ret +AD0- 0+ADs- +AD4- +AD4- +- +AH0- +AD4- +AD4- The above code looks too tricky, and the following issue might exist in case +AD4- of potential WRITEs re-order from blk+AF8-queue+AF8-enter(). +AD4- +AD4- 1) suppose there are one in-flight request not completed yet +AD4- +AD4- 2) observer is the percpu+AF8-ref+AF8-read() following synchronize+AF8-rcu(). +AD4- +AD4- 3) inside blk+AF8-queue+AF8-enter(), WRITE in percpu+AF8-ref+AF8-tryget+AF8-live()/percpu+AF8-ref+AF8-put() +AD4- can be re-ordered from view of the observer in 2) +AD4- +AD4- 4) then percpu+AF8-ref+AF8-read() may return 1 even though the only in-flight +AD4- request isn't completed +AD4- +AD4- 5) suspend still moves on, and data loss may be triggered +AD4- +AD4- Cc Paul, and Alan have been CCed already, so we have several memory +AD4- model experts here, :-), hope they may help to clarify if this reorder case +AD4- may exist. Hello Ming, There are two cases: (a) The percpu+AF8-ref+AF8-tryget+AF8-live() call in (3) is triggered by a runtime power state transition (RPM+AF8AKg-). (b) The percpu+AF8-ref+AF8-tryget+AF8-live() call in (3) is not triggered by a runtime power state transition. In case (b) the driver that submits the request should protect the request with scsi+AF8-autopm+AF8-get+AF8-device() / scsi+AF8-autopm+AF8-put+AF8-device() or by any other mechanism that prevents runtime power state transitions. Since scsi+AF8-autopm+AF8-get+AF8-device() is called before blk+AF8-queue+AF8-enter(), since scsi+AF8-autopm+AF8-get+AF8-device() only returns after it has resumed a device and since scsi+AF8-autopm+AF8-get+AF8-device() prevents runtime suspend, the function blk+AF8-pre+AF8-runtime+AF8-suspend() won't be called concurrently with blk+AF8-queue+AF8-enter() in case (b). Since the power management serializes power state transitions also for case (a) the race described above won't happen. blk+AF8-pre+AF8-runtime+AF8-suspend() will have finished before any RQF+AF8-PM requests are submitted. Bart.