On Mon, Jun 29, 2020 at 09:56:49AM -0700, Bart Van Assche wrote: > On 2020-06-29 09:15, Alan Stern wrote: > > Aha. Looking at this more closely, it's apparent that the code in > > blk-core.c contains a logic bug: It assumes that if the BLK_MQ_REQ_PREEMPT > > flag is set then the request can be issued regardless of the queue's > > runtime status. That is not correct when the queue is suspended. > > Please clarify why this is not correct. As I understand it, BLK_MQ_REQ_PREEMPT is supposed to mean (among other things) that this request may be issued as part of the procedure for putting a device into a low-power state or returning it to a high-power state. Consequently, requests with that flag set must be allowed while the queue is in the RPM_SUSPENDING or RPM_RESUMING runtime states -- as opposed to ordinary requests, which are allowed only in the RPM_ACTIVE state. In the RPM_SUSPENDED state, however, the queue is entirely inactive. Even if a request were to be issued somehow, it would fail because the system would not be able to transmit it to the device. In other words, when the queue is in the RPM_SUSPENDED state, a resume must be requested before _any_ request can be issued. > > Index: usb-devel/block/blk-core.c > > =================================================================== > > --- usb-devel.orig/block/blk-core.c > > +++ usb-devel/block/blk-core.c > > @@ -423,7 +423,8 @@ int blk_queue_enter(struct request_queue > > * responsible for ensuring that that counter is > > * globally visible before the queue is unfrozen. > > */ > > - if (pm || !blk_queue_pm_only(q)) { > > + if ((pm && q->rpm_status != RPM_SUSPENDED) || > > + !blk_queue_pm_only(q)) { > > success = true; > > } else { > > percpu_ref_put(&q->q_usage_counter); > > Does the above change make it impossible to bring a suspended device > back to the RPM_ACTIVE state if the BLK_MQ_REQ_NOWAIT flag is set? The only case affected by this change is when BLK_MQ_REQ_PREEMPT is set and the queue is in the RPM_SUSPENDED state. If BLK_MQ_REQ_NOWAIT was also set, the original code would set "success" to true, allowing the request to proceed even though it could not be carried out immediately -- a bug. With the patch, such a request will fail without resuming the device. I don't know whether that is the desired behavior or not, but at least it's not obviously a bug. It does seem odd that blk_queue_enter() tests the queue's pm_only status and the request flag in two different spots (here and below). Why does it do this? It seems like an invitation for bugs. > > @@ -448,8 +449,7 @@ int blk_queue_enter(struct request_queue > > > > wait_event(q->mq_freeze_wq, > > (!q->mq_freeze_depth && > > - (pm || (blk_pm_request_resume(q), > > - !blk_queue_pm_only(q)))) || > > + blk_pm_resume_queue(pm, q)) || > > blk_queue_dying(q)); > > if (blk_queue_dying(q)) > > return -ENODEV; > > Index: usb-devel/block/blk-pm.h > > =================================================================== > > --- usb-devel.orig/block/blk-pm.h > > +++ usb-devel/block/blk-pm.h > > @@ -6,11 +6,14 @@ > > #include <linux/pm_runtime.h> > > > > #ifdef CONFIG_PM > > -static inline void blk_pm_request_resume(struct request_queue *q) > > +static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q) > > { > > - if (q->dev && (q->rpm_status == RPM_SUSPENDED || > > - q->rpm_status == RPM_SUSPENDING)) > > - pm_request_resume(q->dev); > > + if (!q->dev || !blk_queue_pm_only(q)) > > + return 1; /* Nothing to do */ > > + if (pm && q->rpm_status != RPM_SUSPENDED) > > + return 1; /* Request allowed */ > > + pm_request_resume(q->dev); > > + return 0; > > } > > Does the above change, especially the " && q->rpm_status != > RPM_SUSPENDED" part, make it impossible to bring a suspended device back > to the RPM_ACTIVE state? Just the opposite -- the change makes it _possible_ for a BLK_MQ_REQ_PREEMPT request to bring a suspended device back to the RPM_ACTIVE state. Look at the existing code: If pm is true then blk_pm_request_resume() will be skipped, so the device won't be resumed. With this patch -- in particular with the "&& q->rpm_status != RPM_SUSPENDED" part added -- the call won't be skipped and so the resume will take place. The rather complicated syntax of the wait_event() call in the existing code contributes to this confusion. One of the things my patch tries to do is make the code more straightforward and easier to grasp. I admit that there are parts to this thing I don't understand. The wait_event() call in blk_queue_enter(), for example: If we are waiting for the device to leave the RPM_SUSPENDED state (or enter the RPM_ACTIVE state), where does q->mq_freeze_wq get woken up? There's no obvious spot in blk_{pre|post}_runtime_resume(). Alan Stern