Re: [PATCH] rbd: rework rbd_request_fn()

Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> · Wed, 6 Aug 2014 23:06:09 +0400

On Wed, Aug 6, 2014 at 10:32 PM, Alex Elder <elder@xxxxxxxx> wrote:
> On 08/05/2014 02:38 AM, Ilya Dryomov wrote:
>> While it was never a good idea to sleep in request_fn(), commit
>> 34c6bc2c919a ("locking/mutexes: Add extra reschedule point") made it
>> a *bad* idea.  mutex_lock() since 3.15 may reschedule *before* putting
>> task on the mutex wait queue, which for tasks in !TASK_RUNNING state
>> means block forever.  request_fn() may be called with !TASK_RUNNING on
>> the way to schedule() in io_schedule().
>>
>> Offload request handling to a workqueue, one per rbd device, to avoid
>> calling blocking primitives from rbd_request_fn().
>
> Off topic...  If you supply "--patience" to your git diff command
> you'll get an easier-to-read result in some cases (like this one).
> If you like that you can just do:
>     git config --global --add diff.algorithm patience

I'm generally using (and used to review this patch in particular)
--histogram.  In my experience it gives slightly better results than
--patience.  Didn't think of putting it into .gitconfig though.

>
> I have several comments below, none of which are bugs, just
> suggestions.
>
> I was thinking you could use a single work queue for all rbd
> devices, but I'm not sure that's very important.  You'd have
> to stash the rbd_dev pointer in every request somewhere.
>
> Reviewed-by: Alex Elder <elder@xxxxxxxxxx>
>
>
>> Cc: stable@xxxxxxxxxxxxxxx # 3.15+
>> Signed-off-by: Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx>
>> ---
>>  drivers/block/rbd.c |  195 +++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 118 insertions(+), 77 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index e3412317d3f9..e07d9d71b187 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -42,6 +42,7 @@
>>  #include <linux/blkdev.h>
>>  #include <linux/slab.h>
>>  #include <linux/idr.h>
>> +#include <linux/workqueue.h>
>>
>>  #include "rbd_types.h"
>>
>> @@ -332,7 +333,10 @@ struct rbd_device {
>>
>>       char                    name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
>>
>> +     struct list_head        rq_queue;       /* incoming rq queue */
>>       spinlock_t              lock;           /* queue, flags, open_count */
>> +     struct workqueue_struct *rq_wq;
>> +     struct work_struct      rq_work;
>>
>>       struct rbd_image_header header;
>>       unsigned long           flags;          /* possibly lock protected */
>> @@ -3176,102 +3180,129 @@ out:
>>       return ret;
>>  }
>>
>> -static void rbd_request_fn(struct request_queue *q)
>> -             __releases(q->queue_lock) __acquires(q->queue_lock)
>> +static void rbd_handle_request(struct rbd_device *rbd_dev, struct request *rq)
>>  {
>
> . . .
>
>> +     if (offset + length > rbd_dev->mapping.size) {
>> +             rbd_warn(rbd_dev, "beyond EOD (%llu~%llu > %llu)", offset,
>> +                      length, rbd_dev->mapping.size);
>> +             result = -EIO;
>> +             goto err_rq;
>> +     }
>>
>> -             spin_unlock_irq(q->queue_lock);
>> +     img_request = rbd_img_request_create(rbd_dev, offset, length, wr);
>
> This isn't new with your patch, but perhaps we should consider
> having rbd_img_request_create() use GFP_KERNEL or at least GFP_NOIO
> rather than GFP_ATOMIC.  In general, I think a pass ought to be
> made through all the allocation calls to make sure they have the
> right flags.  Some may be too restrictive, some maybe not enough.

Definitely, I already have a patch that switches img_request to NOIO -
allocating it with ATOMIC in workfn doesn't make sense.

As for the general pass, I think there is a ticket for that, so
hopefully I'll get to it sometime..

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html