On 10/15/18 4:44 PM, Richard Weinberger wrote: > Am Dienstag, 16. Oktober 2018, 00:04:20 CEST schrieb Jens Axboe: >> On 10/15/18 3:46 PM, Richard Weinberger wrote: >>> Am Montag, 15. Oktober 2018, 22:55:29 CEST schrieb Christoph Hellwig: >>>> On Mon, Oct 15, 2018 at 10:42:47PM +0200, Richard Weinberger wrote: >>>>>> Sadly not. I'm checking now what exactly is broken. >>>>> >>>>> I take this back. Christoph's fixup makes reading work. >>>>> The previous version corrupted my test block device in interesting ways >>>>> and confused all tests. >>>>> But the removal of blk_rq_map_sg() still has issues. >>>>> Now the device blocks endless upon flush. >>>> >>>> I suspect we still need to special case flush. Updated patch below >>>> including your other suggestion: >>> >>> While playing further with the patch I managed to hit >>> BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request(). >>> >>> UML requeues the request in ubd_queue_one_vec() if it was not able >>> to submit the request to the host io-thread. >>> The fd can return -EAGAIN, then UML has to try later. >>> >>> Isn't this allowed in that context? >> >> It is, the problem is that queue_one_vec() doesn't always return an >> error. The caller is doing a loop per bio, so we can encounter an >> error, requeue, and then the caller will call us again. We're in >> an illegal state at that point, and the next requeue will make that >> obvious since it's already pending. Actually, both the caller and >> ubd_queue_one_vec() also requeue. So it's a bit of a mess, the below >> might help. > > I agree, the driver *is* a mess. > Unless someone else volunteers to clean it up, I'll push that task on > my never ending TODO list. I doubt you'll have to fight anyone for that task ;-) > Thanks for your hint with the illegal state. > Now with correct requeuing the driver seems to work fine! > Write/Flush support also suffered from that but didn't trigger the BUG_ON()... OK good, at least we're making progress! -- Jens Axboe