Re: [PATCH] scsi: ufs: mark HPB support as BROKEN

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 28 Oct 2021 10:07:35 +0800

On Thu, Oct 28, 2021 at 10:10:22AM +0900, Daejun Park wrote:
> Hi Ming,
> 
> > On Wed, Oct 27, 2021 at 09:16:32AM -0700, Keith Busch wrote:
> > > On Wed, Oct 27, 2021 at 11:58:23PM +0800, Ming Lei wrote:
> > > > On Wed, Oct 27, 2021 at 11:44:04AM -0400, Martin K. Petersen wrote:
> > > > > 
> > > > > Ming,
> > > > > 
> > > > > > request with scsi_cmnd may be allocated by the ufshpb driver, even it
> > > > > > should be fine to call ufshcd_queuecommand() directly for this driver
> > > > > > private IO, if the tag can be reused. One example is scsi_ioctl_reset().
> > > > > 
> > > > > scsi_ioctl_reset() allocates a new request, though, so that doesn't
> > > > > solve the forward progress guarantee. Whereas eh puts the saved request
> > > > > on the stack.
> > > > 
> > > > What I meant is to use one totally ufshpb private command allocated from
> > > > private slab to replace the spawned request, which is sent to ufshcd_queuecommand()
> > > > directly, so forward progress is guaranteed if the blk-mq request's tag can be
> > > > reused for issuing this private command. This approach takes a bit effort,
> > > > but avoids tags reservation.
> > > > 
> > > > Yeah, it is cleaner to use reserved tag for the spawned request, but we
> > > > need to know:
> > > > 
> > > > 1) how many queue depth for the hba? If it is small, even 1 reservation
> > > > can affect performance.
> > > > 
> > > > 2) how many inflight write buffer commands are to be supported? Or how many
> > > > is enough for obtaining expected performance? If the number is big, reserved
> > > > tags can't work.
> > > 
> > > The original and clone are not dispatched to hardware concurrently, so I
> > > don't think the reserved_tags need to subtract from the generic ones.
> > > The original request already accounts for the hardware resource, so the
> > > clone doesn't need to consume another one.
> >  
> > Yeah, that is why I thought the tag could be reused for the spawned(cloned)
> > request, but it needs ufshpb developer to confirm, or at least
> > ufshcd_queuecommand() can handle this situation. If that is true, it isn't
> > necessary to use reserve tags, since the current blk-mq implementation
> > requires to reserve real hardware tags space, which has to take normal
> > tags.
> 
> It is true that pre-request can use the tag of READ request, but the READ
> request should wait to completion of the pre-request command. However, if
> the pre-request and the READ request are dispatched concurrently, it can
> save the time to completion of the pre-request.
> 
> So I implemented as allocating new request and it has limit time to getting
> pre-request, so it doesn't cause deadlock.

The WB pre-request and the READ IO should have been issued in single
command with same tag.

If they can't be done in single command, this sw/hw interface is really bad,
one way is to follow Jens's suggestion to reserve one tag for guaranteeing
forward progress, something like the following:

ufshpb_issue_pre_req()

        req = blk_get_request(cmd->device->request_queue,
                              REQ_OP_DRV_OUT | REQ_SYNC, BLK_MQ_REQ_NOWAIT);
        if (IS_ERR(req)) {
        	req = blk_get_request(cmd->device->request_queue,
                              REQ_OP_DRV_OUT | REQ_SYNC, BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED);
			if (IS_ERR(req))
                return -EAGAIN;
		}

But the above isn't what Christoph complained:

	The HPB support added this merge window is fundanetally flawed as it
	uses blk_insert_cloned_request to insert a cloned request onto the same
	queue as the one that the original request came from, leading to all
	kinds of issues in blk-mq accounting (in addition to this API being
	a special case for dm-mpath that should not see other users).

IMO, accounting seems fine since block layer always counts driver
private request.

thanks,
Ming