On Thu, Oct 28, 2021 at 10:10:22AM +0900, Daejun Park wrote: > Hi Ming, > > > On Wed, Oct 27, 2021 at 09:16:32AM -0700, Keith Busch wrote: > > > On Wed, Oct 27, 2021 at 11:58:23PM +0800, Ming Lei wrote: > > > > On Wed, Oct 27, 2021 at 11:44:04AM -0400, Martin K. Petersen wrote: > > > > > > > > > > Ming, > > > > > > > > > > > request with scsi_cmnd may be allocated by the ufshpb driver, even it > > > > > > should be fine to call ufshcd_queuecommand() directly for this driver > > > > > > private IO, if the tag can be reused. One example is scsi_ioctl_reset(). > > > > > > > > > > scsi_ioctl_reset() allocates a new request, though, so that doesn't > > > > > solve the forward progress guarantee. Whereas eh puts the saved request > > > > > on the stack. > > > > > > > > What I meant is to use one totally ufshpb private command allocated from > > > > private slab to replace the spawned request, which is sent to ufshcd_queuecommand() > > > > directly, so forward progress is guaranteed if the blk-mq request's tag can be > > > > reused for issuing this private command. This approach takes a bit effort, > > > > but avoids tags reservation. > > > > > > > > Yeah, it is cleaner to use reserved tag for the spawned request, but we > > > > need to know: > > > > > > > > 1) how many queue depth for the hba? If it is small, even 1 reservation > > > > can affect performance. > > > > > > > > 2) how many inflight write buffer commands are to be supported? Or how many > > > > is enough for obtaining expected performance? If the number is big, reserved > > > > tags can't work. > > > > > > The original and clone are not dispatched to hardware concurrently, so I > > > don't think the reserved_tags need to subtract from the generic ones. > > > The original request already accounts for the hardware resource, so the > > > clone doesn't need to consume another one. > > > > Yeah, that is why I thought the tag could be reused for the spawned(cloned) > > request, but it needs ufshpb developer to confirm, or at least > > ufshcd_queuecommand() can handle this situation. If that is true, it isn't > > necessary to use reserve tags, since the current blk-mq implementation > > requires to reserve real hardware tags space, which has to take normal > > tags. > > It is true that pre-request can use the tag of READ request, but the READ > request should wait to completion of the pre-request command. However, if > the pre-request and the READ request are dispatched concurrently, it can > save the time to completion of the pre-request. > > So I implemented as allocating new request and it has limit time to getting > pre-request, so it doesn't cause deadlock. The WB pre-request and the READ IO should have been issued in single command with same tag. If they can't be done in single command, this sw/hw interface is really bad, one way is to follow Jens's suggestion to reserve one tag for guaranteeing forward progress, something like the following: ufshpb_issue_pre_req() req = blk_get_request(cmd->device->request_queue, REQ_OP_DRV_OUT | REQ_SYNC, BLK_MQ_REQ_NOWAIT); if (IS_ERR(req)) { req = blk_get_request(cmd->device->request_queue, REQ_OP_DRV_OUT | REQ_SYNC, BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED); if (IS_ERR(req)) return -EAGAIN; } But the above isn't what Christoph complained: The HPB support added this merge window is fundanetally flawed as it uses blk_insert_cloned_request to insert a cloned request onto the same queue as the one that the original request came from, leading to all kinds of issues in blk-mq accounting (in addition to this API being a special case for dm-mpath that should not see other users). IMO, accounting seems fine since block layer always counts driver private request. thanks, Ming