On 4/30/20 11:40 AM, Bodo Stroesser wrote: > > > On 04/30/20 17:59, Mike Christie wrote: >> On 4/30/20 10:10 AM, Bodo Stroesser wrote: >>> Hi, >>> >>> When tcmu queues a new command - no matter whether in command >>> ring or in qfull_queue - a cmd_id from IDR udev->commands is >>> assigned to the command. >>> >>> If userspaces sends a wrong command completion containing the >>> cmd_id of a command on the qfull_queue, tcmu_handle_completions() >>> finds the command in the IDR and calls tcmu_handle_completion() >>> for it. This might do some nasty things, because commands in >>> qfull_queue do not have a valid dbi list. >>> >>> Of course this is easy to fix. E.g.: >>> >>> >>> --- a/drivers/target/target_core_user.c 2020-04-30 >>> 14:15:36.177184668 +0200 >>> +++ b/drivers/target/target_core_user.c 2020-04-30 >>> 14:29:43.331882066 +0200 >>> @@ -1242,13 +1242,14 @@ static unsigned int tcmu_handle_completi >>> } >>> WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD); >>> - cmd = idr_remove(&udev->commands, entry->hdr.cmd_id); >>> - if (!cmd) { >>> + cmd = idr_find(&udev->commands, entry->hdr.cmd_id); >>> + if (!cmd || !test_bit(TCMU_CMD_BIT_INFLIGHT, &cmd->flags)) { >>> pr_err("cmd_id %u not found, ring is broken\n", >>> entry->hdr.cmd_id); >>> set_bit(TCMU_DEV_BIT_BROKEN, &udev->flags); >>> break; >>> } >>> + idr_remove(&udev->commands, entry->hdr.cmd_id); >>> tcmu_handle_completion(cmd, entry); >>> >>> To avoid the second idr_*() call in main data path, it would also >>> be possible to not replace the idr_remove() by idr_find(), but >>> in case cmd was found without TCMU_CMD_BIT_INFLIGHT being set, >>> during error handling re-add the cmd to the idr with the same id >>> as before by calling idr_alloc(...,cmd_id, crmdd_id+1, ...). >>> >>> >>> OTOH, I'm wondering whether the better solution would be to >>> remove idr_alloc() from tcmu_setup_cmd_timer() and add it >>> to queue_cmd_ring() immediately before or after it calls >>> tcmu_setup_cmd_timer(). >>> Doing so would mean that commands in qfull_queue no longer have >>> a cmd_id (which is not necessary AFAICS) and therefore the problem >> >> It's done for logging, but its only debug logging so we could just print >> the pointer value. >> > > Yes, and then add the pointer value to the pr_debug that prints the > assigned cmd_id after idr_alloc() > >>> cannot happen, because tcmu_handle_completions() cannot find them >>> in the IDR. >>> >> >> This is probably best if it's not a lot of trouble. > > I already spent some time to check the necessary changes. I think > there is no trouble, just some small changes. > So, yes, the bigger fix is probably better. > > I'll wait till next week for further comments and then send a patch. > >> >> If you go this route then in tcmu_reset_ring I think I made a mistake > > I'm not sure, but I don't think there is a mistake. If you cleanup > the ring, you don't need to cancel the commands in qfull_queue. I was thinking about your userspace restart PGR question in another mail. Between the time userspace stopped and restarted the session->id to I_T nexus mapping could have changed, so id now refers to a different I_T nexus. Code path wise, it would be easier if we restarted the inflight and non-inflight commands the same way. > >> and we just continue the loop if TCMU_CMD_BIT_INFLIGHT is not set. I >> think we want to do: >> >> for cmd in qfull_list >> target_complete_cmd() > So I think, this is not really necessary, but after > clear_bit(TCMU_DEV_BIT_BROKEN, &udev->flags); > > we could do > run_qfull_queue(udev, false); > > right? > >> >> for cmd in idr >> target_complete_cmd() >> >