On 6/11/24 01:03, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 10:17:42PM +0300, Sagi Grimberg wrote:
>> On 10/06/2024 22:15, Keith Busch wrote:
>>> On Mon, Jun 10, 2024 at 10:05:00PM +0300, Sagi Grimberg wrote:
>>>>
>>>> On 10/06/2024 21:53, Keith Busch wrote:
>>>>> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>>>>>> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
>>>>> My mistake. The namespace remove list appears to be getting corrupted
>>>>> because I'm using the wrong APIs to replace a "list_move_tail". This is
>>>>> fixing the issue on my end:
>>>>>
>>>>> ---
>>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>>> index 7c9f91314d366..c667290de5133 100644
>>>>> --- a/drivers/nvme/host/core.c
>>>>> +++ b/drivers/nvme/host/core.c
>>>>> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>>>>  	mutex_lock(&ctrl->namespaces_lock);
>>>>>  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
>>>>> -		if (ns->head->ns_id > nsid)
>>>>> -			list_splice_init_rcu(&ns->list, &rm_list,
>>>>> -					     synchronize_rcu);
>>>>> +		if (ns->head->ns_id > nsid) {
>>>>> +			list_del_rcu(&ns->list);
>>>>> +			list_add_tail_rcu(&ns->list, &rm_list);
>>>>> +		}
>>>>>  	}
>>>>>  	mutex_unlock(&ctrl->namespaces_lock);
>>>>>  	synchronize_srcu(&ctrl->srcu);
>>>>> --
>>>> Can we add a reproducer for this in blktests? I'm assuming that we can
>>>> easily trigger this with adding/removing nvmet namespaces?
>>> I'm testing this with Namespace Management commands, which nvmet doesn't
>>> support. You can recreate the issue by detaching the last namespace.
>>>
>>
>> I think the same will happen in a test that creates two namespaces and then
>> echo 0 > ns/enable.
>
> Looks like nvme/016 tests this. It's reporting as "passed" on my end, but
> I don't think it's actually testing the driver as intended. Still
> messing with it.
>

I believe nvme/016 creates and deletes the namespace; however, there is no
backstore associated with the loop device, and hence nvme/016 is unable to
recreate this issue. To recreate it, we need to associate a backstore
(either a block device or a regular file) with the loop device and then use
it to create and then delete the namespace.

I wrote a blktest for this specific regression and was able to trigger this
crash. I will submit this blktest in a separate email.

Thanks,
--Nilay
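
For reference, the "list_move_tail" behaviour that the quoted fix restores can
be sketched in plain userspace C. This is only an illustration under stated
assumptions: there is no RCU and no locking, and struct node, list_del_entry(),
list_add_tail_entry(), move_invalid() and demo() are hypothetical names for
this sketch, not kernel APIs. It models the case from the thread where the
last namespace is detached.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (no RCU here). */
struct node {
    struct node *prev, *next;
    unsigned int ns_id;   /* only meaningful for entries, not list heads */
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

/* Unlink one entry; its neighbours are joined to each other. */
static void list_del_entry(struct node *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Link one entry just before the head, i.e. at the tail of the list. */
static void list_add_tail_entry(struct node *e, struct node *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

static size_t list_len(const struct node *head)
{
    size_t n = 0;

    for (const struct node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Move every entry with ns_id > nsid from 'namespaces' to 'rm_list'. */
static void move_invalid(struct node *namespaces, struct node *rm_list,
                         unsigned int nsid)
{
    struct node *ns = namespaces->next, *next;

    for (; ns != namespaces; ns = next) {
        next = ns->next;   /* saved before unlinking, like the _safe iterator */
        if (ns->ns_id > nsid) {
            list_del_entry(ns);
            list_add_tail_entry(ns, rm_list);
        }
    }
}

/* Three namespaces, keep ns_id <= 2; returns kept * 10 + removed. */
static int demo(void)
{
    struct node namespaces, rm_list, ns[3];

    list_init(&namespaces);
    list_init(&rm_list);
    for (unsigned int i = 0; i < 3; i++) {
        ns[i].ns_id = i + 1;
        list_add_tail_entry(&ns[i], &namespaces);
    }
    move_invalid(&namespaces, &rm_list, 2);
    /* Only the last namespace should have moved to the remove list. */
    assert(rm_list.next == &ns[2] && rm_list.prev == &ns[2]);
    return (int)(list_len(&namespaces) * 10 + list_len(&rm_list));
}
```

Calling demo() from a small main() exercises the move. The point of the
sketch is that exactly one entry changes lists per match and the remaining
namespaces list stays intact, which is what a single-entry move needs and what
a splice helper (which treats its first argument as a list head) does not
provide.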