RE: [bug report][bisected] blktests nvme/tcp nvme/030 failed on latest linux-block/for-next

"Belanger, Martin" <Martin.Belanger@xxxxxxxx> · Thu, 11 Aug 2022 12:32:00 +0000

> -----Original Message-----
> From: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Sent: Thursday, August 11, 2022 8:28 AM
> To: Belanger, Martin; Yi Zhang
> Cc: linux-block; open list:NVM EXPRESS DRIVER; Chaitanya Kulkarni
> Subject: Re: [bug report][bisected] blktests nvme/tcp nvme/030 failed on
> latest linux-block/for-next
> 
> >>>>>>>> nvme/030 triggered several errors during CKI tests on
> >>>>>>>> linux-block/for-next, pls help check it, and feel free to let
> >>>>>>>> me know if you need any test/info, thanks.
> >>>>
> >>>> Hi Chaitanya and Yi,
> >>>>
> >>>> This commit (submitted last February) simply exposes two read-only
> >>>> attributes to the sysfs.
> >>>
> >>> Seems it was not the culprit, but nvme/030 can pass after I revert
> >>> that commit on v5.19.
> >>>
> >>> Hi Sagi
> >>>
> >>> I did more testing and finally found that reverting this udev rule
> >>> change in nvme-cli fixes the nvme/030 failure. Could you check
> >>> it?
> >>>
> >>> commit f86faaaa2a1ff319bde188dc8988be1ec054d238 (refs/bisect/bad)
> >>> Author: Sagi Grimberg <sagi@grimberg.m
> >>> Date:   Mon Jun 27 11:06:50 2022 +0300
> >>>
> >>>       udev: re-read the discovery log page when a discovery controller reconnected
> >>>
> >>>       When using persistent discovery controllers, if the discovery
> >>>       controller loses connectivity and manage to reconnect after a while,
> >>>       we need to retrieve again the discovery log page in order to learn
> >>>       about possible changes that may have occurred during this time as
> >>>       discovery log change events were lost.
> >>>
> >>>       Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> >>>       Signed-off-by: Daniel Wagner <dwagner@xxxxxxx>
> >>>       Link: https://lore.kernel.org/r/20220627080650.108936-1-sagi@grimberg.me
> >>
> >> Yes, this change is reverted now from nvme-cli...
> >> I'm thinking how should we solve the original issue, the only way I
> >> can think of at this moment is a "reconnected" event. Does anyone
> >> have an idea how userspace can do the right thing here without it?
> >
> > Hi Sagi. We had a discussion regarding this back in January (or February?).
> >
> > I needed such an event on a reconnect for my project, nvme-stas:
> > https://github.com/linux-nvme/nvme-stas
> >
> > This event was needed so that the host could re-register with a CDC on
> > a reconnect (per TP8010). At your suggestion, I added
> > "NVME_EVENT=connected" in host/core.c. This has been working great for
> > me. Maybe the udev rule could be modified to look for this event.
> 
> That is exactly what it does, and that is why nvme discover unexpectedly
> connects to all log entries: the udev event triggers.
> 
> In order to address the problem of missed AENs while the controller was
> disconnected, we need to re-issue the log-page on a reconnect, not on a
> first connect.

Ah! Got it. Sorry for the noise. -Martin
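
The distinction drawn above, between a first connect and a reconnect, can be sketched from the userspace side. This is only an illustration under stated assumptions, not the nvme-cli rule or the nvme-stas implementation: it assumes a long-running listener receives each uevent as a dict of properties, that the controller is identified by a DEVNAME property, and that the event arrives with ACTION=change. Only the NVME_EVENT=connected property comes from the thread; the ReconnectTracker class and the returned action strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReconnectTracker:
    """Hypothetical helper that remembers which controllers already connected once."""

    seen: set = field(default_factory=set)

    def handle(self, event: dict) -> str:
        # DEVNAME and ACTION are assumed property names for this sketch.
        dev = event.get("DEVNAME", "")
        action = event.get("ACTION", "")

        if action == "remove":
            # Controller is gone: a later "connected" is a first connect again.
            self.seen.discard(dev)
            return "forget"

        if action == "change" and event.get("NVME_EVENT") == "connected":
            if dev in self.seen:
                # Seen before, so this "connected" follows a lost connection:
                # AENs may have been missed, re-read the discovery log page.
                return "refetch-log-page"
            # First connect: whoever created the controller already reads the log.
            self.seen.add(dev)
            return "ignore"

        return "ignore"


if __name__ == "__main__":
    tracker = ReconnectTracker()
    connected = {"ACTION": "change", "DEVNAME": "/dev/nvme0", "NVME_EVENT": "connected"}
    for ev in (connected, connected):
        print(tracker.handle(ev))
```

The sketch only works because the listener keeps state between events. A stateless udev rule matching NVME_EVENT=connected cannot tell the two cases apart, which is consistent with the suggestion earlier in the thread that a separate "reconnected" event may be needed.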