On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@xxxxxxxxxxx> wrote: > > > On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote: > > > > Igor: Have you seen this happening in real life? > > > > I think it would be better to count all expected errors and put them > > in the right bucket (without spamming dmesg). If we need a new bucket > > for i.e. vendor-specific-errors, let's do that instead. > > > > Someone wiser than me told me that every error print in the log is a > > potential customer call. > > > > Javier: Yeah, I think S.M.A.R.T is the way to deliver this > > information. Why can't we let the drives expose this info and remove > > this from pblk? What's blocking that? > > Until now the spec. We added some new log information in Denali exactly > for this. But since pblk supports OCSSD 1.2 and 2.0 I think it is needed to > have it here, at least for debugging. Why add it to the spec? Why not use whatever everyone else is using? https://en.wikipedia.org/wiki/S.M.A.R.T. : "S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its primary function is to detect and report various indicators of drive reliability with the intent of anticipating imminent hardware failures." Sounds like what we want here. For debugging, a trace point or something(i.e. BPF) would be a better solution that would not impact hot-path performance. > > > > > Thanks, > > Hans > > > > On Mon, Mar 4, 2019 at 8:42 AM Javier González <javier@xxxxxxxxxxx> wrote: > >>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote: > >>> > >>> Currently when unknown error occurs on read path > >>> there is only dmesg information about it, but it > >>> is not counted in sysfs statistics. Since this is > >>> still an error we should also count it there. > >>> > >>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx> > >>> --- > >>> drivers/lightnvm/pblk-core.c | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c > >>> index eabcbc119681..a98b2255f963 100644 > >>> --- a/drivers/lightnvm/pblk-core.c > >>> +++ b/drivers/lightnvm/pblk-core.c > >>> @@ -493,6 +493,7 @@ void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd) > >>> atomic_long_inc(&pblk->read_failed); > >>> break; > >>> default: > >>> + atomic_long_inc(&pblk->read_failed); > >>> pblk_err(pblk, "unknown read error:%d\n", rqd->error); > >>> } > >>> #ifdef CONFIG_NVM_PBLK_DEBUG > >>> -- > >>> 2.17.1 > >> > >> I left this out intentionally so that we could correlate the logs from > >> the controller and the errors in the read path. Since we do not have an > >> standard way to correlate this on SMART yet, let’s add this now (I > >> assume that you are using it for something?) and we can separate the > >> error stats in the future. > >> > >> Reviewed-by: Javier González <javier@xxxxxxxxxxx>