On Mon, Mar 4, 2019 at 1:42 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>
>
>
> On 04.03.2019 12:45, Javier González wrote:
> >
> >> On 4 Mar 2019, at 12.41, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> >>
> >> On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@xxxxxxxxxxx> wrote:
> >>>> On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Igor: Have you seen this happening in real life?
> >>>>
> >>>> I think it would be better to count all expected errors and put them
> >>>> in the right bucket (without spamming dmesg). If we need a new bucket
> >>>> for e.g. vendor-specific errors, let's do that instead.
>
> Generally I'm seeing different types of errors (which are typically, as
> Javier mentioned, controller errors) in cases such as hot drive removal, etc.
>
> We can skip that patch, since these are corner cases. I can also
> create a new type of pblk stats, something like "controller errors", which
> would collect all the other unexpected errors in one place instead of
> mixing them with real read/write errors as I did.

Yes, that makes sense.

Thanks,
Hans

>
> >>>>
> >>>> Someone wiser than me told me that every error print in the log is a
> >>>> potential customer call.
> >>>>
> >>>> Javier: Yeah, I think S.M.A.R.T is the way to deliver this
> >>>> information. Why can't we let the drives expose this info and remove
> >>>> this from pblk? What's blocking that?
> >>>
> >>> Until now, the spec. We added some new log information in Denali exactly
> >>> for this. But since pblk supports OCSSD 1.2 and 2.0, I think it is needed
> >>> here, at least for debugging.
> >>
> >> Why add it to the spec? Why not use whatever everyone else is using?
> >>
> >> https://en.wikipedia.org/wiki/S.M.A.R.T. :
> >> "S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
> >> written as SMART) is a monitoring system included in computer hard
> >> disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
> >> primary function is to detect and report various indicators of drive
> >> reliability with the intent of anticipating imminent hardware
> >> failures."
> >> Sounds like what we want here.
> >
> > I know what SMART is… You need to define the fields. Maybe you want to
> > read Denali again - the extensions are coupled with SMART.
> >
> >> For debugging, a trace point or something (e.g. BPF) would be a better
> >> solution that would not impact hot-path performance.
> >
> > Cool. Look forward to the patches ;)
> >
> > Javier
> >
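
For illustration only, here is a minimal userspace sketch of the "separate bucket" idea discussed in this thread: expected read/write failures are counted in their own buckets, and every other status lands in a single "controller errors" bucket, with nothing printed to the log. This is not pblk code. The status values and all names (ST_*, BUCKET_*, classify_status, account_status, read_stat) are hypothetical and chosen just for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical status codes, for illustration only (not from any spec). */
enum {
	ST_OK         = 0x00,
	ST_READ_FAIL  = 0x01,
	ST_WRITE_FAIL = 0x02,
};

/* One counter per bucket; BUCKET_CONTROLLER collects everything unexpected. */
enum err_bucket {
	BUCKET_NONE,
	BUCKET_READ,
	BUCKET_WRITE,
	BUCKET_CONTROLLER,
	BUCKET_COUNT
};

/* Static storage, so all counters start at zero. */
static atomic_ulong err_stats[BUCKET_COUNT];

/* Map a completion status to the bucket it should be counted in. */
static enum err_bucket classify_status(uint16_t status)
{
	switch (status) {
	case ST_OK:
		return BUCKET_NONE;
	case ST_READ_FAIL:
		return BUCKET_READ;
	case ST_WRITE_FAIL:
		return BUCKET_WRITE;
	default:
		/* Unknown status, e.g. seen after a hot drive removal. */
		return BUCKET_CONTROLLER;
	}
}

/* Count the status silently: a counter increment, no log message. */
static void account_status(uint16_t status)
{
	enum err_bucket b = classify_status(status);

	assert(b < BUCKET_COUNT);
	if (b != BUCKET_NONE)
		atomic_fetch_add_explicit(&err_stats[b], 1,
					  memory_order_relaxed);
}

/* Read one counter, e.g. for exporting it through a stats interface. */
static unsigned long read_stat(enum err_bucket b)
{
	return atomic_load_explicit(&err_stats[b], memory_order_relaxed);
}
```

With this layout, unexpected statuses never inflate the read or write failure counts, and the hot path pays only one relaxed atomic increment per error.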