Re: [PATCH 05/13] lightnvm: pblk: Count all read errors in stats

On 04.03.2019 12:45, Javier González wrote:

On 4 Mar 2019, at 12.41, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:

On Mon, Mar 4, 2019 at 10:23 AM Javier González <javier@xxxxxxxxxxx> wrote:
On 4 Mar 2019, at 10.02, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote:

Igor: Have you seen this happening in real life?

I think it would be better to count all expected errors and put them
in the right bucket (without spamming dmesg). If we need a new bucket
for e.g. vendor-specific errors, let's do that instead.

Generally I'm seeing different types of errors (which are typically, as Javier mentioned, controller errors) in cases such as hot drive removal, etc.

We can skip that patch, since these are corner cases. I can also
create a new type of pblk stat, something like "controller errors", which would collect all the other unexpected errors in one place instead of mixing them with real read/write errors as I did.
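
A minimal sketch of the bucketing idea discussed above: every completion status is counted silently, and anything that is not a known read/write failure lands in a separate "controller errors" bucket. All names and status values here (demo_status, demo_bucket, demo_stats, demo_account) are hypothetical and chosen for illustration; they are not the pblk or OCSSD definitions, and this is plain userspace C rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion codes; NOT the real NVMe/OCSSD values. */
enum demo_status {
	DEMO_OK = 0,
	DEMO_READ_FAIL,     /* uncorrectable media read error */
	DEMO_READ_HIGH_ECC, /* data returned, but ECC effort was high */
	DEMO_WRITE_FAIL     /* media write error */
};

/* One counter per bucket; the last real bucket is the catch-all. */
enum demo_bucket {
	BUCKET_READ_FAILED = 0,
	BUCKET_READ_HIGH_ECC,
	BUCKET_WRITE_FAILED,
	BUCKET_CTRL_ERR,    /* everything unexpected, e.g. hot removal */
	BUCKET_COUNT
};

struct demo_stats {
	uint64_t count[BUCKET_COUNT];
};

/* Account one completion status without printing anything to the log. */
void demo_account(struct demo_stats *stats, int status)
{
	assert(stats != NULL);

	switch (status) {
	case DEMO_OK:
		return; /* success is not an error, nothing to count */
	case DEMO_READ_FAIL:
		stats->count[BUCKET_READ_FAILED]++;
		break;
	case DEMO_READ_HIGH_ECC:
		stats->count[BUCKET_READ_HIGH_ECC]++;
		break;
	case DEMO_WRITE_FAIL:
		stats->count[BUCKET_WRITE_FAILED]++;
		break;
	default:
		/* Unknown status: keep it apart from real media errors. */
		stats->count[BUCKET_CTRL_ERR]++;
		break;
	}
}
```

With this layout the media-error counters stay meaningful, while the catch-all counter still shows that something unexpected happened. A real implementation would likely need atomic counters, which this sketch leaves out.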


Someone wiser than me told me that every error print in the log is a
potential customer call.

Javier: Yeah, I think S.M.A.R.T is the way to deliver this
information. Why can't we let the drives expose this info and remove
this from pblk? What's blocking that?

Until now, the spec. We added some new log information in Denali exactly
for this. But since pblk supports OCSSD 1.2 and 2.0, I think we need to
have it here, at least for debugging.

Why add it to the spec? Why not use whatever everyone else is using?

https://en.wikipedia.org/wiki/S.M.A.R.T. :
"S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often
written as SMART) is a monitoring system included in computer hard
disk drives (HDDs), solid-state drives (SSDs),[1] and eMMC drives. Its
primary function is to detect and report various indicators of drive
reliability with the intent of anticipating imminent hardware
failures."
Sounds like what we want here.

I know what smart is… You need to define the fields. Maybe you want to
read Denali again - the extensions are coupled with smart.

For debugging, a trace point or something similar (e.g. BPF) would be a
better solution that would not impact hot-path performance.

Cool. Look forward to the patches ;)

Javier



