Re: LVM RAID (was Re: Summary/Minutes from today's FESCo Meeting (2017-02-10))

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Fri, 10 Feb 2017 13:31:33 -0700

On Fri, Feb 10, 2017 at 12:27 PM, Chris Adams <linux@xxxxxxxxxxx> wrote:
> Once upon a time, Justin Forbes <jmforbes@xxxxxxxxxxx> said:
>>   * AGREED: Discussion on anaconda LVM change is delayed until 2-17
>>     provided open questions get answered (+1:6,0:0,-1:0)  (jforbes,
>>     17:24:33)
>
> So, not sure if this is in the "open questions", but on the mailing list
> I brought up the fact that MD RAID has an automatic cron job to do
> consistency checks and LVM RAID does not.  I don't see anything about
> that in the change proposal, but IMHO that's a regression (and a
> significant one, given studies that show undetected RAID failures are a
> real thing).

The cron job exists but as far as I can tell it is not doing anything
out of the box. I've never seen md scrubs happen on a schedule
automatically.

LVM does have a way to do scrubs, and it'd be fairly easy to make a
systemd timer that'd do it on a schedule. That no such timer exists or
is enabled by default is, I'd say, not good. But as for whose
responsibility it is? I'd say it's the GUI program that creates the
array. If a GUI program is going to get into the business of making it
easier to create arrays, then it needs to be in the business of
configuring preventative maintenance and warnings. But we don't even
have configuration for email warnings for mdadm arrays. And further,
there's a rather significant kernel deficiency: SCSI command timers
are set by default to 30 seconds, which is totally contrary to getting
proper bad sector recoveries with the vast majority of drives, whether
they are in an array or not. And while the upstream file system and md
kernel developers know about this, there is an incredible amount of
resistance to changing it. It's bad enough that this often thwarts
scrubs (whether mdadm, LVM, or Btrfs based).
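
A minimal sketch of what such a scheduled scrub job could run, assuming
Python is acceptable for the job and that it runs as root. LVM RAID
scrubs are started with `lvchange --syncaction check`, and md checks are
started by writing `check` to the array's `sync_action` file in sysfs.
The helper names, the `sysfs_root` parameter, and the `vg0/root` and
`md0` names are hypothetical, chosen only for illustration:

```python
import subprocess
from pathlib import Path
from typing import List


def lvm_scrub_command(vg: str, lv: str) -> List[str]:
    """Build the lvchange call that starts a consistency check on a RAID LV."""
    return ["lvchange", "--syncaction", "check", f"{vg}/{lv}"]


def start_lvm_scrub(vg: str, lv: str) -> None:
    """Run the scrub command; needs root and an existing RAID LV."""
    subprocess.run(lvm_scrub_command(vg, lv), check=True)


def start_md_check(md_name: str, sysfs_root: str = "/sys") -> Path:
    """Ask the md driver to start a check by writing to sync_action.

    sysfs_root is a parameter only so the function can be tried against
    a scratch directory instead of the real /sys.
    """
    path = Path(sysfs_root) / "block" / md_name / "md" / "sync_action"
    path.write_text("check\n")
    return path


if __name__ == "__main__":
    # Hypothetical names; substitute the real VG/LV on the system.
    print(" ".join(lvm_scrub_command("vg0", "root")))
```

A systemd timer or cron entry would then call `start_lvm_scrub()` or
`start_md_check()` on whatever schedule is wanted; the schedule itself
is a policy choice this sketch does not make.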

So the net here is that there are all sorts of ways this can suck for
the uninitiated user out of the box. Is the lack of scrubs happening
out of the box a problem? Yes, but enabling them without fixing the
bullshit 30 second SCSI command timer default is a bigger problem: you
end up with a greater chance that md is going to kick a drive out of
the array for the very simple problem of one bad sector, instead of
fixing the bad sector with a remap. And if we aren't getting device
faulty notifications in GNOME Shell, now we're in a much worse
situation on net.
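
For reference, the command timer is exposed per device in sysfs as
`/sys/block/<disk>/device/timeout`, in seconds. Below is a rough sketch
of reading it and raising it when it is too low. The 180 second value is
an assumption used for illustration, not a number from this thread, and
the function names and `sysfs_root` parameter are hypothetical. A value
written this way does not survive a reboot, so a real fix would need
something persistent such as a udev rule:

```python
from pathlib import Path

KERNEL_DEFAULT_SECONDS = 30    # the default discussed above
EXAMPLE_TIMEOUT_SECONDS = 180  # assumption: illustrative value only


def timeout_path(disk: str, sysfs_root: str = "/sys") -> Path:
    """Location of the SCSI command timer for a disk such as 'sda'."""
    return Path(sysfs_root) / "block" / disk / "device" / "timeout"


def read_timeout(disk: str, sysfs_root: str = "/sys") -> int:
    """Return the current command timer in seconds."""
    return int(timeout_path(disk, sysfs_root).read_text().strip())


def raise_timeout(disk: str, seconds: int = EXAMPLE_TIMEOUT_SECONDS,
                  sysfs_root: str = "/sys") -> int:
    """Raise the timer if it is below `seconds`; never lower it.

    Returns the value in effect afterwards. Writing needs root.
    """
    current = read_timeout(disk, sysfs_root)
    if current < seconds:
        timeout_path(disk, sysfs_root).write_text(f"{seconds}\n")
        return seconds
    return current
```

The function only ever raises the value, so a timer that an admin has
already set higher is left alone.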

>
> I also don't know if LVM RAID can email (or otherwise notify admins)
> about failures like MD RAID already does.

Nope. The monitoring works with dmeventd, but I'm not sure how
sophisticated the monitoring is or what messaging method it uses
(dbus?) or what monitors it (udisksd or storaged?). But again, the
loss of email notification is not a show stopper. I think email
notification is archaic anyway; I'd rather it support some kind of
ticketing system, reporting it all to storaged or some other daemon of
my choice, where I can choose how I want to be notified: pop-up, email,
text message, tweet, light a match, etc.
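
Short of proper dmeventd integration, a notifier of one's own choosing
could poll `lvs`. This is a sketch under assumptions, not a description
of how dmeventd reports events: it relies on the `lv_health_status`
report field of `lvs`, which is empty when nothing is wrong, and the
function names are hypothetical. What to do with the result (pop-up,
email, ticket) is left to the caller:

```python
import subprocess
from typing import Dict

LVS_COMMAND = ["lvs", "--noheadings", "--separator", "|",
               "-o", "vg_name,lv_name,lv_health_status"]


def parse_lvs_health(output: str) -> Dict[str, str]:
    """Map 'vg/lv' to its health string ('' means no problem reported)."""
    health = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        vg, lv, status = (field.strip() for field in line.split("|"))
        health[f"{vg}/{lv}"] = status
    return health


def unhealthy(health: Dict[str, str]) -> Dict[str, str]:
    """Keep only the LVs whose health field is non-empty."""
    return {name: status for name, status in health.items() if status}


def poll_lvs() -> Dict[str, str]:
    """Run lvs (needs root) and return the LVs reporting a problem."""
    result = subprocess.run(LVS_COMMAND, check=True,
                            capture_output=True, text=True)
    return unhealthy(parse_lvs_health(result.stdout))
```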

-- 
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx