Re: LVM kernel lockup scenario during lvcreate

Hi,

On 2023/08/24 19:13, Bart Van Assche wrote:
> On 8/24/23 00:29, Jaco Kroon wrote:
>> We're definitely seeing the same thing on another host using an ahci controller.  This seems to hint that it's not a firmware issue, as does the fact that this happens much less frequently with the none scheduler.
>
> That is unexpected. I don't think there is enough data available yet to
> conclude whether these issues are identical or not?
It's hard for me to even conclude that two consecutive crashes are exactly the same issue ... however, there is a strong correlation in that there are generally lvcreate commands stuck in D state, which to me hints that it has something to do with LVM snapshot creation (both traditional snapshots on the AHCI controller host, and thin snapshots on the Super Micro host).

>> I will make a plan to action the firmware updates on the raid controller over the weekend regardless, just in order to eliminate that.  I will then revert to mq-deadline. Assuming this does NOT fix it, how would I go about assessing if this is a controller firmware issue or a Linux kernel issue?

> If the root cause would be an issue in the mq-deadline scheduler or in
> the core block layer then there would be many more reports about I/O
> lockups. For this case I think that it's very likely that the root cause is either the I/O controller driver or the I/O controller firmware.

I tend to agree with that.  Given that we have in excess of 50 hosts and it generally seems to be only these two hosts that run into this, I agree with your assessment.  Except that the AHCI host never *used* to do this and only fairly recently started exhibiting this behaviour.

So here's what I personally *think* makes these two hosts unique:

1.  The AHCI controller host was unfortunately set up ~15 years back with "thick" volumes and uses traditional snapshots (the hardware has been replaced piecemeal over the years, so none of the original hardware is still in use).  It started exhibiting this behaviour when, for reasons I can't go into, we began making multiple snapshots of the same origin LV simultaneously.  This is unfortunate; thin snapshots would be far more performant during the few hours where these two snapshots are required.
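For reference, the pattern on that host boils down to something like the following (VG/LV names and sizes here are purely illustrative, not the real ones):

    # Two traditional (CoW) snapshots of the same origin, taken back to back;
    # names and sizes are made up for illustration.
    lvcreate --snapshot --size 50G --name snap_a vg0/origin
    lvcreate --snapshot --size 50G --name snap_b vg0/origin

    # ... both snapshots are in use for a few hours, then:
    lvremove -y vg0/snap_a vg0/snap_b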

2.  The LSI controller on the SM host uses a thin pool of 125TB and contains 27 "origins", 26 of which follow this pattern on a daily basis (roughly the sketch after this list):
2.1  Create a thin snapshot of ${name} as fsck_${name}.
2.2  Run fsck on the snapshot to ensure consistency.  If this fails, bail out and report an error to the management systems.
2.3  If save_${name} exists, remove it.
2.4  Rename fsck_${name} to save_${name}.
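In shell terms the daily cycle per origin looks roughly like the sketch below (the VG name, the fsck invocation and the error reporting are simplified; the real script obviously has more plumbing):

    #!/bin/sh
    # Daily snapshot rotation for one origin LV; "vg0" and the read-only
    # fsck are illustrative placeholders.
    name="$1"

    # 2.1: thin snapshot of the origin (thin snaps are created with the
    # activation-skip flag, hence -K to activate it).
    lvcreate --snapshot --name "fsck_${name}" "vg0/${name}" || exit 1
    lvchange -ay -K "vg0/fsck_${name}" || exit 1

    # 2.2: check the snapshot; bail out and report on failure.
    if ! fsck -n "/dev/vg0/fsck_${name}"; then
        echo "fsck of fsck_${name} failed" >&2
        exit 1
    fi

    # 2.3: drop yesterday's copy if it exists.
    lvs "vg0/save_${name}" >/dev/null 2>&1 && lvremove -y "vg0/save_${name}"

    # 2.4: keep today's checked snapshot as save_${name}.
    lvrename vg0 "fsck_${name}" "save_${name}"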

3.  IO on the SM host often exceeds 1GB/s and typically "idles" around 400MB/s, which I'm sure in the bigger scheme of things isn't a particularly heavy load, but considering that most of our other hosts barely peak at 150MB/s and generally don't do more than 10MB/s, it's significant for us.  Right now, as I'm typing this, we're doing between 1500 and 3000 reads/s (it just peaked at over 6000) and 500-1000 writes/s (peaking at just over 3000).  I'm well aware there are systems with much higher IOPS figures; even a few years back I saw statistics for systems doing 10k+ IOPS, but for us this is fairly high.
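(Those figures come from watching the extended per-device stats, something along these lines; the exact tooling is beside the point:)

    # sysstat's iostat: extended per-device stats in MB/s, refreshed every second.
    iostat -dxm 1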

4.  The majority of our hosts with RAID controllers use megaraid; I can't think of any other hosts off the top of my head that also use mpt3sas, but we do have a number with AHCI.  This again supports the theory that it's the firmware on the controller, so I'll be sure to do the update on Saturday morning too when I have a reboot slot.  Hopefully that'll just make the problem go away.

Thanks for all the help with this, it is really appreciated.  I know we seem to be running in circles, but I believe we are making progress, even if slowly; at a minimum I'm learning quite a bit, which in and of itself puts us in a better position to figure this out.  I do think it could be the controller, but as I've stated before, we've seen issues with snapshot creation for many years now, and killing dmeventd sorted that out everywhere except on these two hosts.  They are special in that they create multiple snapshots of the same origin.  Perhaps that's the clue, since frankly that's the one thing they share, and the one thing that makes them distinct from the other hosts we run.

Kind regards,
Jaco




