Re: LVM kernel lockup scenario during lvcreate

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Bart,

On 2023/06/27 01:29, Jaco Kroon wrote:
Hi Bart,

On 2023/06/26 18:42, Bart Van Assche wrote:
On 6/26/23 01:30, Jaco Kroon wrote:
Please find attached updated ps and dmesg too, diskinfo wasn't regenerated but this doesn't generally change.

Not sure how this all works, according to what I can see the only disk with pending activity (queued) is sdw?  Yet, a large number of processes is blocking on IO, and yet again the stack traces in dmesg points at __schedule.  For a change we do not have lvcreate in the process list! This time that particular script's got a fsck in uninterruptable wait ...

Hi Jaco,

I see pending commands for five different SCSI disks:

$ zgrep /busy: block.gz
./sdh/hctx0/busy:00000000affe2ba0 {.op=WRITE, .cmd_flags=SYNC|FUA, .rq_flags=FLUSH_SEQ|MQ_INFLIGHT|DONTPREP|IO_STAT|ELV, .state=in_flight, .tag=2055, .internal_tag=214, .cmd=Write(16) 8a 08 00 00 00 01 d1 c0 b7 88 00 00 00 08 00 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED|LAST, .timeout=180.000, allocated 0.000 s ago} ./sda/hctx0/busy:00000000987bb7c7 {.op=WRITE, .cmd_flags=SYNC|FUA, .rq_flags=FLUSH_SEQ|MQ_INFLIGHT|DONTPREP|IO_STAT|ELV, .state=in_flight, .tag=2050, .internal_tag=167, .cmd=Write(16) 8a 08 00 00 00 01 d1 c0 b7 88 00 00 00 08 00 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED|LAST, .timeout=180.000, allocated 0.010 s ago} ./sdw/hctx0/busy:00000000aec61b17 {.op=READ, .cmd_flags=META|PRIO, .rq_flags=STARTED|MQ_INFLIGHT|DONTPREP|ELVPRIV|IO_STAT|ELV, .state=in_flight, .tag=2056, .internal_tag=8, .cmd=Read(16) 88 00 00 00 00 00 00 1c 01 a8 00 00 00 08 00 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED|LAST, .timeout=180.000, allocated 0.000 s ago} ./sdw/hctx0/busy:0000000087e9a58e {.op=WRITE, .cmd_flags=SYNC|FUA, .rq_flags=FLUSH_SEQ|MQ_INFLIGHT|DONTPREP|IO_STAT|ELV, .state=in_flight, .tag=2058, .internal_tag=102, .cmd=Write(16) 8a 08 00 00 00 01 d1 c0 b7 88 00 00 00 08 00 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED|LAST, .timeout=180.000, allocated 0.000 s ago} ./sdaf/hctx0/busy:00000000d8751601 {.op=WRITE, .cmd_flags=SYNC|FUA, .rq_flags=FLUSH_SEQ|MQ_INFLIGHT|DONTPREP|IO_STAT|ELV, .state=in_flight, .tag=2057, .internal_tag=51, .cmd=Write(16) 8a 08 00 00 00 01 d1 c0 b7 88 00 00 00 08 00 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED|LAST, .timeout=180.000, allocated 0.010 s ago}

All requests have the flag "ELV". So my follow-up questions are:
* Which I/O scheduler has been configured? If it is BFQ, please try whether
  mq-deadline or "none" work better.

crowsnest [00:34:58] /sys/class/block/sda/device/block/sda/queue # cat scheduler
none [mq-deadline] kyber bfq

crowsnest [00:35:31] /sys/class/block # for i in */device/block/sd*/queue/scheduler; do echo none > $i; done
crowsnest [00:35:45] /sys/class/block #

So let's see if that perhaps relates.  Neither CFQ nor BFQ has ever given me anywhere near the performance of deadline, so that's our default goto.

crowsnest [15:03:34] ~ # uptime
 15:07:52 up 15 days,  4:47,  3 users,  load average: 10.26, 9.88, 9.68

"how long is a piece of string?" comes to mind whether looking to decide if that's sufficiently long to call it success?  Normally died after about 7 days.

So *suspected* mq-deadline bug?

And various hints that newer firmware exists ... but broadcom is not making it easy to find the download nor is supermicro's website of much help ... will try again during more sane hours.

https://www.broadcom.com/support/download-search?pg=Storage+Adapters,+Controllers,+and+ICs&pf=Storage+Adapters,+Controllers,+and+ICs&pn=SAS3008+I/O+Controller&pa=Firmware&po=&dk=&pl=&l=false

Supermicro has responded with appropriate upgrade options which we've not executed yet, but I'll make time for that over the coming weekend, or perhaps I should wait a bit longer to give more time for a similar lockup with the none scheduler?

Kind Regards,
Jaco




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux