Re: kworker consumes 100% CPU on degraded RAID6

Eyal Lebedinsky · Sat, 28 Oct 2023 20:11:23 +1100

On 28/10/2023 08.58, Eyal Lebedinsky wrote:
Fully updated F28.

I had to send one of the seven member disks out for RMA.
I notice that the system is very unresponsive. 'top' shows:

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1365697 root      20   0       0      0      0 R  93.8   0.0 384:40.55 kworker/u16:3+flush-9:127

This continues even when there is no user activity (Firefox and Thunderbird closed).

A few days ago it stopped, but today I see that it kept running all night, even through
periods of inactivity lasting a few hours.
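The worker name itself may carry a clue: the flush-9:127 suffix appears to be the writeback flusher for block device major 9, minor 127, and major 9 is the Linux md (software RAID) driver, so this is most likely the degraded array itself. A small Python sketch, with hypothetical helper names (flush_device, device_name), that pulls the numbers out of the worker name and resolves them through the /sys/dev/block symlinks, assuming a standard sysfs layout:

```python
import os
import re

def flush_device(worker_name):
    """Extract (major, minor) from a writeback worker name such as
    'kworker/u16:3+flush-9:127'; return None if there is no flush- suffix."""
    m = re.search(r"flush-(\d+):(\d+)", worker_name)
    return (int(m.group(1)), int(m.group(2))) if m else None

def device_name(major, minor, sysfs="/sys/dev/block"):
    """Resolve major:minor to a kernel device name by following the sysfs
    symlink; if the link does not exist, the bare 'major:minor' comes back."""
    link = os.path.join(sysfs, f"{major}:{minor}")
    return os.path.basename(os.path.realpath(link))
```

On the affected machine, device_name(*flush_device("kworker/u16:3+flush-9:127")) should name the md device; the MAJ:MIN column of lsblk confirms the same mapping by hand.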

Another data point: a few days ago I received a replacement disk from RMA, and the recovery went as fast as expected.
I then removed another disk to send for RMA.

Is this expected? Is there anything I can do to improve the situation?

TIA

Maybe a hint: on a whim I looked at the interrupts on the machine, and one entry in
	/proc/interrupts
grows by 80-90 every second.
It is listed as 'IR-PCI-MSIX-0000:03:00.0    0-edge      mpt2sas0-msix0', which is probably related
to the RAID card used for this array.
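To put a number on that rate rather than eyeballing it, something like the following Python sketch could help. It is a hypothetical helper, not a standard tool: it reads /proc/interrupts twice, sums the per-CPU columns of each row, and reports the per-second increase for rows whose label contains a pattern such as mpt2sas. The function names are illustrative, and the parser assumes the usual layout (a CPUn header row, then "IRQ: counts... description").

```python
import time

def parse_interrupts(text):
    """Map each /proc/interrupts row label to its count summed over all CPUs."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        irq = fields[0].rstrip(":")
        counts = []
        for tok in fields[1:1 + ncpu]:
            if not tok.isdigit():
                break  # stop at the description if a row has fewer count columns than CPUs
            counts.append(int(tok))
        desc = " ".join(fields[1 + len(counts):])
        totals[f"{irq} {desc}".strip()] = sum(counts)
    return totals

def rates(before, after, interval, pattern):
    """Per-second increase for rows whose label contains `pattern`."""
    return {label: (count - before.get(label, 0)) / interval
            for label, count in after.items() if pattern in label}

def sample(pattern="mpt2sas", interval=1.0, path="/proc/interrupts"):
    """Read the counters twice, `interval` seconds apart, and return the rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return rates(before, after, interval, pattern)
```

Running print(sample("mpt2sas", interval=10.0)) over a longer window smooths out bursts; comparing the figure while idle and under load would show whether the HBA traffic tracks the kworker activity.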

Another hint: I see a job stuck in D state.

$ ps aux|grep parted
root     2398175  0.0  0.0   6184  3700 ?        D    05:10   0:00 parted -l

This command runs overnight to collect some stats, and it appears to be hung.
This instance started at "2023-10-27 05:10:01", when the disk was still in the machine (though no longer in the array),
just after it had finished being zeroed.
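To see every task stuck in uninterruptible sleep (D state), not only parted, one could scan /proc directly (ps -eo pid,stat,wchan,cmd gives similar information). The Python helper below is a hypothetical sketch, assuming a Linux /proc layout: it reads the state field from each /proc/<pid>/stat and, for D-state tasks, /proc/<pid>/wchan, which names the kernel function the task is waiting in.

```python
import os

def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace."""
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state

def d_state_processes(proc="/proc"):
    """Return [(pid, comm, wchan)] for tasks in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            # wchan may be "0" or unreadable depending on kernel config and permissions
            try:
                with open(os.path.join(proc, entry, "wchan")) as f:
                    wchan = f.read().strip()
            except OSError:
                wchan = "?"
            found.append((pid, comm, wchan))
        except (OSError, ValueError, IndexError):
            continue  # task exited mid-scan or its stat was unreadable
    return found
```

If several D-state tasks report the same wchan, that would suggest they are all blocked on the same I/O path, which is useful detail to include in a follow-up.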

--
Eyal at Home (eyal@xxxxxxxxxxxxxx)
_______________________________________________
users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx