Re: problem with recovered array

On 31/10/2023 14.21, Carlos Carvalho wrote:
> Roger Heflin (rogerheflin@xxxxxxxxx) wrote on Mon, Oct 30, 2023 at 01:14:49PM -03:
>> look at  SAR -d output for all the disks in the raid6.   It may be a
>> disk issue (though I suspect not given the 100% cpu show in raid).
>>
>> Clearly something very expensive/deadlockish is happening because of
>> the raid having to rebuild the data from the missing disk, not sure
>> what could be wrong with it.
>
> This is very similar to what I complained some 3 months ago. For me it happens
> with an array in normal state. sar shows no disk activity yet there are no
> writes to the array (reads happen normally) and the flushd thread uses 100%
> cpu.
>
> For the latest 6.5.* I can reliably reproduce it with
> % xzcat linux-6.5.tar.xz | tar x -f -
>
> This leaves the machine with ~1.5GB of dirty pages (as reported by
> /proc/meminfo) that it never manages to write to the array. I've waited for
> several hours to no avail. After a reboot the kernel tree had only about 220MB
> instead of ~1.5GB...

I rebooted the machine, so all is pristine.
This is F38, kernel 6.5.8-200.fc38.x86_64, with 32GB RAM.

I started a copy (SATA->rsync) into the array. Within seconds the kworker started running at 100% CPU.
In less than 1 minute it almost stopped(*1), after transferring about 5GB.
Below(*2) is the meminfo at that time.

10 minutes later it was still copying the same (32KB!) file.
20m later I stopped the copy but the kworker remained.
10m later I removed the target directory and the kworker was immediately gone.

I still suspect that after the array was 'all spares' and I re-assembled it, even though it looked good afterwards,
something is not completely right.

What other information should I provide to help resolve this issue?

TIA

(*1) the read side had zero activity and the write side (the array) had a trickle of about 50KB/s.

(*2) $ cat /proc/meminfo
MemTotal:       32704880 kB
MemFree:          654660 kB
MemAvailable:   29397832 kB
Buffers:          951396 kB
Cached:         26962316 kB
SwapCached:           32 kB
Active:          2698164 kB
Inactive:       26738212 kB
Active(anon):    1755572 kB
Inactive(anon):    51596 kB
Active(file):     942592 kB
Inactive(file): 26686616 kB
Unevictable:      177112 kB
Mlocked:               0 kB
SwapTotal:      16777212 kB
SwapFree:       16776444 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:           5069080 kB
Writeback:            28 kB
AnonPages:       1699812 kB
Mapped:           669464 kB
Shmem:            284500 kB
KReclaimable:    1452908 kB
Slab:            1731432 kB
SReclaimable:    1452908 kB
SUnreclaim:       278524 kB
KernelStack:       16160 kB
PageTables:        29476 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    33129652 kB
Committed_AS:    8020984 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       58744 kB
VmallocChunk:          0 kB
Percpu:             5792 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      338060 kB
DirectMap2M:     4757504 kB
DirectMap1G:    28311552 kB
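
A single meminfo snapshot shows how much is dirty at one moment, but not whether it is draining. As a minimal sketch (assuming a POSIX shell and awk; the function names dirty_writeback and watch_dirty and the default interval are illustrative, not something used in this thread), the Dirty and Writeback lines could be sampled over time like this:

```shell
#!/bin/sh
# Print the Dirty and Writeback values (in kB) from meminfo-formatted
# text read on stdin, as "DIRTY WRITEBACK".
# The exact match on the first field avoids picking up WritebackTmp.
dirty_writeback() {
    awk '$1 == "Dirty:"     { d = $2 }
         $1 == "Writeback:" { w = $2 }
         END                { print d, w }'
}

# Print a timestamped sample every INTERVAL seconds, COUNT times.
# Both arguments and their defaults are arbitrary choices for illustration.
watch_dirty() {
    interval=${1:-10}
    count=${2:-6}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%T)" "$(dirty_writeback < /proc/meminfo)"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

After sourcing the file, something like `watch_dirty 10 60` would log ten minutes of samples. A Dirty figure that stays roughly flat while the copy is stalled would match the behaviour described above.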

--
Eyal at Home (eyal@xxxxxxxxxxxxxx)



