Re: problem with recovered array

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The await being so high on md127, but normish on the underlying disks
would seem to point at some kernel issue inside md using raid6 where
one disk is out and the data has to be reconstructed.

On Fri, Nov 3, 2023 at 7:54 PM <eyal@xxxxxxxxxxxxxx> wrote:
>
> On 04/11/2023 09.38, eyal@xxxxxxxxxxxxxx wrote:
> > On 04/11/2023 02.57, Roger Heflin wrote:
> >> On Fri, Nov 3, 2023 at 9:17 AM Carlos Carvalho <carlos@xxxxxxxxxxxxxx> wrote:
> >>>
> >>> Johannes Truschnigg (johannes@xxxxxxxxxxxxxxx) wrote on Thu, Nov 02, 2023 at 05:34:51AM -03:
> >>>> for the record, I do not think that any of the observations the OP made can be
> >>>> explained by non-pathological phenomena/patterns of behavior. Something is
> >>>> very clearly wrong with how this system behaves (the reported figures do not
> >>>> at all match the expected performance of even a degraded RAID6 array in my
> >>>> experience) and how data written to the filesystem apparently fails to make it
> >>>> into the backing devices in acceptable time.
> >>>>
> >>>> The whole affair reeks either of "subtle kernel bug", or maybe "subtle
> >>>> hardware failure", I think.
> >>>
> >>> Exactly. That's what I've been saying for months...
> >>>
> >>> I found a clear comparison: expanding the kernel tarball in the SAME MACHINE
> >>> with 6.1.61 and 6.5.10. The raid6 array is working normally in both cases. With
> >>> 6.1.61 the expansion works fine, finishes with ~100MB of dirty pages and these
> >>> are quickly sent to permanent storage. With 6.5.* it finishes with ~1.5GB of
> >>> dirty pages that are never sent to disk (I waited ~3h). The disks are idle, as
> >>> shown by sar, and the kworker/flushd runs with 100% cpu usage forever.
> >>>
> >>> Limiting the dirty*bytes in /proc/sys/vm the dirty pages stay low BUT tar is
> >>> blocked in D state and the tarball expansion proceeds so slowly that it'd take
> >>> days to complete (checked with quota).
> >>>
> >>> So 6.5 (and 6.4) are unusable in this case. In another machine, which does
> >>> hundreds of rsync downloads every day, the same problem exists and I also get
> >>> frequent random rsync timeouts.
> >>>
> >>> This is all with raid6 and ext4. One of the machines has a journal disk in the
> >>> raid and the filesystem is mounted with nobarriers. Both show the same
> >>> behavior. It'd be interesting to try a different filesystem but these are
> >>> production machines with many disks and I cannot create another big array to
> >>> transfer the contents.
> >>
> >> My array is running 6.5 + xfs, and mine all seems to work normally
> >> (speed wise).  And in the perf top call he ran all of the busy
> >> kworkers were ext4* calls spending a lot of time doing various
> >> filesystem work.
> >>
> >> I did find/debug a situation where dumping the cache caused ext4
> >> performance to be a disaster (large directories, lots of files).  It
> >> was tracked back to ext4 relies on the Buffers:  data space in
> >> /proc/meminfo for at least directory entry caching, and that if there
> >> were a lot of directories and/or files in directories that Buffer:
> >> getting dropped and/or getting pruned for any some reason caused the
> >> fragmented directory entries to have to get reloaded from a spinning
> >> disk and require the disk to be seeking for  *MINUTES* to reload it
> >> (there were in this case several million files in a couple of
> >> directories with the directory entries being allocated over time so
> >> very likely heavily fragmented).
> >>
> >> I wonder if there was some change with how Buffers is
> >> used/sized/pruned in the recent kernels.   The same drop_cache on an
> >> XFS filesystem had no effect that I could identify and doing a ls -lR
> >> on a big xfs filesystem does not make Buffers grow, but doing the same
> >> ls -lR against an ext3/4 makes Buffers grow quite a bit (how much
> >> depends on how many files/directories are on the filesystem).
> >>
> >> He may want to monitor buffers (cat /proc/meminfo | grep Buffers:) and
> >> see if the poor performance correlates with Buffers suddenly being
> >> smaller for some reason.
> >
> > As much as I hate this, I started another small test.
> >
> > $ uname -a
> > Linux e7.eyal.emu.id.au 6.5.8-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 20 15:53:48 UTC 2023 x86_64 GNU/Linux
> >
> > $ df -h /data1
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/md127       55T   45T  9.8T  83% /data1
> >
> > $ sudo du -sm /data2/no-backup/old-backups/tapes/01
> > 2519    /data2/no-backup/old-backups/tapes/01
> >
> > $ sudo find /data2/no-backup/old-backups/tapes/01|wc -l
> > 92059
> >
> > $ sudo rsync -aHSK --stats --progress --checksum-choice=none --no-compress -W /data2/no-backup/old-backups/tapes/01 /data1/no-backup/old-backups/
> >
> > It completed in about one minute and it was enough to trigger the problem.
> >
> >      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> > 2781097 root      20   0       0      0      0 R  98.3   0.0  36:04.35 kworker/u16:0+flush-9:127
> >
> > iostat has nothing unusual, writes to md127 at 10-20KB/s and %util is in the low single digit for all members.
> >
> > Here is what meminfo showed in the first 10 minutes (it is still going but the trend is clear):
> >
> >                          Dirty   change   Buffers   change
> > 2023-11-04 08:56:35        44     -580   1170536        8
> > 2023-11-04 08:56:45        48        4   1170552       16
> > 2023-11-04 08:56:55    812456   812408   1171008      456
> > 2023-11-04 08:57:05    538436  -274020   1180820     9812
> > 2023-11-04 08:57:15    698708   160272   1189368     8548
> > 2023-11-04 08:57:25    874208   175500   1195620     6252
> > 2023-11-04 08:57:35    742300  -131908   1202124     6504
> > 2023-11-04 08:57:45    973528   231228   1209580     7456
> > 2023-11-04 08:57:55   1269320   295792   1214900     5320
> > 2023-11-04 08:58:05   1624428   355108   1219764     4864
> > 2023-11-04 08:58:15   1629484     5056   1219816       52
> > 2023-11-04 08:58:25   1629372     -112   1219832       16
> > 2023-11-04 08:58:35   1629028     -344   1219856       24
> > 2023-11-04 08:58:45   1628928     -100   1219880       24
> > 2023-11-04 08:58:55   1628552     -376   1219908       28
> > 2023-11-04 08:59:05   1629252      700   1220072      164
> > 2023-11-04 08:59:15   1628696     -556   1220132       60
> > 2023-11-04 08:59:25   1628304     -392   1220156       24
> > 2023-11-04 08:59:35   1628264      -40   1220188       32
> > 2023-11-04 08:59:45   1628184      -80   1220340      152
> > 2023-11-04 08:59:55   1628144      -40   1220364       24
> > 2023-11-04 09:00:05   1628124      -20   1219940     -424
> > 2023-11-04 09:00:15   1627908     -216   1219976       36
> > 2023-11-04 09:00:25   1627840      -68   1220000       24
> > 2023-11-04 09:00:35   1624276    -3564   1220024       24
> > 2023-11-04 09:00:45   1624100     -176   1220060       36
> > 2023-11-04 09:00:55   1623912     -188   1220092       32
> > 2023-11-04 09:01:05   1624076      164   1220112       20
> > 2023-11-04 09:01:15   1623368     -708   1220160       48
> > 2023-11-04 09:01:25   1623176     -192   1220196       36
> > 2023-11-04 09:01:35   1621872    -1304   1220232       36
> > 2023-11-04 09:01:45   1621732     -140   1220308       76
> > 2023-11-04 09:01:55   1612304    -9428   1220392       84
> > 2023-11-04 09:02:05   1612256      -48   1220420       28
> > 2023-11-04 09:02:15   1612040     -216   1220444       24
> > 2023-11-04 09:02:25   1611968      -72   1220476       32
> > 2023-11-04 09:02:35   1611872      -96   1220492       16
> > 2023-11-04 09:02:45   1609932    -1940   1220524       32
> > 2023-11-04 09:02:55   1609828     -104   1220556       32
> > 2023-11-04 09:03:05   1609916       88   1220572       16
> > 2023-11-04 09:03:15   1609496     -420   1220608       36
> > 2023-11-04 09:03:25   1609392     -104   1220632       24
> > 2023-11-04 09:03:35   1609320      -72   1220648       16
> > 2023-11-04 09:03:45   1609240      -80   1220672       24
> > 2023-11-04 09:03:55   1609152      -88   1220688       16
> > 2023-11-04 09:04:05   1609332      180   1220712       24
> > 2023-11-04 09:04:15   1608892     -440   1220748       36
> > 2023-11-04 09:04:25   1608848      -44   1220764       16
> > 2023-11-04 09:04:35   1608744     -104   1220796       32
> > 2023-11-04 09:04:45   1608436     -308   1220820       24
> > 2023-11-04 09:04:55   1607916     -520   1220836       16
> > 2023-11-04 09:05:05   1608624      708   1220876       40
> > 2023-11-04 09:05:15   1606556    -2068   1220928       52
> > 2023-11-04 09:05:25   1602692    -3864   1221016       88
> > 2023-11-04 09:05:35   1602080     -612   1221052       36
> > 2023-11-04 09:05:45   1602000      -80   1221080       28
> > 2023-11-04 09:05:55   1601928      -72   1221096       16
> > 2023-11-04 09:06:05   1602228      300   1221124       28
> > 2023-11-04 09:06:15   1601848     -380   1221156       32
> > 2023-11-04 09:06:25   1601656     -192   1221180       24
> > 2023-11-04 09:06:35   1601532     -124   1221212       32
> > 2023-11-04 09:06:45   1601476      -56   1221228       16
> > 2023-11-04 09:06:55   1601364     -112   1221252       24
>
> Another view of interest, is the fact that w_await, which I understand is the total time to service a write request,
> is high, at 1-2s from this point onward. The components are significantly quicker at the low teens.
>
>           Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
> 09:00:00 md127            0.02      0.07     0.00   0.00   18.00     4.00    2.33     13.27     0.00   0.00 1268.36     5.69    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.96   1.12
> 09:00:00 sdb              0.10      3.80     0.85  89.47    2.50    38.00    0.85      1.57     0.00   0.00    3.00     1.84    0.00      0.00     0.00   0.00    0.00     0.00    0.83    2.82    0.01   0.35
> 09:00:00 sdc              0.13      3.93     0.85  86.44    6.00    29.50    1.00      4.17     0.50  33.33    4.38     4.17    0.00      0.00     0.00   0.00    0.00     0.00    0.83    4.72    0.01   0.53
> 09:00:00 sdd              0.13      3.93     0.85  86.44    7.12    29.50    1.05      3.70     0.33  24.10   13.78     3.52    0.00      0.00     0.00   0.00    0.00     0.00    0.83    2.90    0.02   1.54
> 09:00:00 sde              0.12      3.87     0.85  87.93    2.29    33.14    0.83      1.50     0.00   0.00   14.02     1.80    0.00      0.00     0.00   0.00    0.00     0.00    0.83    0.42    0.01   1.27
> 09:00:00 sdf              0.03      0.13     0.00   0.00   12.50     4.00    1.37     14.63     2.75  66.80   13.83    10.71    0.00      0.00     0.00   0.00    0.00     0.00    0.83    4.88    0.02   1.84
> 09:00:00 sdg              0.02      0.07     0.00   0.00   21.00     4.00    1.40     14.77     2.75  66.27   13.45    10.55    0.00      0.00     0.00   0.00    0.00     0.00    0.83    4.96    0.02   1.77
> 09:01:00 md127            0.05      0.20     0.00   0.00   26.00     4.00    7.85     83.00     0.00   0.00 2388.76    10.57    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   18.75   1.74
> 09:01:00 sdb              0.23     15.73     3.70  94.07   20.36    67.43    0.98     29.12     6.68  87.17    5.75    29.62    0.00      0.00     0.00   0.00    0.00     0.00    0.67    7.75    0.02   0.73
> 09:01:00 sdc              0.20     10.33     2.38  92.26   12.08    51.67    0.77      9.93     2.10  73.26    3.20    12.95    0.00      0.00     0.00   0.00    0.00     0.00    0.67    3.33    0.01   0.36
> 09:01:00 sdd              0.23     18.60     4.42  94.98   23.07    79.71    1.03     22.38     4.93  82.68   11.61    21.66    0.00      0.00     0.00   0.00    0.00     0.00    0.67    3.62    0.02   1.31
> 09:01:00 sde              0.22     18.20     4.33  95.24   19.62    84.00    0.98     26.18     5.93  85.78   11.61    26.63    0.00      0.00     0.00   0.00    0.00     0.00    0.67    3.40    0.02   1.35
> 09:01:00 sdf              0.18     18.67     4.48  96.07   42.64   101.82    1.47     49.72    11.33  88.54   12.03    33.90    0.00      0.00     0.00   0.00    0.00     0.00    0.67    7.67    0.03   1.62
> 09:01:00 sdg              0.18     19.27     4.63  96.19   39.00   105.09    1.48     46.98    10.63  87.76   12.37    31.67    0.00      0.00     0.00   0.00    0.00     0.00    0.67    7.35    0.03   1.68
> 09:02:00 md127            0.32      1.27     0.00   0.00   26.95     4.00   14.12    214.93     0.00   0.00  639.34    15.23    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    9.03   2.80
> 09:02:00 sdb              0.48     51.47    12.38  96.24   18.03   106.48    1.20     63.79    15.13  92.65    3.72    53.16    0.00      0.00     0.00   0.00    0.00     0.00    0.78    3.79    0.02   0.79
> 09:02:00 sdc              0.53     58.07    13.98  96.33   16.16   108.88    1.30     71.99    17.08  92.93    4.95    55.38    0.00      0.00     0.00   0.00    0.00     0.00    0.78    7.15    0.02   1.08
> 09:02:00 sdd              0.52     69.13    16.77  97.01   14.13   133.81    1.52     92.27    21.95  93.54   10.38    60.84    0.00      0.00     0.00   0.00    0.00     0.00    0.78    3.60    0.03   1.78
> 09:02:00 sde              0.50     64.60    15.65  96.90   16.83   129.20    1.87     96.40    22.63  92.38   11.50    51.64    0.00      0.00     0.00   0.00    0.00     0.00    0.78    6.96    0.04   2.20
> 09:02:00 sdf              0.57     74.80    18.13  96.97   15.65   132.00    1.77     77.60    18.03  91.08   11.51    43.92    0.00      0.00     0.00   0.00    0.00     0.00    0.78    7.23    0.03   2.25
> 09:02:00 sdg              0.68     64.13    15.35  95.74    9.49    93.85    1.72     68.73    15.87  90.24   11.74    40.04    0.00      0.00     0.00   0.00    0.00     0.00    0.78    7.43    0.03   2.26
> 09:03:00 md127            0.00      0.00     0.00   0.00    0.00     0.00    6.97     51.00     0.00   0.00 7200.34     7.32    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   50.16   1.38
> 09:03:00 sdb              0.15      6.53     1.48  90.82   11.78    43.56    0.78      9.87     2.08  72.67    4.40    12.60    0.00      0.00     0.00   0.00    0.00     0.00    0.73    3.09    0.01   0.36
> 09:03:00 sdc              0.15      6.53     1.48  90.82    9.44    43.56    0.80      9.93     2.08  72.25    4.90    12.42    0.00      0.00     0.00   0.00    0.00     0.00    0.73    3.27    0.01   0.36
> 09:03:00 sdd              0.23     12.87     2.98  92.75    3.57    55.14    1.02     22.80     5.08  83.33   12.15    22.43    0.00      0.00     0.00   0.00    0.00     0.00    0.73    2.18    0.01   1.17
> 09:03:00 sde              0.45     17.73     3.98  89.85    1.52    39.41    1.37     29.27     6.35  82.29   13.65    21.41    0.00      0.00     0.00   0.00    0.00     0.00    0.73    5.34    0.02   1.60
> 09:03:00 sdf              0.37     11.40     2.48  87.13    5.59    31.09    1.13     16.33     3.35  74.72   13.82    14.41    0.00      0.00     0.00   0.00    0.00     0.00    0.73    3.98    0.02   1.60
> 09:03:00 sdg              0.47     21.67     4.95  91.38    3.11    46.43    1.17     16.60     3.38  74.36   14.50    14.23    0.00      0.00     0.00   0.00    0.00     0.00    0.73    5.23    0.02   1.58
> 09:04:00 md127            0.00      0.00     0.00   0.00    0.00     0.00    1.88     11.93     0.00   0.00 1439.04     6.34    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.71   0.92
> 09:04:00 sdb              0.38      8.73     1.80  82.44    2.00    22.78    0.90      4.70     0.72  44.33    3.94     5.22    0.00      0.00     0.00   0.00    0.00     0.00    0.82    4.02    0.01   0.48
> 09:04:00 sdc              0.38      8.73     1.80  82.44    1.74    22.78    0.82      1.50     0.00   0.00    2.69     1.84    0.00      0.00     0.00   0.00    0.00     0.00    0.82    2.47    0.00   0.34
> 09:04:00 sdd              0.30      3.47     0.57  65.38    0.39    11.56    0.98      9.97     1.95  66.48   13.14    10.14    0.00      0.00     0.00   0.00    0.00     0.00    0.82    2.14    0.01   1.40
> 09:04:00 sde              0.00      0.00     0.00   0.00    0.00     0.00    1.28     13.43     2.52  66.23   13.51    10.47    0.00      0.00     0.00   0.00    0.00     0.00    0.82    4.53    0.02   1.73
> 09:04:00 sdf              0.08      5.27     1.23  93.67    3.40    63.20    1.12      4.97     0.57  33.66   14.93     4.45    0.00      0.00     0.00   0.00    0.00     0.00    0.82    3.55    0.02   1.74
> 09:04:00 sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.82      1.50     0.00   0.00   12.96     1.84    0.00      0.00     0.00   0.00    0.00     0.00    0.82    0.37    0.01   1.12
> 09:05:00 md127            0.00      0.00     0.00   0.00    0.00     0.00    2.27     19.27     0.00   0.00 1670.65     8.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.79   1.40
> 09:05:00 sdb              0.32      4.27     0.75  70.31    0.42    13.47    0.85      8.00     1.55  64.58    3.61     9.41    0.00      0.00     0.00   0.00    0.00     0.00    0.70    3.98    0.01   0.40
> 09:05:00 sdc              0.42     13.33     2.92  87.50    4.12    32.00    0.87     14.27     3.10  78.15    3.88    16.46    0.00      0.00     0.00   0.00    0.00     0.00    0.70    4.33    0.01   0.52
> 09:05:00 sdd              0.35      5.67     1.07  75.29    0.76    16.19    0.87     11.73     2.47  74.00   12.62    13.54    0.00      0.00     0.00   0.00    0.00     0.00    0.70    2.86    0.01   1.19
> 09:05:00 sde              0.00      0.00     0.00   0.00    0.00     0.00    1.23     17.47     3.53  74.13   12.08    14.16    0.00      0.00     0.00   0.00    0.00     0.00    0.70    6.19    0.02   1.51
> 09:05:00 sdf              0.08      5.53     1.30  93.98   11.40    66.40    1.15     13.47     2.62  69.47   13.26    11.71    0.00      0.00     0.00   0.00    0.00     0.00    0.70    4.21    0.02   1.55
> 09:05:00 sdg              0.08      5.53     1.30  93.98   12.00    66.40    0.80      9.07     1.87  70.00   12.04    11.33    0.00      0.00     0.00   0.00    0.00     0.00    0.70    1.24    0.01   1.03
> 09:06:00 md127            0.20      0.80     0.00   0.00   23.17     4.00    3.03    113.80     0.00   0.00 1063.22    37.52    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.23   2.60
> 09:06:00 sdb              0.30     32.80     7.90  96.34   10.56   109.33    1.22     36.85     8.37  87.30    4.70    30.29    0.00      0.00     0.00   0.00    0.00     0.00    0.72    6.56    0.01   0.76
> 09:06:00 sdc              0.30     37.40     9.05  96.79    8.83   124.67    1.13     42.32     9.82  89.65    4.34    37.34    0.00      0.00     0.00   0.00    0.00     0.00    0.72    4.42    0.01   0.63
> 09:06:00 sdd              0.58     62.60    15.07  96.27   16.83   107.31    1.43     57.72    13.37  90.32   11.91    40.27    0.00      0.00     0.00   0.00    0.00     0.00    0.72    6.74    0.03   1.79
> 09:06:00 sde              0.47     64.73    15.72  97.12   10.14   138.71    1.48     45.18    10.18  87.29   11.97    30.46    0.00      0.00     0.00   0.00    0.00     0.00    0.72    7.09    0.03   1.78
> 09:06:00 sdf              0.35     42.53    10.28  96.71   15.48   121.52    1.48     46.72    10.57  87.69   14.58    31.49    0.00      0.00     0.00   0.00    0.00     0.00    0.72    7.40    0.03   1.97
> 09:06:00 sdg              0.27     31.93     7.72  96.66   16.38   119.75    1.18     52.58    12.33  91.25   11.63    44.44    0.00      0.00     0.00   0.00    0.00     0.00    0.72    3.70    0.02   1.40
> 09:07:00 md127            0.00      0.00     0.00   0.00    0.00     0.00    2.10     13.93     0.00   0.00 1405.96     6.63    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.95   1.00
> 09:07:00 sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.97      3.36     0.32  24.68    3.19     3.47    0.00      0.00     0.00   0.00    0.00     0.00    0.80    3.54    0.01   0.39
> 09:07:00 sdc              0.15      7.47     1.72  91.96    0.44    49.78    1.17     13.43     2.63  69.30    4.29    11.51    0.00      0.00     0.00   0.00    0.00     0.00    0.80    4.31    0.01   0.48
> 09:07:00 sdd              0.15      7.47     1.72  91.96    0.44    49.78    1.00     11.43     2.30  69.70   11.55    11.43    0.00      0.00     0.00   0.00    0.00     0.00    0.80    1.00    0.01   1.23
> 09:07:00 sde              0.15      7.47     1.72  91.96    9.44    49.78    1.28     12.83     2.37  64.84   14.22     9.99    0.00      0.00     0.00   0.00    0.00     0.00    0.80    4.27    0.02   1.69
> 09:07:00 sdf              0.05      2.53     0.58  92.11    4.67    50.67    1.18      7.89     1.23  51.03   13.87     6.67    0.00      0.00     0.00   0.00    0.00     0.00    0.80    4.29    0.02   1.63
> 09:07:00 sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.80      1.43     0.00   0.00   13.48     1.78    0.00      0.00     0.00   0.00    0.00     0.00    0.80    1.08    0.01   1.14
>
> --
> Eyal at Home (eyal@xxxxxxxxxxxxxx)
>




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux