Re: Software RAID memory issue?

On 12/9/18, 23:32, "NeilBrown" <neilb@xxxxxxxx> wrote:
    
    This is useful information, thanks.
    
    Can you repeat the experiment and also check the value in
      /sys/block/md0/md/stripe_cache_active

Hi Neil!

Thanks for the response and the additional troubleshooting steps!

Here is the result of checking /sys/block/md0/md/stripe_cache_active before, during, and after the consistency check (tested against CentOS kernel 3.10.0-514.el7.x86_64, which is the first one that exhibits this behavior):

Before consistency check:
================================================
# cat /proc/mdstat ; echo ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t ; echo -e "\n/sys/block/md0/md/stripe_cache_active: $(cat /sys/block/md0/md/stripe_cache_active)"
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[3] sdd[2] sdb[0] sdc[1]
      104791040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
raid6-md0  266            266         1696       19            8               :  tunables  0        0             0               :  slabdata  14              14           0

/sys/block/md0/md/stripe_cache_active: 0
================================================


During consistency check:
================================================ 
# cat /proc/mdstat ; echo ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t ; echo -e "\n/sys/block/md0/md/stripe_cache_active: $(cat /sys/block/md0/md/stripe_cache_active)"
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[3] sdd[2] sdb[0] sdc[1]
      104791040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  check =  6.3% (3311484/52395520) finish=19.7min speed=41438K/sec

unused devices: <none>

name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
raid6-md0  1387           1387        1696       19            8               :  tunables  0        0             0               :  slabdata  73              73           0

/sys/block/md0/md/stripe_cache_active: 1320
================================================


After consistency check:
================================================ 
# cat /proc/mdstat ; echo ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t ; echo -e "\n/sys/block/md0/md/stripe_cache_active: $(cat /sys/block/md0/md/stripe_cache_active)"
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[3] sdd[2] sdb[0] sdc[1]
      104791040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
raid6-md0  4522           4522        1696       19            8               :  tunables  0        0             0               :  slabdata  238             238          0

/sys/block/md0/md/stripe_cache_active: 0
================================================
    
    This number can grow large, but should shrink again when there is memory
    pressure, but maybe that isn't happening.
    
    If stripe_cache_active has a similar value to slabinfo, then memory
    isn't getting lost, but the shrinker isn't working.
    If it has a much smaller value then memory is getting lost.

Before and after the consistency check, stripe_cache_active is zero.  During the check it grows roughly in step with the slabinfo object count, but once the check finishes and stripe_cache_active drops back to zero, the slabinfo counts stay high.
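
In case it helps with the next run, this is the sort of loop I can leave running to sample both values while the check is in progress (just a rough sketch, using the same paths as above):

# Rough sketch: sample stripe_cache_active and the raid6-md0 slab counts
# every 10 seconds for as long as a check is running.
while grep -q check /proc/mdstat; do
    printf '%s  stripe_cache_active=%s  slab(active/total)=%s\n' \
        "$(date +%T)" \
        "$(cat /sys/block/md0/md/stripe_cache_active)" \
        "$(grep raid6-md0 /proc/slabinfo | awk '{print $2 "/" $3}')"
    sleep 10
done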
    
    If it appears to be the former, try to stop the check, then
      echo 3 > /proc/sys/vm/drop_caches
    
    that should aggressively flush lots of caches, including the stripe
    cache.

Even though stripe_cache_active had already dropped to zero, I thought the output after dropping the caches might be helpful:

================================================
# cat /proc/mdstat ; echo ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t ; echo -e "\n/sys/block/md0/md/stripe_cache_active: $(cat /sys/block/md0/md/stripe_cache_active)"
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[3] sdd[2] sdb[0] sdc[1]
      104791040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
raid6-md0  4522           4522        1696       19            8               :  tunables  0        0             0               :  slabdata  238             238          0

/sys/block/md0/md/stripe_cache_active: 0
# echo 3 > /proc/sys/vm/drop_caches
# cat /proc/mdstat ; echo ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t ; echo -e "\n/sys/block/md0/md/stripe_cache_active: $(cat /sys/block/md0/md/stripe_cache_active)"
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde[3] sdd[2] sdb[0] sdc[1]
      104791040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
raid6-md0  988            4446        1696       19            8               :  tunables  0        0             0               :  slabdata  234             234          0

/sys/block/md0/md/stripe_cache_active: 0
================================================

Dropping the caches reduced the active objects considerably (4522 down to 988), but barely touched the total number of objects (4522 down to 4446).
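
As a back-of-the-envelope sanity check (assuming 4 KiB pages), the memory still pinned by that slab on this small test array works out to:

# <num_slabs> x <pagesperslab> x page size, from the slabinfo line above
echo $(( 234 * 8 * 4096 ))    # 7667712 bytes, roughly 7.3 MiB

That is small here, but presumably it scales with array size and with repeated checks, which would line up with the memory growth that started this thread.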

Going back to the "this kernel has the issue, this kernel doesn't" investigation I've been doing: the newest CentOS 7.2 kernel (3.10.0-327.36.3.el7) doesn't have this issue (although consistency checks take quite a bit longer on it), while the initial CentOS 7.3 kernel (3.10.0-514.el7) does.

A diff of the two kernels' changelogs shows 160 entries referencing 25 unique Red Hat Bugzilla tickets credited to Heinz Mauelshagen, Jes Sorensen, and Mike Snitzer.  Tracking down the actual changes behind each entry is proving a bit difficult, since the changes Red Hat puts into their kernels can be backports of fixes from newer upstream kernels or fixes that have not yet been merged upstream.
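
(For reference, something along these lines is what produced that delta, assuming both kernel packages are installed; rpm -qp against the downloaded RPMs works the same way:)

diff <(rpm -q --changelog kernel-3.10.0-327.36.3.el7.x86_64) \
     <(rpm -q --changelog kernel-3.10.0-514.el7.x86_64) \
    | grep -E 'Mauelshagen|Sorensen|Snitzer'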

As luck would have it, Red Hat just updated their Bugzilla and I can no longer log in, so I can't even open a new issue until I get my access resolved.

I know that the Red Hat releases would likely need to be investigated by Red Hat themselves, since they are the ones patching the kernels they ship.  But the patch(es) responsible for this issue, regardless of where they came from, must have been merged into the official kernel at some point, since the issue is also present in the ELRepo 4.19.5-1.el7 kernel.  (The ELRepo kernels are builds of unpatched source from kernel.org.)

I guess I'll start testing vanilla kernels directly from kernel.org to find out which upstream kernel first exhibited this behavior.
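
The rough plan (a sketch, not something I've run yet; the starting tag is just a guess at a likely-good version):

# clone the mainline tree and test a few tagged releases to find a good/bad pair
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git checkout v4.9        # build, boot, run a check, watch raid6-md0 in slabinfo
# ...once there is a known-good and known-bad tag, bisect between them:
git bisect start
git bisect bad v4.19
git bisect good v4.9     # assuming v4.9 turns out to be clean
# at each step: make olddefconfig && make -j$(nproc) && make modules_install install,
# reboot, run a consistency check, and mark it with 'git bisect good' or 'git bisect bad'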

Thanks again!

-Rich

    
    NeilBrown
    




