Re: Software RAID memory issue?

NeilBrown <neilb@xxxxxxxx> · Mon, 10 Dec 2018 15:32:23 +1100

On Wed, Dec 05 2018, Richard Alloway wrote:

>
> ================================================ 
> # egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t
> name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
> raid6-md0  272            272         1864       17            8               :  tunables  0        0             0               :  slabdata  16              16           0
> ================================================
>  
> The array is empty – no filesystems, partitions, or anything, so the disks are idle.
>  
> If I trigger a raid-check manually, and then re-examine the slabinfo:
>
> ================================================ 
> # /usr/sbin/raid-check ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t
> name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
> raid6-md0  3060           3060        1864       17            8               :  tunables  0        0             0               :  slabdata  180             180          0
> ================================================
>  
> Executing the raid-check a second time, the memory usage increases again:
>  
> ================================================ 
> # /usr/sbin/raid-check ; egrep '^#|raid' /proc/slabinfo | sed 's/^#//' | column -t
> name       <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
> raid6-md0  4420           4420        1864       17            8               :  tunables  0        0             0               :  slabdata  260             260          0
> ================================================
>  
> So, this accounts for the loss of available memory. 

This is useful information, thanks.
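
For scale, the slabdata columns quoted above can be turned into a rough
memory figure. This is only a sketch: slab_kib is a hypothetical helper
name, the 4096-byte page size is an assumption, and the result covers
the slab itself, not any memory the cached stripes reference outside it.

```shell
#!/bin/sh
# slab_kib NAME [SLABINFO_FILE] [PAGE_SIZE_BYTES]
# Prints <num_slabs> * <pagesperslab> * page size, in KiB, for one slab.
# In a slabinfo line, field 6 is <pagesperslab> and field 15 is <num_slabs>.
# The 4096-byte default page size is an assumption; check `getconf PAGESIZE`.
slab_kib() {
    awk -v name="$1" -v page="${3:-4096}" \
        '$1 == name { print $15 * $6 * page / 1024 }' \
        "${2:-/proc/slabinfo}"
}

# Example (reading /proc/slabinfo normally needs root):
#   slab_kib raid6-md0
```

With the three samples quoted above (16, 180 and 260 slabs of 8 pages
each) this works out to 512, 5760 and 8320 KiB respectively.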

Can you repeat the experiment and also check the value in
  /sys/block/md0/md/stripe_cache_active

This number can grow large, but it should shrink again when there is
memory pressure. Maybe that isn't happening.

If stripe_cache_active has a value similar to the object count in
slabinfo, then memory isn't getting lost, but the shrinker isn't working.
If it has a much smaller value, then memory is getting lost.
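
One way to put the two numbers side by side is sketched below. The
function names are hypothetical, and the slab name raid6-md0 and the md0
sysfs path are simply the ones from this thread; adjust them for another
array.

```shell
#!/bin/sh
# slab_active_objs NAME [SLABINFO_FILE]
# Field 2 of a slabinfo line is <active_objs>.
slab_active_objs() {
    awk -v name="$1" '$1 == name { print $2 }' "${2:-/proc/slabinfo}"
}

# compare_stripe_cache SLAB_NAME STRIPE_CACHE_ACTIVE_FILE [SLABINFO_FILE]
# Prints both counters so they can be compared by eye.
compare_stripe_cache() {
    slab=$(slab_active_objs "$1" "${3:-/proc/slabinfo}")
    active=$(cat "$2")
    echo "slab active_objs:    $slab"
    echo "stripe_cache_active: $active"
}

# Example (as root, after a raid-check run):
#   compare_stripe_cache raid6-md0 /sys/block/md0/md/stripe_cache_active
```

Similar numbers would point at the shrinker; a much smaller
stripe_cache_active would point at lost memory.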

If it appears to be the former, try to stop the check, then run
  echo 3 > /proc/sys/vm/drop_caches

That should aggressively flush lots of caches, including the stripe
cache.
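
As a rough sketch of that sequence: writing "idle" to the array's
sync_action file is the md sysfs way to stop a running check, and is
assumed here to be suitable for this setup. The function name and its
path arguments are hypothetical; the defaults use md0 as in this thread.

```shell
#!/bin/sh
# stop_check_and_drop [MD_SYSFS_DIR] [DROP_CACHES_FILE]
# Stops any running check/resync on the array, then asks the kernel to
# drop the page cache and reclaimable slab objects. Needs root when run
# against the real paths.
stop_check_and_drop() {
    md_dir=${1:-/sys/block/md0/md}
    drop=${2:-/proc/sys/vm/drop_caches}
    echo idle > "$md_dir/sync_action"   # stop the running check
    sync                                # write out dirty data first
    echo 3 > "$drop"                    # 3 = page cache + slab objects
}

# Example (as root):
#   stop_check_and_drop
#   grep raid /proc/slabinfo
```

Re-reading slabinfo afterwards shows whether the raid6-md0 object count
came back down.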

NeilBrown