On Sat, 19 Jul 2014 22:27:00 -0700 Kenny Root <kenny@xxxxxxxxx> wrote:

> I may have stumbled into a kernel memory leak during reshaping of a RAID 10
> from offset to near layout:
>
> I have a RAID 10 array which was previously in offset layout. I decided to
> reshape to a near layout. Eventually the machine had become very sluggish,
> the load average shot up, and the reshape slowed down to nearly nothing.
>
> md127 : active raid10 sdh1[2] sdk1[3] sdf1[0] sdg1[1]
>       7813771264 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>       [=========>...........]  reshape = 49.5% (3872227840/7813771264) finish=63624.5min speed=1032K/sec
>
> A look at slabtop appears to show that there is an allocation that is
> larger than the physical RAM (16GB):
>
>  Active / Total Objects (% used)    : 61551490 / 61918456 (99.4%)
>  Active / Total Slabs (% used)      : 2209811 / 2209811 (100.0%)
>  Active / Total Caches (% used)     : 76 / 99 (76.8%)
>  Active / Total Size (% used)       : 15241504.92K / 15319798.41K (99.5%)
>  Minimum / Average / Maximum Object : 0.01K / 0.25K / 15.69K
>
>      OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
>  60511744 60511219  29%    0.25K 2183366       32  17466928K kmalloc-256
>    193408    82391  42%    0.06K    3022       64     12088K kmalloc-64
>    154880   129949  83%    0.03K    1210      128      4840K kmalloc-32
>    154624   152783  98%    0.01K     302      512      1208K kmalloc-8
>    144160   143412  99%    0.02K     848      170      3392K fsnotify_event_holder
>    125103    34053  27%    0.08K    2453       51      9812K selinux_inode_security

This is very suspicious. As you might imagine, it is not possible for a slab
to use more memory than is physically available.

It claims there are 60511219 active objects out of a total of 60511744.
I calculate that as 99.9991%, but the USE column shows 29%.

If there were 32 OBJ/SLAB, then the slabs must be 8K. That is possible, but
they are 4K on my machine, and so are all the other slabs you listed.

I've tried a similar reshape on 3.16-rc3 and there is no similar leak.

The only patch since 3.13 that could possibly be relevant is

commit cc13b1d1500656a20e41960668f3392dda9fa6e2
Author: NeilBrown <neilb@xxxxxxx>
Date:   Mon May 5 13:34:37 2014 +1000

    md/raid10: call wait_barrier() for each request submitted.

That might fix a leak. However, the leak it might fix was introduced in
3.14-rc1:

commit 20d0189b1012a37d2533a87fb451f7852f2418d1
    block: Introduce new bio_split()

So unless Fedora backported one of those but not the other, I don't see how
this can be caused by RAID10.

What does /proc/slabinfo contain? Maybe "slabtop" is presenting it poorly.
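For what it's worth, the figures slabtop derives can be cross-checked against
the raw counters in /proc/slabinfo. A minimal sketch in Python, assuming the
standard "slabinfo - version: 2.1" column order (name, active_objs, num_objs,
objsize, objperslab, pagesperslab); reading /proc/slabinfo generally needs root:

#!/usr/bin/env python
# Sketch: recompute slabtop-style numbers for kmalloc-256 from the raw
# /proc/slabinfo counters (slabinfo 2.x column layout assumed).
with open("/proc/slabinfo") as f:
    for line in f:
        if not line.startswith("kmalloc-256 "):
            continue
        fields = line.split()
        active_objs  = int(fields[1])   # <active_objs>
        num_objs     = int(fields[2])   # <num_objs>
        objsize      = int(fields[3])   # <objsize>, in bytes
        objperslab   = int(fields[4])   # <objperslab>
        pagesperslab = int(fields[5])   # <pagesperslab>
        print("active objects: %d / %d = %.4f%%"
              % (active_objs, num_objs, 100.0 * active_objs / num_objs))
        print("slab size: %d objs x %d bytes = %dK (%d pages per slab)"
              % (objperslab, objsize, objperslab * objsize // 1024,
                 pagesperslab))

Run against the numbers quoted above, that works out to 99.9991% active and an
8K slab (32 x 256 bytes), which lines up with the 8K-slab deduction above rather
than with the 29% that slabtop printed.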
NeilBrown

> Output of mdadm -D:
>
> /dev/md127:
>         Version : 1.2
>   Creation Time : Wed Dec 20 19:41:25 2013
>      Raid Level : raid10
>      Array Size : 7813771264 (7451.79 GiB 8001.30 GB)
>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Jul 19 22:20:55 2014
>           State : active, reshaping
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : offset=2
>      Chunk Size : 512K
>
>  Reshape Status : 49% complete
>      New Layout : near=2, far=1
>
>            Name : local:home  (local to host local)
>            UUID : 3102a888:f08888a8:da88e888:c6288888
>          Events : 70841
>
>     Number   Major   Minor   RaidDevice State
>        0       8       81        0      active sync   /dev/sdf1
>        1       8       97        1      active sync   /dev/sdg1
>        2       8      113        2      active sync   /dev/sdh1
>        3       8      161        3      active sync   /dev/sdk1
>
> uname -r output:
> 3.13.6-200.fc20.x86_64
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html