Re: bcache and memory at boot. Bug?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi again,


I have been "walking" around the code in order to debug.

Now, with kernels 3.13.3, 3.13.2 and 3.12.9 I can boot /dev/bcache0 as
the root system with all RAM available (32 GiB) with a small
modification (one line) in super.c into flash_devs_run().

To recapitulate, my system has sda partitioned with fdisk; sdb is GPT
partitioned. Cache device is /dev/sda. Backed device is another
partition in /dev/sdb. I am using a proportionally big partition for
the cache device (211551921 blocks) and perhaps some code would have
failed with these numbers.

What have I found?
Cache device has 4096 nr_uuids and some of them have an incorrect
value of "sectors". Incorrect values all of them have a value higher
than 0x9000000000000000, so I applied a "filter" in order not to run
the uuids (that avoids the try to allocate such huge quantity of RAM).

Some sample lines from /var/log/messages:
----------------------------------------
(...)
Feb 26 14:10:08 minijep kernel: [    3.589559] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[875] u: ffff88080129b580,
u->sectors: 9000000002800000
Feb 26 14:10:08 minijep kernel: [    3.589561] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[876] u: ffff88080129b600,
u->sectors: 16112800
Feb 26 14:10:08 minijep kernel: [    3.589562] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[877] u: ffff88080129b680,
u->sectors: 41ab21000
Feb 26 14:10:08 minijep kernel: [    3.589563] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[878] u: ffff88080129b700,
u->sectors: 9000000000800000
Feb 26 14:10:08 minijep kernel: [    3.589565] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[879] u: ffff88080129b780,
u->sectors: 16112a40
Feb 26 14:10:08 minijep kernel: [    3.589566] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[880] u: ffff88080129b800,
u->sectors: 41f8dd800
Feb 26 14:10:08 minijep kernel: [    3.589567] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[881] u: ffff88080129b880,
u->sectors: 9000000002000000
Feb 26 14:10:08 minijep kernel: [    3.589569] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[882] u: ffff88080129b900,
u->sectors: 16112c58
Feb 26 14:10:08 minijep kernel: [    3.589570] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[883] u: ffff88080129b980,
u->sectors: 42ab53800
Feb 26 14:10:08 minijep kernel: [    3.589572] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[884] u: ffff88080129ba00,
u->sectors: 9000000000800000
Feb 26 14:10:08 minijep kernel: [    3.589573] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[885] u: ffff88080129ba80,
u->sectors: 16112eb8
Feb 26 14:10:08 minijep kernel: [    3.589574] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[886] u: ffff88080129bb00,
u->sectors: 431732000
Feb 26 14:10:08 minijep kernel: [    3.589576] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[887] u: ffff88080129bb80,
u->sectors: 9000000000800000
Feb 26 14:10:08 minijep kernel: [    3.589577] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[888] u: ffff88080129bc00,
u->sectors: 16118250
Feb 26 14:10:08 minijep kernel: [    3.589578] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[889] u: ffff88080129bc80,
u->sectors: c014a000
Feb 26 14:10:08 minijep kernel: [    3.589580] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[890] u: ffff88080129bd00,
u->sectors: 9000000000800000
Feb 26 14:10:08 minijep kernel: [    3.589581] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[891] u: ffff88080129bd80,
u->sectors: 24bb8960
Feb 26 14:10:08 minijep kernel: [    3.589582] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[892] u: ffff88080129be00,
u->sectors: 162c74800
Feb 26 14:10:08 minijep kernel: [    3.589584] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[893] u: ffff88080129be80,
u->sectors: 9000000001800000
Feb 26 14:10:08 minijep kernel: [    3.589585] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[894] u: ffff88080129bf00,
u->sectors: 24bb8a00
Feb 26 14:10:08 minijep kernel: [    3.589586] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[895] u: ffff88080129bf80,
u->sectors: 162c8e000
(...)
Feb 26 14:10:08 minijep kernel: [    3.593789] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[4093] u: ffff8808012ffe80,
u->sectors: 0
Feb 26 14:10:08 minijep kernel: [    3.593791] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[4094] u: ffff8808012fff00,
u->sectors: 0
Feb 26 14:10:08 minijep kernel: [    3.593792] bcache:
flash_devs_run() P00: flash_dev_run:  uuids[4095] u: ffff8808012fff80,
u->sectors: 0
Feb 26 14:10:08 minijep kernel: [    3.593794] bcache:
register_cache() registered cache device sda1
(...)
Feb 26 14:10:08 minijep kernel: [    5.781599] [drm] Enabling RC6
states: RC6 on, RC6p off, RC6pp off
Feb 26 14:10:08 minijep kernel: [   35.627315] Adding 32225276k swap
on /dev/sdb6.  Priority:-1 extents:1 across:32225276k
(...)
-----------------------------------------


The change to super.c (pr_info() call is for logging and next if is my
salvation ;-):


static int flash_devs_run(struct cache_set *c)
{
    int ret = 0;
    struct uuid_entry *u;
_debug_show_uuids_of_cache_set(9999,c);
    for (u = c->uuids;
         u < c->uuids + c->nr_uuids && !ret;
         u++)
         {
        pr_info("P00: flash_dev_run:  uuids[%d] u: %p, u->sectors:
%llx", u-c->uuids, u, u->sectors);
             if (u->sectors < 0x9000000000000000)
        if (UUID_FLASH_ONLY(u))
            ret = flash_dev_run(c, u);
        }


    return ret;
}


In order to log that large number of lines I boot with kernel
parameter: log_buf_len=10M

I would like to add that booting the system spends (I think that
waiting for something) some time to boot, but it could be another
thing (look at last lines of previous /var/log/messages).

As you have seen this is a "trick" to bypass something wich is going
wrong. My question is:

- where is "sectors" field of each uuids struct filled? Is the value
read from the cache device?

I guess that something could happen (or have happened) in some
conversion (int/long instead of unsigned long) or in addressing to
memory and a pointer which modified something which it should not.

I would like to know what happens in order to avoid the use of the
filtering line for big numbers in sectors field as I guess that
something else should be verified...

Josep

On 26 February 2014 17:03, Josep Lladonosa <jlladono@xxxxxxxxx> wrote:
> Hello,
>
> I forward some information wich was missing in the list. After this I
> write another message with some (good news for me and) information
> about things that I have found.
>
> Josep
>
> ---------- Forwarded message ----------
> From: Josep Lladonosa <jlladono@xxxxxxxxx>
> Date: 11 February 2014 17:39
> Subject: Re: bcache and memory at boot. Bug?
> To: matthew patton <pattonme@xxxxxxxxx>
>
>
> On 11 February 2014 12:23, matthew patton <pattonme@xxxxxxxxx> wrote:
>>
>>
>>  >>>  -Free memory with kernel 3.11.0-rc7 with 32GiB and bcache0 (halved?)
>>>>>  and bcache1 (mounted)
>>>>>
>>>>>  $ head -2 /proc/meminfo
>>>>>  MemTotal:       32913872 kB
>>>>>  MemFree:        12376640 kB
>>>>>
>>>>>
>>>>>  - Free memory with kernel 3.12.9 with mem=16G and bcache0 (mounted)
>>>>>
>>>>>  $ head -2 /proc/meminfo
>>>>>  MemTotal:       15602144 kB
>>>>>  MemFree:        14005372 kB
>>
>>
>> too many variables. Boot the same box with the SAME kernel, just twiddle how much ram is visible via the GRUB commandline. ... try with mem=16G and other values like 12G, 24G or 31G
>
> O.K. Here you are:
>
> With kernel 3.13.2:
>
> mem=2G  ---> boots O.K. (/dev/bcache0).
> mem=4G  ---> Out of memory. (initramfs appears with bcache0 and
> bcache1 for only one caching device. bcache1 has the two correct
> slaves).
> mem=8G  ---> bcache-register killed. Kernel panic.
> mem=16G --> Out of memory (initramfs appears with bcache0 to bcache7
> for only one caching device. bcache7 has the two correct slaves).
> mem=20G --> Out of memory (no initramfs prompt appearing).
> mem=24G --> Out of memory (initramfs appears with bcache0 to bcache9
> for only one caching device. bcache9 has the two correct slaves).
>
> (the more RAM is given to box, the more messages about out of memory you get.
>
> Kernel 3.12.9
>
> Command line: BOOT_IMAGE=/vmlinuz-3.12.9 root=/dev/bcache0 ro
> "acpi_osi=!Windows 2012" quiet mem=16G intel_pstate=disable
> pcie_aspm=force i915.i915_enable_rc6=1 i915.i915_enable_fbc=1
> i915.lvds_downclock=1 rcutree.rcu_idle_gp_delay=1
>
> (changing mem kernel boot parameter):
>
> mem=12G  ---> boots O.K. (/dev/bcache0).
> mem=14G  ---> boots O.K. (/dev/bcache0).
> mem=16G  ---> boots O.K. (/dev/bcache0).
> mem=17G  ---> boots O.K. (/dev/bcache0).
> mem=18G  ---> boots O.K. (/dev/bcache0).
> mem=20G  ---> Kernel panic.
> mem=24G  ---> Kernel panic.
> mem=32G  ---> Out of memory. (initramfs appears with bcache0 and
> bcache1 for only one caching device. bcache1 has the two correct
> slaves).
>
>
> With kernel 3.12.9, When booting, with udev info in /var/log/messages,
> there are memory allocation failure for bcache-register messages:
>
>
> Feb 11 17:22:10 minijep kernel: [    3.099025] bio: create slab <bio-1> at 1
> Feb 11 17:22:10 minijep kernel: [    3.452716] bcache:
> bch_journal_replay() journal replay done, 575 keys in 49 entries, seq
> 573572
> Feb 11 17:22:10 minijep kernel: [    3.452777] vmalloc: allocation
> failure: 19327352836 bytes
> Feb 11 17:22:10 minijep kernel: [    3.452780] bcache-register: page
> allocation failure: order:0, mode:0x80d2
> Feb 11 17:22:10 minijep kernel: [    3.452783] CPU: 0 PID: 298 Comm:
> bcache-register Not tainted 3.12.9 #1
> Feb 11 17:22:10 minijep kernel: [    3.452784] Hardware name:
> Micro-Star International Co., Ltd. GT60/MS-16F3, BIOS E16F3IMS.10U
> 01/21/2013
> Feb 11 17:22:10 minijep kernel: [    3.452786]  0000f72000000083
> ffffffff8137e021 0000000000000001 ffffffff810c0864
> Feb 11 17:22:10 minijep kernel: [    3.452789]  ffffffff814fdf94
> ffff880036f65b70 0000000000000018 ffff880036f65be8
> Feb 11 17:22:10 minijep kernel: [    3.452792]  ffff880036f65b88
> ffff880036fc8b20 0000000000000000 0000000000000003
> Feb 11 17:22:10 minijep kernel: [    3.452795] Call Trace:
> Feb 11 17:22:10 minijep kernel: [    3.452804]  [<ffffffff8137e021>] ?
> dump_stack+0x41/0x51
> Feb 11 17:22:10 minijep kernel: [    3.452808]  [<ffffffff810c0864>] ?
> warn_alloc_failed+0x118/0x12c
> Feb 11 17:22:10 minijep kernel: [    3.452814]  [<ffffffff810e7134>] ?
> __vmalloc_node_range+0x1cd/0x1de
> Feb 11 17:22:10 minijep kernel: [    3.452817]  [<ffffffff810e7171>] ?
> __vmalloc_node+0x2c/0x31
> Feb 11 17:22:10 minijep kernel: [    3.452827]  [<ffffffffa01ebd1c>] ?
> bcache_device_init+0x81/0x241 [bcache]
> Feb 11 17:22:10 minijep kernel: [    3.452832]  [<ffffffff810e71ab>] ?
> vzalloc+0x21/0x22
> Feb 11 17:22:10 minijep kernel: [    3.452839]  [<ffffffffa01ebd1c>] ?
> bcache_device_init+0x81/0x241 [bcache]
> Feb 11 17:22:10 minijep kernel: [    3.452845]  [<ffffffffa01ec05e>] ?
> flash_dev_run+0x7d/0x12b [bcache]
> Feb 11 17:22:10 minijep kernel: [    3.452854]  [<ffffffffa01ed88b>] ?
> run_cache_set+0x5c4/0x695 [bcache]
> Feb 11 17:22:10 minijep kernel: [    3.452859]  [<ffffffffa01eee47>] ?
> register_bcache+0x10a0/0x12d9 [bcache]
> Feb 11 17:22:10 minijep kernel: [    3.452863]  [<ffffffff810f0f15>] ?
> alloc_pages_current+0xef/0x109
> Feb 11 17:22:10 minijep kernel: [    3.452867]  [<ffffffff81159855>] ?
> sysfs_write_file+0xcf/0x103
> Feb 11 17:22:10 minijep kernel: [    3.452871]  [<ffffffff81106a23>] ?
> vfs_write+0xa8/0x10b
> Feb 11 17:22:10 minijep kernel: [    3.452874]  [<ffffffff8110705e>] ?
> SyS_write+0x41/0x74
> Feb 11 17:22:10 minijep kernel: [    3.452878]  [<ffffffff81386ca2>] ?
> system_call_fastpath+0x16/0x1b
> Feb 11 17:22:10 minijep kernel: [    3.452880] Mem-Info:
> Feb 11 17:22:10 minijep kernel: [    3.452881] Node 0 DMA per-cpu:
> Feb 11 17:22:10 minijep kernel: [    3.452883] CPU    0: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452884] CPU    1: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452886] CPU    2: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452887] CPU    3: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452888] CPU    4: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452889] CPU    5: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452890] CPU    6: hi:    0,
> btch:   1 usd:   0
> Feb 11 17:22:10 minijep kernel: [    3.452892] CPU    7: hi:    0,
> btch:   1 usd:   0
>
>
> --
> --
> Salutacions...Josep
> --
>
>
> --
> --
> Salutacions...Josep
> --



-- 
--
Salutacions...Josep
--
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux