Re: bcache bug / fs freeze on heavy IO

Hi Thomas,

On Mon, Aug 25, 2014 at 11:38 PM, Thomas Klaube <thomas@xxxxxxxxxx> wrote:
> ----- Original Message -----
>> From: "Kent Overstreet" <kmo@xxxxxxxxxxxxx>
>> To: "Thomas Klaube" <thomas@xxxxxxxxxx>
>> CC: linux-bcache@xxxxxxxxxxxxxxx
>> Sent: Friday, 22 August 2014 11:38:05
>> Subject: Re: bcache bug / fs freeze on heavy IO
>>
>> There weren't any bcache changes between 3.15 and 3.16, so unless
>> you hit this again or someone else reports it, I would think you
>> just got unlucky.
>
> Hi,
>
> I have hit a similar issue again, this time with kernel 3.13.0-34
> (Ubuntu Server 14.04.1 LTS). It also happened during a fio benchmark
> on a bcache device:
>
> Aug 26 01:52:06 ubuntu kernel: [18378.656038] BUG: unable to handle kernel NULL pointer dereference at 0000000000000099

I believe this is fixed in 3.17:

http://evilpiepirate.org/git/linux-bcache.git/commit/?h=bcache-dev&id=2452cc89063a2a6890368f185c4b6d7d8802179e

Can you either try upgrading your kernel or, as a workaround, try
increasing your bucket size? This will lower the btree depth (depth 2
trees won't hit this problem).

> Aug 26 01:52:06 ubuntu kernel: [18378.656067] IP: [<ffffffffa0306bb6>] bch_btree_insert_node+0x16/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656093] PGD 0
> Aug 26 01:52:06 ubuntu kernel: [18378.656101] Oops: 0000 [#1] SMP
> Aug 26 01:52:06 ubuntu kernel: [18378.656113] Modules linked in: bcache binfmt_misc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul ast ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel aes_x86_64 drm lrw gf128mul glue_helper ablk_helper syscopyarea cryptd sysfillrect sysimgblt lpc_ich shpchp mei_me mei bonding lp parport ipmi_si video mac_hid acpi_pad hid_generic usbhid ses hid enclosure usb_storage megaraid_sas ahci libahci igb e1000e i2c_algo_bit dca ptp pps_core
> Aug 26 01:52:06 ubuntu kernel: [18378.656277] CPU: 3 PID: 1770 Comm: bcache_gc Not tainted 3.13.0-34-generic #60-Ubuntu
> Aug 26 01:52:06 ubuntu kernel: [18378.656299] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 2.0 04/24/2014
> Aug 26 01:52:06 ubuntu kernel: [18378.656319] task: ffff8804045fc7d0 ti: ffff880405b28000 task.ti: ffff880405b28000
> Aug 26 01:52:06 ubuntu kernel: [18378.656340] RIP: 0010:[<ffffffffa0306bb6>]  [<ffffffffa0306bb6>] bch_btree_insert_node+0x16/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656370] RSP: 0018:ffff880405b297d8  EFLAGS: 00010246
> Aug 26 01:52:06 ubuntu kernel: [18378.656385] RAX: ffff8803fe5c0000 RBX: ffff8802f5824400 RCX: 0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656405] RDX: ffff880405b29858 RSI: ffff880405b29dd4 RDI: ffffffffffffffff
> Aug 26 01:52:06 ubuntu kernel: [18378.656424] RBP: ffff880405b297f8 R08: 0000000000000000 R09: ffff880405b29880
> Aug 26 01:52:06 ubuntu kernel: [18378.656444] R10: 0000000000000001 R11: 000007ffffffffff R12: 0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656464] R13: ffff880405b29858 R14: ffff880405b29828 R15: 0000000000004587
> Aug 26 01:52:06 ubuntu kernel: [18378.656484] FS:  0000000000000000(0000) GS:ffff88041fd80000(0000) knlGS:0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656507] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug 26 01:52:06 ubuntu kernel: [18378.656524] CR2: 0000000000000099 CR3: 0000000001c0e000 CR4: 00000000001407e0
> Aug 26 01:52:06 ubuntu kernel: [18378.656544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Aug 26 01:52:06 ubuntu kernel: [18378.656584] Stack:
> Aug 26 01:52:06 ubuntu kernel: [18378.656590]  ffff8802f5824400 ffff880039161800 0000000000000000 ffff880405b29828
> Aug 26 01:52:06 ubuntu kernel: [18378.656614]  ffff880405b29910 ffffffffa0306a71 0000000000000000 ffff880405b29ab0
> Aug 26 01:52:06 ubuntu kernel: [18378.656638]  000010b71d30b6be ffff880405b29dd4 0000000000000000 ffff8804045fc7d0
> Aug 26 01:52:06 ubuntu kernel: [18378.656661] Call Trace:
> Aug 26 01:52:06 ubuntu kernel: [18378.656672]  [<ffffffffa0306a71>] btree_split+0x441/0x570 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656692]  [<ffffffff810753d5>] ? del_timer+0x55/0x70
> Aug 26 01:52:06 ubuntu kernel: [18378.656709]  [<ffffffff81081f89>] ? try_to_grab_pending+0xa9/0x160
> Aug 26 01:52:06 ubuntu kernel: [18378.656728]  [<ffffffffa0306cc1>] bch_btree_insert_node+0x121/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656750]  [<ffffffffa030787e>] btree_gc_recurse+0xa2e/0xbb0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656771]  [<ffffffffa0309755>] ? bch_btree_ptr_invalid+0xa5/0xd0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656793]  [<ffffffffa03072d6>] btree_gc_recurse+0x486/0xbb0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656813]  [<ffffffff810a7145>] ? load_balance+0x185/0x890
> Aug 26 01:52:06 ubuntu kernel: [18378.656831]  [<ffffffffa0309755>] ? bch_btree_ptr_invalid+0xa5/0xd0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656852]  [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10
> Aug 26 01:52:06 ubuntu kernel: [18378.656869]  [<ffffffffa0302380>] ? btree_node_free+0x1d0/0x1d0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656889]  [<ffffffffa0305803>] ? btree_gc_mark_node+0x63/0x210 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656910]  [<ffffffffa0307feb>] bch_btree_gc+0x41b/0x5a0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656930]  [<ffffffff8171fd41>] ? __schedule+0x381/0x7d0
> Aug 26 01:52:06 ubuntu kernel: [18378.656948]  [<ffffffffa03081a8>] bch_gc_thread+0x38/0x120 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656967]  [<ffffffffa0308170>] ? bch_btree_gc+0x5a0/0x5a0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656986]  [<ffffffff8108b3d2>] kthread+0xd2/0xf0
> Aug 26 01:52:06 ubuntu kernel: [18378.657608]  [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
> Aug 26 01:52:06 ubuntu kernel: [18378.658237]  [<ffffffff8172c6bc>] ret_from_fork+0x7c/0xb0
> Aug 26 01:52:06 ubuntu kernel: [18378.658845]  [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
> Aug 26 01:52:06 ubuntu kernel: [18378.659445] Code: 24 60 e8 5e a1 da e0 eb 8a 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 cc 53 <80> bf 9a 00 00 00 00 48 89 fb 0f 85 6c 02 00 00 4c 8b 8b 80 00
> Aug 26 01:52:06 ubuntu kernel: [18378.660709] RIP  [<ffffffffa0306bb6>] bch_btree_insert_node+0x16/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.661333]  RSP <ffff880405b297d8>
> Aug 26 01:52:06 ubuntu kernel: [18378.661938] CR2: 0000000000000099
> Aug 26 01:52:06 ubuntu kernel: [18378.685807] ---[ end trace c759c6ac8f543aa1 ]---
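
The faulting address can be cross-checked against the register dump
above. A minimal sketch, assuming the bytes after the <80> marker in
the Code: line are the faulting instruction and that it used the RDI
value shown in the dump; the variable names are only illustrative and
nothing here comes from the bcache source:

```python
# Values copied from the oops above.
rdi = 0xFFFFFFFFFFFFFFFF          # RDI in the register dump
cr2 = 0x0000000000000099          # faulting address reported in CR2
insn = bytes.fromhex("80bf9a00000000")  # bytes starting at the <80> marker

# Split the ModRM byte that follows opcode 0x80 into its three fields.
modrm = insn[1]
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

# mod == 2 means a 32-bit displacement follows; for opcode 0x80,
# reg == 7 selects CMP r/m8, imm8; rm == 7 is RDI when no REX prefix is present.
disp = int.from_bytes(insn[2:6], "little", signed=True)

# Effective address = base register + displacement, wrapped to 64 bits.
fault_addr = (rdi + disp) & ((1 << 64) - 1)

print(hex(disp), hex(fault_addr), fault_addr == cr2)
```

If this reading is right, the instruction is a byte compare at offset
0x9a from RDI, and the pointer in RDI was -1 rather than a true NULL,
which is how the reported address ends up at 0x99.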
>
> Several fio processes are hanging in D state and kill -9 does not
> work. The elevator is cfq; here is the fio setup:
>
> [rnd]
> rw=randrw
> ramp_time=30
> runtime=36600
> time_based
> rwmixread=30
> size=100g
> refill_buffers=1
> directory=.
> iodepth=64
> direct=1
> blocksize=4k
> numjobs=16
> group_reporting
> ioengine=libaio
> loops=1
>
> The fio job reads/writes to preallocated files and runs in parallel
> with a similar fio job (same setup) on a non-bcache device. The job
> on the non-bcache device shows no errors (it finishes successfully
> after 36600 sec with reasonable results). There are no errors in the
> controller logs and no other errors in dmesg.
>
> Any ideas? I can probably reproduce this.
>
> Regards
> Thomas Klaube
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



