Hi Thomas,

On Mon, Aug 25, 2014 at 11:38 PM, Thomas Klaube <thomas@xxxxxxxxxx> wrote:
> ----- Original Message -----
>> From: "Kent Overstreet" <kmo@xxxxxxxxxxxxx>
>> To: "Thomas Klaube" <thomas@xxxxxxxxxx>
>> CC: linux-bcache@xxxxxxxxxxxxxxx
>> Sent: Friday, 22 August 2014 11:38:05
>> Subject: Re: bcache bug / fs freeze on heavy IO
>>
>> there weren't any bcache changes in 3.16 from 3.15, so unless you hit
>> this again or someone else reports it I would think you just got
>> unlucky.
>
> Hi,
>
> I have a similar issue again. This is with kernel 3.13.0-34 (ubuntu
> server 14.04.1 LTS). This also happened during a fio benchmark on a
> bcache device:
>
> Aug 26 01:52:06 ubuntu kernel: [18378.656038] BUG: unable to handle kernel NULL pointer dereference at 0000000000000099

I believe this is fixed in 3.17:

http://evilpiepirate.org/git/linux-bcache.git/commit/?h=bcache-dev&id=2452cc89063a2a6890368f185c4b6d7d8802179e

Can you either try upgrading your kernel or, as a workaround, try
increasing your bucket size? This will lower the btree depth (depth 2
trees won't hit this problem).
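For the bucket size workaround, a rough sketch of how one might check the current values and prepare the reformat. It assumes the usual bcache sysfs layout (/sys/fs/bcache/<cset-uuid>/bucket_size and tree_depth) and the --bucket option of make-bcache. The device name /dev/sdX and the size 2M are placeholders only, not recommendations, and the helper names are made up for this example. Reformatting the cache device destroys its contents, so the script only prints the command instead of running it; flush any dirty data and detach the backing device first.

```shell
#!/bin/sh
# Hypothetical helper: print bucket size and btree depth of every
# registered cache set. Prints nothing if no cache set is registered.
show_cache_sets() {
    for cset in /sys/fs/bcache/*-*; do
        [ -d "$cset" ] || continue
        echo "cache set: ${cset##*/}"
        echo "  bucket_size: $(cat "$cset/bucket_size")"
        echo "  tree_depth:  $(cat "$cset/tree_depth")"
    done
}

# Hypothetical helper: print (do NOT execute) the command that would
# reformat a cache device with a given bucket size.
# $1 = cache device (placeholder), $2 = bucket size (placeholder)
suggest_reformat() {
    echo "make-bcache -C $1 --bucket $2"
}

show_cache_sets
suggest_reformat /dev/sdX 2M
```

After recreating the cache set and reattaching the backing device, tree_depth can be read again to see whether the tree actually became shallower.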

> Aug 26 01:52:06 ubuntu kernel: [18378.656067] IP: [<ffffffffa0306bb6>] bch_btree_insert_node+0x16/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656093] PGD 0
> Aug 26 01:52:06 ubuntu kernel: [18378.656101] Oops: 0000 [#1] SMP
> Aug 26 01:52:06 ubuntu kernel: [18378.656113] Modules linked in: bcache binfmt_misc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul ast ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel aes_x86_64 drm lrw gf128mul glue_helper ablk_helper syscopyarea cryptd sysfillrect sysimgblt lpc_ich shpchp mei_me mei bonding lp parport ipmi_si video mac_hid acpi_pad hid_generic usbhid ses hid enclosure usb_storage megaraid_sas ahci libahci igb e1000e i2c_algo_bit dca ptp pps_core
> Aug 26 01:52:06 ubuntu kernel: [18378.656277] CPU: 3 PID: 1770 Comm: bcache_gc Not tainted 3.13.0-34-generic #60-Ubuntu
> Aug 26 01:52:06 ubuntu kernel: [18378.656299] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 2.0 04/24/2014
> Aug 26 01:52:06 ubuntu kernel: [18378.656319] task: ffff8804045fc7d0 ti: ffff880405b28000 task.ti: ffff880405b28000
> Aug 26 01:52:06 ubuntu kernel: [18378.656340] RIP: 0010:[<ffffffffa0306bb6>] [<ffffffffa0306bb6>] bch_btree_insert_node+0x16/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656370] RSP: 0018:ffff880405b297d8 EFLAGS: 00010246
> Aug 26 01:52:06 ubuntu kernel: [18378.656385] RAX: ffff8803fe5c0000 RBX: ffff8802f5824400 RCX: 0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656405] RDX: ffff880405b29858 RSI: ffff880405b29dd4 RDI: ffffffffffffffff
> Aug 26 01:52:06 ubuntu kernel: [18378.656424] RBP: ffff880405b297f8 R08: 0000000000000000 R09: ffff880405b29880
> Aug 26 01:52:06 ubuntu kernel: [18378.656444] R10: 0000000000000001 R11: 000007ffffffffff R12: 0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656464] R13: ffff880405b29858 R14: ffff880405b29828 R15: 0000000000004587
> Aug 26 01:52:06 ubuntu kernel: [18378.656484] FS: 0000000000000000(0000) GS:ffff88041fd80000(0000) knlGS:0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug 26 01:52:06 ubuntu kernel: [18378.656524] CR2: 0000000000000099 CR3: 0000000001c0e000 CR4: 00000000001407e0
> Aug 26 01:52:06 ubuntu kernel: [18378.656544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 26 01:52:06 ubuntu kernel: [18378.656564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Aug 26 01:52:06 ubuntu kernel: [18378.656584] Stack:
> Aug 26 01:52:06 ubuntu kernel: [18378.656590] ffff8802f5824400 ffff880039161800 0000000000000000 ffff880405b29828
> Aug 26 01:52:06 ubuntu kernel: [18378.656614] ffff880405b29910 ffffffffa0306a71 0000000000000000 ffff880405b29ab0
> Aug 26 01:52:06 ubuntu kernel: [18378.656638] 000010b71d30b6be ffff880405b29dd4 0000000000000000 ffff8804045fc7d0
> Aug 26 01:52:06 ubuntu kernel: [18378.656661] Call Trace:
> Aug 26 01:52:06 ubuntu kernel: [18378.656672] [<ffffffffa0306a71>] btree_split+0x441/0x570 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656692] [<ffffffff810753d5>] ? del_timer+0x55/0x70
> Aug 26 01:52:06 ubuntu kernel: [18378.656709] [<ffffffff81081f89>] ? try_to_grab_pending+0xa9/0x160
> Aug 26 01:52:06 ubuntu kernel: [18378.656728] [<ffffffffa0306cc1>] bch_btree_insert_node+0x121/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656750] [<ffffffffa030787e>] btree_gc_recurse+0xa2e/0xbb0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656771] [<ffffffffa0309755>] ? bch_btree_ptr_invalid+0xa5/0xd0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656793] [<ffffffffa03072d6>] btree_gc_recurse+0x486/0xbb0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656813] [<ffffffff810a7145>] ? load_balance+0x185/0x890
> Aug 26 01:52:06 ubuntu kernel: [18378.656831] [<ffffffffa0309755>] ? bch_btree_ptr_invalid+0xa5/0xd0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656852] [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10
> Aug 26 01:52:06 ubuntu kernel: [18378.656869] [<ffffffffa0302380>] ? btree_node_free+0x1d0/0x1d0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656889] [<ffffffffa0305803>] ? btree_gc_mark_node+0x63/0x210 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656910] [<ffffffffa0307feb>] bch_btree_gc+0x41b/0x5a0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656930] [<ffffffff8171fd41>] ? __schedule+0x381/0x7d0
> Aug 26 01:52:06 ubuntu kernel: [18378.656948] [<ffffffffa03081a8>] bch_gc_thread+0x38/0x120 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656967] [<ffffffffa0308170>] ? bch_btree_gc+0x5a0/0x5a0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.656986] [<ffffffff8108b3d2>] kthread+0xd2/0xf0
> Aug 26 01:52:06 ubuntu kernel: [18378.657608] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
> Aug 26 01:52:06 ubuntu kernel: [18378.658237] [<ffffffff8172c6bc>] ret_from_fork+0x7c/0xb0
> Aug 26 01:52:06 ubuntu kernel: [18378.658845] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
> Aug 26 01:52:06 ubuntu kernel: [18378.659445] Code: 24 60 e8 5e a1 da e0 eb 8a 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 d5 41 54 49 89 cc 53 <80> bf 9a 00 00 00 00 48 89 fb 0f 85 6c 02 00 00 4c 8b 8b 80 00
> Aug 26 01:52:06 ubuntu kernel: [18378.660709] RIP [<ffffffffa0306bb6>] bch_btree_insert_node+0x16/0x2b0 [bcache]
> Aug 26 01:52:06 ubuntu kernel: [18378.661333] RSP <ffff880405b297d8>
> Aug 26 01:52:06 ubuntu kernel: [18378.661938] CR2: 0000000000000099
> Aug 26 01:52:06 ubuntu kernel: [18378.685807] ---[ end trace c759c6ac8f543aa1 ]---
>
> There are several fio processes hanging in d state and kill -9 does
> not work. Elevator is cfq, here is the fio setup:
>
> [rnd]
> rw=randrw
> ramp_time=30
> runtime=36600
> time_based
> rwmixread=30
> size=100g
> refill_buffers=1
> directory=.
> iodepth=64
> direct=1
> blocksize=4k
> numjobs=16
> group_reporting
> ioengine=libaio
> loops=1
>
> The fio job reads/writes to preallocated files, and it is run in
> parallel with a similar fio job (same setup) on a non-bcached
> device. There is no error on the fio job that runs on the non-bcache
> device (that job finishes successfully after 36600 sec with reasonable
> results). There are no errors in the controller logs and there are no
> other errors in dmesg.
>
> Any ideas? I can probably reproduce this.
>
> Regards
> Thomas Klaube
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html