bcache: discard BUG during lvremove

On Thu, 22 Sep 2016, Matthias Ferdinand wrote:
> On Wed, Sep 21, 2016 at 02:08:06PM -0700, Eric Wheeler wrote:
> > Looks like a deadlock, maybe?  Kent is good at troubleshooting those.
> 
> I guess so.
> 
> >     [ 1930.459062] kernel BUG at block/bio.c:1789! Matthias, which BUG_ON 
> > is this below in your code?  My codepaste is below if v4.7 from LXR, I 
> > don't have 4.8 handy.
> > 
> >     [ 1930.459648] invalid opcode: 0000 [#1] SMP
> >     [ 1930.520004] CPU: 0 PID: 12673 Comm: lvremove Not tainted 4.8.0-rc5 #2
> >     [ 1930.545645] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
> >     [ 1931.077608]  [<ffffffff96393aad>] blk_queue_split+0x47d/0x640
> >     [ 1931.101157]  [<ffffffff9638f3a4>] blk_queue_bio+0x44/0x390
> >     [ 1931.124083]  [<ffffffff9638d8c4>] generic_make_request+0x104/0x1b0
> >     [ 1931.146371]  [<ffffffff9638d9dd>] submit_bio+0x6d/0x150
> >     [ 1931.168393]  [<ffffffff96385649>] ? bio_alloc_bioset+0x169/0x2b0
> >     [ 1931.189853]  [<ffffffff96395e68>] next_bio+0x38/0x40
> >     [ 1931.210743]  [<ffffffff96395f93>] __blkdev_issue_discard+0x123/0x1c0
> >     [ 1931.231522]  [<ffffffff963961c2>] blkdev_issue_discard+0x52/0xa0
> >     [ 1931.251942]  [<ffffffff9639c360>] blk_ioctl_discard+0x80/0xa0
> >     [ 1931.272067]  [<ffffffff9639cfb6>] blkdev_ioctl+0x716/0x8c0
> >     [ 1931.291454]  [<ffffffff9621db04>] ? mntput+0x24/0x40
> >     [ 1931.310551]  [<ffffffff96237231>] block_ioctl+0x41/0x50
> >     [ 1931.329247]  [<ffffffff96210676>] do_vfs_ioctl+0x96/0x5a0
> >     [ 1931.347634]  [<ffffffff961bb7d8>] ? do_munmap+0x298/0x390
> >     [ 1931.366132]  [<ffffffff96210bf9>] SyS_ioctl+0x79/0x90
> >     [ 1931.384667]  [<ffffffff967b49b6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
> 
> 
> For some reason, it pointed to the comment immediately after the
> BUG_ONs. I now recompiled and re-tested with some added printk, and it
> shows it throws a BUG because sectors==0:

Let's try to rule out bcache, since I don't see it in the lvremove BUG 
trace: detach your cache device to remove the writeback thread activity 
and try again.  Does it still bomb?

--
Eric Wheeler

> 
>    1780 struct bio *bio_split(struct bio *bio, int sectors,
>    1781                       gfp_t gfp, struct bio_set *bs)
>    1782 {
>    1783         struct bio *split = NULL;
>    1784 
>    1785         if (sectors <= 0) {
>    1786             printk(KERN_ERR " bio_split: sectors <= 0: %d\n", sectors);
>    1787         }
> => 1788         BUG_ON(sectors <= 0);
>    1789         if (sectors >= bio_sectors(bio)) {
>    1790             printk(KERN_ERR " bio_split: sectors >= bio_sectors(bio): %d\n", bio_sectors(bio));
>    1791         }
>    1792         BUG_ON(sectors >= bio_sectors(bio));
>    1793 
> 
> =>  [  581.921539]  bio_split: sectors <= 0: 0
>     [  581.922118] ------------[ cut here ]------------
> =>  [  581.922778] kernel BUG at block/bio.c:1788!
>     [  581.923375] invalid opcode: 0000 [#1] SMP
>     [  581.923947] Modules linked in: dm_snapshot dm_bufio bcache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp gpio_ich kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd dm_multipath serio_raw ipmi_si input_leds ipmi_msghandler lp acpi_power_meter hpilo lpc_ich ie31200_edac parport edac_core btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx hid_generic uas xor tg3 usbhid psmouse usb_storage raid6_pq hid libcrc32c ahci raid1 libahci raid0 ptp multipath pps_core linear
>     [  581.936332] CPU: 0 PID: 8869 Comm: lvremove Not tainted 4.8.0-rc5 #2
>     [  581.937238] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
>     [  581.938244] task: ffffa248af6fd780 task.stack: ffffa248ab3a0000
>     [  581.939087] RIP: 0010:[<ffffffffa9386dc3>]  [<ffffffffa9386dc3>] bio_split+0xd3/0xe0
>     [  581.963285] RSP: 0018:ffffa248ab3a3b78  EFLAGS: 00010286
>     [  581.987429] RAX: 000000000000001b RBX: 0000000000002000 RCX: 0000000000000000
>     [  582.012220] RDX: 0000000000000001 RSI: ffffa248cac0dc68 RDI: ffffa248cac0dc68
>     [  582.036724] RBP: ffffa248ab3a3b98 R08: 0000000000000388 R09: ffffba6780277620
>     [  582.061007] R10: 0000000000000001 R11: 0000000000cdcdcd R12: 0000000000000000
>     [  582.084953] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000000a8
>     [  582.108219] FS:  00007f9fbc0c1840(0000) GS:ffffa248cac00000(0000) knlGS:0000000000000000
>     [  582.153456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     [  582.176597] CR2: 00007f5c43346000 CR3: 00000003ed67a000 CR4: 00000000001406f0
>     [  582.200295] Stack:
>     [  582.223609]  ffffa248ab3a3b88 0000000000002000 0000000000000000 0000000000000000
>     [  582.272070]  ffffa248ab3a3c38 ffffffffa9393a1d ffffa248ae66e2c0 ffffa248ab3a3bf8
>     [  582.321312]  ffffa248c955ec80 0000000000000000 ffffa248c9a8f360 ffffa248ab3a3c48
>     [  582.370370] Call Trace:
>     [  582.394036]  [<ffffffffa9393a1d>] blk_queue_split+0x45d/0x620
>     [  582.417899]  [<ffffffffa938f334>] blk_queue_bio+0x44/0x390
>     [  582.441143]  [<ffffffffa938d841>] generic_make_request+0xe1/0x1a0
>     [  582.463975]  [<ffffffffa938d96d>] submit_bio+0x6d/0x150
>     [  582.486262]  [<ffffffffa93855a8>] ? bio_alloc_bioset+0x168/0x2a0
>     [  582.508189]  [<ffffffffa9395dd8>] next_bio+0x38/0x40
>     [  582.529570]  [<ffffffffa9395f03>] __blkdev_issue_discard+0x123/0x1c0
>     [  582.550840]  [<ffffffffa9396132>] blkdev_issue_discard+0x52/0xa0
>     [  582.571751]  [<ffffffffa939c2d0>] blk_ioctl_discard+0x80/0xa0
>     [  582.592265]  [<ffffffffa939cf26>] blkdev_ioctl+0x716/0x8c0
>     [  582.612358]  [<ffffffffa921db04>] ? mntput+0x24/0x40
>     [  582.631881]  [<ffffffffa9237231>] block_ioctl+0x41/0x50
>     [  582.651145]  [<ffffffffa9210676>] do_vfs_ioctl+0x96/0x5a0
>     [  582.670099]  [<ffffffffa91bb7d8>] ? do_munmap+0x298/0x390
>     [  582.688879]  [<ffffffffa9210bf9>] SyS_ioctl+0x79/0x90
>     [  582.707361]  [<ffffffffa97b49b6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
>     [  582.726040] Code: e0 e8 3b 6a df ff 41 8b 44 24 28 48 8b 4d e0 c1 e8 09 41 39 c5 0f 82 6a ff ff ff 0f 0b 48 c7 c7 e8 43 ab a9 31 c0 e8 16 6a df ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 07 
>     [  582.785705] RIP  [<ffffffffa9386dc3>] bio_split+0xd3/0xe0
>     [  582.804880]  RSP <ffffa248ab3a3b78>
>     [  582.823680] ---[ end trace 16c2130868fdd195 ]---
>     
>     
> Not having found anything suspicious in
> v2-1-1-block-fix-blk_queue_split-resource-exhaustion.patch, I tried
> lvremove (with discard) of that (rc5-patched-created) structure under
> unpatched 4.8.0-rc6, and it threw the same BUG_ON.
> 
> I had not reached that point in my test scenario with unpatched 4.8
> before, since the deadlock always hit first.
> 
> So it seems to actually be a bug in the kernel, not in LGE's patch.
> 
> 
> Will try to find simple steps to reproduce tomorrow.
> 
> 
> Matthias
> 
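
For readers following along, the two checks that the instrumented bio_split() above reports can be modelled in userspace. This is only a sketch under stated assumptions: check_split(), split_check_str() and the enum are hypothetical names, not kernel APIs, and the bio size used in the usage note is an arbitrary illustrative number rather than a value taken from the trace. It shows only which of the two BUG_ON conditions a given request would hit, not why blk_queue_split() asked for a zero-sector split in the first place.

```c
/*
 * Userspace model of the two preconditions asserted at the top of
 * bio_split(). All names here are hypothetical; nothing in this
 * block is part of the kernel.
 */
enum split_check {
	SPLIT_OK = 0,
	SPLIT_TOO_SMALL,	/* models BUG_ON(sectors <= 0) */
	SPLIT_TOO_LARGE		/* models BUG_ON(sectors >= bio_sectors(bio)) */
};

/* Return which check, if any, a requested split size would trip. */
enum split_check check_split(int sectors, unsigned int bio_sectors)
{
	if (sectors <= 0)
		return SPLIT_TOO_SMALL;
	if ((unsigned int)sectors >= bio_sectors)
		return SPLIT_TOO_LARGE;
	return SPLIT_OK;
}

/* Human-readable label, in the spirit of the added printk lines. */
const char *split_check_str(enum split_check c)
{
	switch (c) {
	case SPLIT_TOO_SMALL:
		return "sectors <= 0";
	case SPLIT_TOO_LARGE:
		return "sectors >= bio_sectors(bio)";
	default:
		return "ok";
	}
}
```

Calling check_split(0, 8192) returns SPLIT_TOO_SMALL, which corresponds to the "bio_split: sectors <= 0: 0" line in the log above; a split must be at least one sector and strictly smaller than the bio it is carved from.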