Re: cephfs kernel bug (4.9.44)?

Sorry for the lack of detail. Here is some more info:

We are currently running ceph 10.2.7.

- Prior to the error there was nothing in the kernel log for several hours.
- cephfs snapshots are enabled, but are not currently being taken at
regular intervals; the last one was taken 2 days before the error
message appeared.
- cephfs has a data pool with 17123471 objects (34% full) and a
metadata pool with 70K objects.
- The system has 85 OSDs and 3 MDS servers, all in a healthy state.
- We use 3-copy replication rules: 81739 GB used, 161 TB / 241 TB avail
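
As a quick sanity check on the capacity figures above, here is a minimal
sketch. It assumes binary units (1 TB = 1024 GB) and that the "used"
figure is raw, post-replication usage across all pools; neither is
stated above, so treat the derived numbers as rough estimates. The
variable names are illustrative only.

```python
# Figures copied from the status line above.
used_gb = 81739      # raw space used, in GB
avail_tb = 161       # raw space available, in TB
total_tb = 241       # raw cluster capacity, in TB
replicas = 3         # assumption: every pool uses the 3-copy rule

# Assumption: binary units, 1 TB = 1024 GB.
used_tb = used_gb / 1024

# Used plus available should land close to the reported total.
accounted_tb = used_tb + avail_tb

# Raw cluster-wide fill ratio; this is not the same quantity as the
# per-pool "34% full" figure, but it should be in the same range.
raw_full_pct = 100 * used_tb / total_tb

# Rough amount of user data before replication.
logical_tb = used_tb / replicas

print(f"used:       {used_tb:.1f} TB")
print(f"accounted:  {accounted_tb:.1f} TB of {total_tb} TB")
print(f"raw full:   {raw_full_pct:.1f} %")
print(f"pre-replication data: {logical_tb:.1f} TB")
```

Under those assumptions, used plus available comes out within about
1 TB of the reported total, so the numbers are at least self-consistent.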


-Wyllys


On Thu, Aug 31, 2017 at 9:59 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Aug 31, 2017 at 3:09 PM, Wyllys Ingersoll
> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>> Discovered this message in my kernel logs today, running 4.9.44 kernel
>> with a kernel cephfs mount:
>>
>>
>> [Wed Aug 30 14:17:04 2017] kernel BUG at
>> /home/kernel/COD/linux/net/ceph/osd_client.c:1554!
>> [Wed Aug 30 14:17:04 2017] invalid opcode: 0000 [#1] SMP
>> [Wed Aug 30 14:17:04 2017] Modules linked in: binfmt_misc ipmi_devintf
>> ceph libceph libcrc32c fscache ipmi_ssif intel_powerclamp coretemp
>> kvm_intel kvm gpio_ich input_leds ipmi_si serio_raw irqbypass
>> intel_cstate shpchp i7core_edac hpilo lpc_ich edac_core
>> acpi_power_meter ipmi_msghandler mac_hid 8021q garp mrp stp llc
>> bonding nfsd auth_rpcgss nfs_acl lp lockd grace parport sunrpc autofs4
>> btrfs xor raid6_pq mlx4_en ptp pps_core hid_generic i2c_algo_bit ttm
>> drm_kms_helper usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops
>> hid mlx4_core hpsa psmouse drm pata_acpi bnx2 devlink
>> scsi_transport_sas fjes
>> [Wed Aug 30 14:17:04 2017] CPU: 18 PID: 471071 Comm: vsftpd Tainted: G
>>          I     4.9.44-040944-generic #201708161731
>> [Wed Aug 30 14:17:04 2017] Hardware name: HP ProLiant DL360 G6, BIOS
>> P64 08/16/2015
>> [Wed Aug 30 14:17:04 2017] task: ffff9268331bc080 task.stack: ffffab4a96988000
>> [Wed Aug 30 14:17:04 2017] RIP: 0010:[<ffffffffc09c44f7>]
>> [<ffffffffc09c44f7>] send_request+0xa27/0xab0 [libceph]
>> [Wed Aug 30 14:17:04 2017] RSP: 0018:ffffab4a9698b8e8  EFLAGS: 00010293
>> [Wed Aug 30 14:17:04 2017] RAX: 0000000000000000 RBX: 0000000000002201
>> RCX: ffff925e48490000
>> [Wed Aug 30 14:17:04 2017] RDX: ffff926242fcf553 RSI: 0000000000001295
>> RDI: 0000000000002201
>> [Wed Aug 30 14:17:04 2017] RBP: ffffab4a9698b958 R08: ffff92685f95c9e0
>> R09: 0000000000000000
>> [Wed Aug 30 14:17:04 2017] R10: 0000000000000000 R11: ffff926842265680
>> R12: ffff92684078c610
>> [Wed Aug 30 14:17:04 2017] R13: 0000000000000001 R14: ffff926242fc608b
>> R15: ffff92684078c610
>> [Wed Aug 30 14:17:04 2017] FS:  00007f5b59fd5700(0000)
>> GS:ffff92685f940000(0000) knlGS:0000000000000000
>> [Wed Aug 30 14:17:04 2017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Wed Aug 30 14:17:04 2017] CR2: 00007fec2bb2f003 CR3: 000000068ef59000
>> CR4: 00000000000006e0
>> [Wed Aug 30 14:17:04 2017] Stack:
>> [Wed Aug 30 14:17:04 2017]  01ffffffc09b9cd7 ffff92685f959300
>> 0000000000000067 ffff9268331bc080
>> [Wed Aug 30 14:17:04 2017]  ffff926242fc7000 ffff926241a3e000
>> ffff926840215200 00000e8100002201
>> [Wed Aug 30 14:17:04 2017]  0000000000000000 ffff92684078c610
>> ffff926841ddf7c0 0000000000000000
>> [Wed Aug 30 14:17:04 2017] Call Trace:
>> [Wed Aug 30 14:17:04 2017]  [<ffffffffc09c815a>]
>> __submit_request+0x20a/0x2f0 [libceph]
>> [Wed Aug 30 14:17:04 2017]  [<ffffffffc09c826b>]
>> submit_request+0x2b/0x30 [libceph]
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffc09c8c14>]
>> ceph_osdc_writepages+0x104/0x1a0 [libceph]
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffc0a0f4b1>]
>> writepage_nounlock+0x2c1/0x470 [ceph]
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa65f120a>] ? page_mkclean+0x6a/0xb0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa65ef3b0>] ?
>> __page_check_address+0x1c0/0x1c0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffc0a11f9c>]
>> ceph_update_writeable_page+0xdc/0x4a0 [ceph]
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa65a974d>] ?
>> pagecache_get_page+0x17d/0x2a0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffc0a123ca>]
>> ceph_write_begin+0x6a/0x120 [ceph]
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa65a89b8>]
>> generic_perform_write+0xc8/0x1c0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa66592ee>] ? file_update_time+0x5e/0x110
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffc0a0c402>]
>> ceph_write_iter+0xba2/0xbe0 [ceph]
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa6b6238c>] ? release_sock+0x8c/0xa0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa6bce0b9>] ? tcp_recvmsg+0x4c9/0xb50
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa6b5d65d>] ? sock_recvmsg+0x3d/0x50
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa663ad45>] __vfs_write+0xe5/0x160
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa663bfe5>] vfs_write+0xb5/0x1a0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa663d465>] SyS_write+0x55/0xc0
>> [Wed Aug 30 14:17:05 2017]  [<ffffffffa6c9b9bb>]
>> entry_SYSCALL_64_fastpath+0x1e/0xad
>> [Wed Aug 30 14:17:05 2017] Code: fb ab e5 e9 de f6 ff ff ba 14 00 00
>> 00 e9 42 f7 ff ff 49 c7 46 08 00 00 00 00 41 c7 46 10 00 00 00 00 49
>> 8d 56 14 e9 6d fb ff ff <0f> 0b 0f 0b be 8f 05 00 00 48 c7 c7 d8 0c 9e
>> c0 e8 b4 fb ab e5
>> [Wed Aug 30 14:17:05 2017] RIP  [<ffffffffc09c44f7>]
>> send_request+0xa27/0xab0 [libceph]
>> [Wed Aug 30 14:17:05 2017]  RSP <ffffab4a9698b8e8>
>> [Wed Aug 30 14:17:05 2017] ---[ end trace 5c55854998e663dc ]---
>
> Hi Wyllys,
>
> Yes, looks like MOSDOp size was miscalculated.
>
> Could you give some context?  Anything before this splat in the kernel
> log, ceph version, cephfs configuration -- pools, namespaces, snapshots,
> fscache, etc.
>
> Thanks,
>
>                 Ilya


