Sorry for the lack of detail. Here is some more info:

Currently using ceph 10.2.7

- Prior to the error there was nothing in the kernel log for several hours.
- cephfs snapshots are enabled, but are not currently being taken at
  regular intervals; the last one was taken 2 days before the error
  message appeared.
- cephfs has a data pool with 17123471 objects (34% full) and a metadata
  pool with 70K objects.
- The system has 85 OSDs and 3 MDS servers; all are in a healthy state.
- We use 3-copy replication rules: 81739 GB used, 161 TB / 241 TB avail

-Wyllys

On Thu, Aug 31, 2017 at 9:59 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Aug 31, 2017 at 3:09 PM, Wyllys Ingersoll
> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>> Discovered this message in my kernel logs today, running 4.9.44 kernel
>> with a kernel cephfs mount:
>>
>>
>> [Wed Aug 30 14:17:04 2017] kernel BUG at
>> /home/kernel/COD/linux/net/ceph/osd_client.c:1554!
>> [Wed Aug 30 14:17:04 2017] invalid opcode: 0000 [#1] SMP
>> [Wed Aug 30 14:17:04 2017] Modules linked in: binfmt_misc ipmi_devintf
>> ceph libceph libcrc32c fscache ipmi_ssif intel_powerclamp coretemp
>> kvm_intel kvm gpio_ich input_leds ipmi_si serio_raw irqbypass
>> intel_cstate shpchp i7core_edac hpilo lpc_ich edac_core
>> acpi_power_meter ipmi_msghandler mac_hid 8021q garp mrp stp llc
>> bonding nfsd auth_rpcgss nfs_acl lp lockd grace parport sunrpc autofs4
>> btrfs xor raid6_pq mlx4_en ptp pps_core hid_generic i2c_algo_bit ttm
>> drm_kms_helper usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops
>> hid mlx4_core hpsa psmouse drm pata_acpi bnx2 devlink
>> scsi_transport_sas fjes
>> [Wed Aug 30 14:17:04 2017] CPU: 18 PID: 471071 Comm: vsftpd Tainted: G
>> I 4.9.44-040944-generic #201708161731
>> [Wed Aug 30 14:17:04 2017] Hardware name: HP ProLiant DL360 G6, BIOS
>> P64 08/16/2015
>> [Wed Aug 30 14:17:04 2017] task: ffff9268331bc080 task.stack: ffffab4a96988000
>> [Wed Aug 30 14:17:04 2017] RIP: 0010:[<ffffffffc09c44f7>]
>> [<ffffffffc09c44f7>]
>> send_request+0xa27/0xab0 [libceph]
>> [Wed Aug 30 14:17:04 2017] RSP: 0018:ffffab4a9698b8e8 EFLAGS: 00010293
>> [Wed Aug 30 14:17:04 2017] RAX: 0000000000000000 RBX: 0000000000002201
>> RCX: ffff925e48490000
>> [Wed Aug 30 14:17:04 2017] RDX: ffff926242fcf553 RSI: 0000000000001295
>> RDI: 0000000000002201
>> [Wed Aug 30 14:17:04 2017] RBP: ffffab4a9698b958 R08: ffff92685f95c9e0
>> R09: 0000000000000000
>> [Wed Aug 30 14:17:04 2017] R10: 0000000000000000 R11: ffff926842265680
>> R12: ffff92684078c610
>> [Wed Aug 30 14:17:04 2017] R13: 0000000000000001 R14: ffff926242fc608b
>> R15: ffff92684078c610
>> [Wed Aug 30 14:17:04 2017] FS: 00007f5b59fd5700(0000)
>> GS:ffff92685f940000(0000) knlGS:0000000000000000
>> [Wed Aug 30 14:17:04 2017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Wed Aug 30 14:17:04 2017] CR2: 00007fec2bb2f003 CR3: 000000068ef59000
>> CR4: 00000000000006e0
>> [Wed Aug 30 14:17:04 2017] Stack:
>> [Wed Aug 30 14:17:04 2017] 01ffffffc09b9cd7 ffff92685f959300
>> 0000000000000067 ffff9268331bc080
>> [Wed Aug 30 14:17:04 2017] ffff926242fc7000 ffff926241a3e000
>> ffff926840215200 00000e8100002201
>> [Wed Aug 30 14:17:04 2017] 0000000000000000 ffff92684078c610
>> ffff926841ddf7c0 0000000000000000
>> [Wed Aug 30 14:17:04 2017] Call Trace:
>> [Wed Aug 30 14:17:04 2017] [<ffffffffc09c815a>]
>> __submit_request+0x20a/0x2f0 [libceph]
>> [Wed Aug 30 14:17:04 2017] [<ffffffffc09c826b>]
>> submit_request+0x2b/0x30 [libceph]
>> [Wed Aug 30 14:17:05 2017] [<ffffffffc09c8c14>]
>> ceph_osdc_writepages+0x104/0x1a0 [libceph]
>> [Wed Aug 30 14:17:05 2017] [<ffffffffc0a0f4b1>]
>> writepage_nounlock+0x2c1/0x470 [ceph]
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa65f120a>] ? page_mkclean+0x6a/0xb0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa65ef3b0>] ?
>> __page_check_address+0x1c0/0x1c0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffc0a11f9c>]
>> ceph_update_writeable_page+0xdc/0x4a0 [ceph]
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa65a974d>] ?
>> pagecache_get_page+0x17d/0x2a0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffc0a123ca>]
>> ceph_write_begin+0x6a/0x120 [ceph]
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa65a89b8>]
>> generic_perform_write+0xc8/0x1c0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa66592ee>] ? file_update_time+0x5e/0x110
>> [Wed Aug 30 14:17:05 2017] [<ffffffffc0a0c402>]
>> ceph_write_iter+0xba2/0xbe0 [ceph]
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa6b6238c>] ? release_sock+0x8c/0xa0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa6bce0b9>] ? tcp_recvmsg+0x4c9/0xb50
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa6b5d65d>] ? sock_recvmsg+0x3d/0x50
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa663ad45>] __vfs_write+0xe5/0x160
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa663bfe5>] vfs_write+0xb5/0x1a0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa663d465>] SyS_write+0x55/0xc0
>> [Wed Aug 30 14:17:05 2017] [<ffffffffa6c9b9bb>]
>> entry_SYSCALL_64_fastpath+0x1e/0xad
>> [Wed Aug 30 14:17:05 2017] Code: fb ab e5 e9 de f6 ff ff ba 14 00 00
>> 00 e9 42 f7 ff ff 49 c7 46 08 00 00 00 00 41 c7 46 10 00 00 00 00 49
>> 8d 56 14 e9 6d fb ff ff <0f> 0b 0f 0b be 8f 05 00 00 48 c7 c7 d8 0c 9e
>> c0 e8 b4 fb ab e5
>> [Wed Aug 30 14:17:05 2017] RIP [<ffffffffc09c44f7>]
>> send_request+0xa27/0xab0 [libceph]
>> [Wed Aug 30 14:17:05 2017] RSP <ffffab4a9698b8e8>
>> [Wed Aug 30 14:17:05 2017] ---[ end trace 5c55854998e663dc ]---
>
> Hi Wyllys,
>
> Yes, looks like MOSDOp size was miscalculated.
>
> Could you give some context? Anything before this splat in the kernel
> log, ceph version, cephfs configuration -- pools, namespaces, snapshots,
> fscache, etc.
>
> Thanks,
>
> Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
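For anyone cross-checking the capacity figures given at the top of this message (81739 GB used, 161 TB / 241 TB avail, 3-copy replication), here is a small sketch of the arithmetic. It assumes the units are binary multiples (1 TB = 1024 GB) and that the "used" figure is raw usage across all replicas; neither assumption is stated in the message, and the function name is made up for illustration.

```python
# Figures copied from the message; the unit convention (1 TB = 1024 GB)
# and the reading of "used" as raw, post-replication usage are assumptions.
USED_GB = 81739
AVAIL_TB = 161
TOTAL_TB = 241
REPLICAS = 3


def capacity_summary(used_gb, avail_tb, total_tb, replicas):
    """Return derived capacity numbers for a quick consistency check."""
    used_tb = used_gb / 1024
    return {
        "used_tb": used_tb,
        # Should land close to total_tb if the three figures agree.
        "used_plus_avail_tb": used_tb + avail_tb,
        "raw_used_fraction": used_tb / total_tb,
        # Approximate pre-replication data volume under N-copy replication.
        "logical_tb": used_tb / replicas,
    }


summary = capacity_summary(USED_GB, AVAIL_TB, TOTAL_TB, REPLICAS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions, used plus available comes out within 1 TB of the reported 241 TB total, and the raw used fraction is about one third, in the same range as the 34% reported for the data pool.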