Hi,

I'm experiencing the same issue as outlined in this post:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html

I have also deployed this jewel cluster using ceph-deploy.

This is the message I see at boot (it happens for all drives, on all OSD nodes):

[   92.938882] XFS (sdi1): Mounting V5 Filesystem
[   93.065393] XFS (sdi1): Ending clean mount
[   93.175299] attempt to access beyond end of device
[   93.175304] sdi1: rw=0, want=19134412768, limit=19134412767

and again while the cluster is in operation:

[429280.254400] attempt to access beyond end of device
[429280.254412] sdi1: rw=0, want=19134412768, limit=19134412767

Eventually there is a kernel oops (full details below). This happens for all drives on all OSD nodes (eventually), so it is at least consistent.

Similar to the original post, this Red Hat article has some relevant info:

https://access.redhat.com/solutions/2833571

The article suggests comparing the following values:

Error message disk size (EMDS) = "limit" value in error message * 512
Current device size (CDS)      = `cat /proc/partitions | grep sdi1 | awk '{print $3}'` * 1024
Filesystem size (FSS)          = blocks * bsize (from xfs_info)

# xfs_info /dev/sdi1 | grep data | grep blocks
data     =                       bsize=4096   blocks=2391801595, imaxpct=5

I end up with these values (see the sketch below for a way to recompute them):

EMDS = 19134412767 * 512  = 9796819336704
CDS  =  9567206383 * 1024 = 9796819336192 (512 bytes less than EMDS)
FSS  =  2391801595 * 4096 = 9796819333120 (3072 bytes less than CDS)

FSS < CDS, so that part is fine, but EMDS != CDS. Apparently this shouldn't be the case; however, these devices have not been renamed, and the problem is 100% reproducible after reinstallation, so I'm not sure why the sizes disagree.

The drives are 10TB 512e drives, so they have a logical sector size of 512 and a physical sector size of 4096:

# blockdev --getsz /dev/sdi
19134414848
# blockdev --getsz /dev/sdi1
19134412767
# blockdev --getss /dev/sdi
512
# blockdev --getpbsz /dev/sdi
4096
# blockdev --getbsz /dev/sdi
4096

I'm not sure if that's relevant, but it seemed worth a mention (FSS + 4096 would exceed EMDS, whereas FSS + 512 would not).

My questions would be (as with the OP): can these errors be ignored? Given the oops, I would think not? Has anybody else experienced this issue? Could it be related to the mkfs options used by ceph-disk in ceph-deploy? I didn't change these, so it used the defaults of:

/usr/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1

Any pointers on how to debug it further and/or fix it?

Cheers,
Marcus.
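For reference, here is a rough shell sketch of the EMDS/CDS/FSS comparison described above, so it can be repeated on other OSDs. It is not part of ceph-disk or the Red Hat article; the script name, the default partition name (sdi1) and the default "limit" value are placeholders taken from my own output, and you would substitute the values from your own dmesg:

#!/bin/bash
# Rough sketch: recompute EMDS, CDS and FSS for one partition.
# Usage: ./check_sizes.sh sdi1 19134412767
# (partition name and the "limit" value from dmesg are placeholders)
PART=${1:-sdi1}
LIMIT=${2:-19134412767}

# EMDS: "limit" from the error message * logical sector size (512)
EMDS=$((LIMIT * 512))

# CDS: 1 KiB block count from /proc/partitions * 1024
KBLOCKS=$(awk -v p="$PART" '$4 == p {print $3}' /proc/partitions)
CDS=$((KBLOCKS * 1024))

# FSS: data blocks * block size reported by xfs_info
read -r BSIZE BLOCKS < <(xfs_info "/dev/$PART" | awk -F'[=, ]+' '/^data.*blocks/ {print $3, $5}')
FSS=$((BLOCKS * BSIZE))

echo "EMDS=$EMDS CDS=$CDS FSS=$FSS"
[ "$FSS" -le "$CDS" ]  && echo "FSS <= CDS: ok" || echo "FSS > CDS: filesystem is larger than the device"
[ "$EMDS" -eq "$CDS" ] && echo "EMDS == CDS: ok" || echo "EMDS != CDS: differ by $((EMDS - CDS)) bytes"

With the numbers from my cluster it prints EMDS=9796819336704 CDS=9796819336192 FSS=9796819333120 and flags the EMDS != CDS mismatch of 512 bytes.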
[435339.965817] ------------[ cut here ]------------
[435339.965874] WARNING: at fs/xfs/xfs_aops.c:1244 xfs_vm_releasepage+0xcb/0x100 [xfs]()
[435339.965876] Modules linked in: vfat fat uas usb_storage mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase iptable_filter dell_rbu team_mode_loadbalance team rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_devintf iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas pcspkr ipmi_ssif sb_edac edac_core sg mei_me mei lpc_ich shpchp ipmi_si ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit
[435339.965942]  crct10dif_pclmul crct10dif_common drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm bnx2x ahci libahci mlx5_core i2c_core libata mdio ptp megaraid_sas nvme pps_core libcrc32c fjes dm_mirror dm_region_hash dm_log dm_mod
[435339.965991] CPU: 8 PID: 223 Comm: kswapd0 Not tainted 3.10.0-514.10.2.el7.x86_64 #1
[435339.965993] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.3.4 11/08/2016
[435339.965994]  0000000000000000 000000006ea9561d ffff881ffc2c7aa0 ffffffff816863ef
[435339.965998]  ffff881ffc2c7ad8 ffffffff81085940 ffffea00015d4e20 ffffea00015d4e00
[435339.966000]  ffff880f4d7c5af8 ffff881ffc2c7da0 ffffea00015d4e00 ffff881ffc2c7ae8
[435339.966003] Call Trace:
[435339.966010]  [<ffffffff816863ef>] dump_stack+0x19/0x1b
[435339.966015]  [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0
[435339.966018]  [<ffffffff81085a8a>] warn_slowpath_null+0x1a/0x20
[435339.966060]  [<ffffffffa03be56b>] xfs_vm_releasepage+0xcb/0x100 [xfs]
[435339.966120]  [<ffffffff81180662>] try_to_release_page+0x32/0x50
[435339.966128]  [<ffffffff811965e6>] shrink_active_list+0x3d6/0x3e0
[435339.966133]  [<ffffffff811969e1>] shrink_lruvec+0x3f1/0x770
[435339.966138]  [<ffffffff81196dd6>] shrink_zone+0x76/0x1a0
[435339.966143]  [<ffffffff8119807c>] balance_pgdat+0x48c/0x5e0
[435339.966147]  [<ffffffff81198343>] kswapd+0x173/0x450
[435339.966155]  [<ffffffff810b17d0>] ? wake_up_atomic_t+0x30/0x30
[435339.966158]  [<ffffffff811981d0>] ? balance_pgdat+0x5e0/0x5e0
[435339.966161]  [<ffffffff810b06ff>] kthread+0xcf/0xe0
[435339.966165]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
[435339.966170]  [<ffffffff81696958>] ret_from_fork+0x58/0x90
[435339.966173]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
[435339.966175] ---[ end trace 58233bbca77fd5e2 ]---

--
Marcus Furlong

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com