On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong <furlongm@xxxxxxxxx> wrote:
> Hi,
>
> I'm experiencing the same issue as outlined in this post:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html
>
> I have also deployed this jewel cluster using ceph-deploy.
>
> This is the message I see at boot (it happens for all drives, on all OSD nodes):
>
> [   92.938882] XFS (sdi1): Mounting V5 Filesystem
> [   93.065393] XFS (sdi1): Ending clean mount
> [   93.175299] attempt to access beyond end of device
> [   93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>
> and again while the cluster is in operation:
>
> [429280.254400] attempt to access beyond end of device
> [429280.254412] sdi1: rw=0, want=19134412768, limit=19134412767
>

We see these as well, and I'm also curious what's causing it. Perhaps
sgdisk is doing something wrong when creating the ceph-data partition?

> Eventually, there is a kernel oops (see below for full details).
>

We also see that oops, but very, very rarely (only twice in the current
uptime epoch), and it's not correlated with the "access beyond end of
device" errors.

Cheers, Dan

> This happens for all drives on all OSD nodes (eventually), so it's
> consistent at least.
>
> Similar to the original post, this RH article has some relevant info:
>
> https://access.redhat.com/solutions/2833571
>
> The article suggests looking at the following values:
>
> Error message disk size (EMDS) = "limit" value in error message * 512
> Current device size (CDS) = `cat /proc/partitions | grep sdi1 | awk '{print $3}'` * 1024
> Filesystem size (FSS) = blocks * bsize (from xfs_info)
>
> # xfs_info /dev/sdi1 | grep data | grep blocks
> data = bsize=4096 blocks=2391801595, imaxpct=5
>
> I end up with these values:
>
> EMDS = 19134412767 * 512 = 9796819336704
> CDS  = 9567206383 * 1024 = 9796819336192 (512 bytes less than EMDS)
> FSS  = 2391801595 * 4096 = 9796819333120 (3072 bytes less than CDS)
>
> FSS < CDS, so that's fine, but EMDS != CDS. Apparently this shouldn't
> be the case; however, these devices have not been renamed, and this is
> 100% reproducible upon reinstallation, so I'm not sure why it happens.
>
> The drives are 10TB 512e drives, so they have a logical sector size of
> 512 and a physical sector size of 4096:
>
> # blockdev --getsz /dev/sdi
> 19134414848
> # blockdev --getsz /dev/sdi1
> 19134412767
> # blockdev --getss /dev/sdi
> 512
> # blockdev --getpbsz /dev/sdi
> 4096
> # blockdev --getbsz /dev/sdi
> 4096
>
> I'm not sure if that's relevant, but thought it might be worth a
> mention. (FSS + 4096 would exceed EMDS, whereas FSS + 512 would not.)
>
> My question would be (as with the OP) - can these errors be ignored?
> Given the oops, I would think not?
>
> Has anybody else experienced this issue?
>
> Could it be related to the mkfs options used by ceph-disk in
> ceph-deploy? I didn't change these, so it used the defaults of:
>
> /usr/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
>
> Any pointers for how to debug it further and/or fix it?
>
> Cheers,
> Marcus.
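
For anyone who wants to repeat the check from the RH article without doing
the arithmetic by hand, here is a rough, untested shell sketch (the sdi1
name is just the one from the report above; adjust for your own drives).
One thing worth keeping in mind: /proc/partitions reports sizes in 1 KiB
blocks, so a partition with an odd number of 512-byte sectors gets rounded
down there, which on its own would account for a 512-byte gap between EMDS
and CDS.

#!/bin/bash
# Sketch: recompute EMDS, CDS and FSS per https://access.redhat.com/solutions/2833571
# PART is assumed to be an XFS-formatted partition; sdi1 is just an example.
PART=sdi1

# EMDS: the "limit" value in the dmesg error is the partition size in
# 512-byte sectors, so take it straight from the block layer.
limit=$(blockdev --getsz /dev/${PART})
emds=$((limit * 512))

# CDS: /proc/partitions lists sizes in 1 KiB blocks.
blocks_1k=$(awk -v p="${PART}" '$4 == p {print $3}' /proc/partitions)
cds=$((blocks_1k * 1024))

# FSS: data-section block count * block size, from xfs_info.
read bsize fsblocks < <(xfs_info /dev/${PART} | awk -F'[= ,]+' '/^data.*blocks=/ {print $3, $5}')
fss=$((fsblocks * bsize))

echo "EMDS=${emds} CDS=${cds} FSS=${fss}"
[ "${fss}" -le "${cds}" ]  && echo "FSS <= CDS: ok" || echo "FSS > CDS: filesystem larger than device"
[ "${emds}" -eq "${cds}" ] || echo "EMDS != CDS: differs by $((emds - cds)) bytes"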
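
And since the drives are 512e, one more thing that might be worth checking
is whether the partition length is a whole number of 4 KiB physical
sectors. With the numbers quoted above it isn't: 19134412767 is not
divisible by 8, so a 4096-byte request at the very end of the partition
would overrun it by exactly one 512-byte sector, which matches
want = limit + 1 in the dmesg output. A small sketch of that check (again
untested, sdi/sdi1 taken from the report above):

#!/bin/bash
# Sketch: does the partition end on a physical-sector (e.g. 4 KiB) boundary?
DISK=sdi
PART=sdi1

pbsz=$(blockdev --getpbsz /dev/${DISK})        # physical sector size, e.g. 4096
size=$(cat /sys/block/${DISK}/${PART}/size)    # partition length in 512-byte sectors
spp=$((pbsz / 512))                            # 512-byte sectors per physical sector

echo "${PART}: ${size} sectors, physical sector size ${pbsz} bytes"
if [ $((size % spp)) -ne 0 ]; then
    echo "length is not a multiple of ${spp} sectors; a ${pbsz}-byte I/O at the"
    echo "tail of the partition would run $((spp - size % spp)) sector(s) past the end"
else
    echo "length is aligned to the physical sector size"
fi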
>
> [435339.965817] ------------[ cut here ]------------
> [435339.965874] WARNING: at fs/xfs/xfs_aops.c:1244 xfs_vm_releasepage+0xcb/0x100 [xfs]()
> [435339.965876] Modules linked in: vfat fat uas usb_storage mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase iptable_filter dell_rbu team_mode_loadbalance team rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_devintf iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas pcspkr ipmi_ssif sb_edac edac_core sg mei_me mei lpc_ich shpchp ipmi_si ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit
> [435339.965942]  crct10dif_pclmul crct10dif_common drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm bnx2x ahci libahci mlx5_core i2c_core libata mdio ptp megaraid_sas nvme pps_core libcrc32c fjes dm_mirror dm_region_hash dm_log dm_mod
> [435339.965991] CPU: 8 PID: 223 Comm: kswapd0 Not tainted 3.10.0-514.10.2.el7.x86_64 #1
> [435339.965993] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.3.4 11/08/2016
> [435339.965994]  0000000000000000 000000006ea9561d ffff881ffc2c7aa0 ffffffff816863ef
> [435339.965998]  ffff881ffc2c7ad8 ffffffff81085940 ffffea00015d4e20 ffffea00015d4e00
> [435339.966000]  ffff880f4d7c5af8 ffff881ffc2c7da0 ffffea00015d4e00 ffff881ffc2c7ae8
> [435339.966003] Call Trace:
> [435339.966010]  [<ffffffff816863ef>] dump_stack+0x19/0x1b
> [435339.966015]  [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0
> [435339.966018]  [<ffffffff81085a8a>] warn_slowpath_null+0x1a/0x20
> [435339.966060]  [<ffffffffa03be56b>] xfs_vm_releasepage+0xcb/0x100 [xfs]
> [435339.966120]  [<ffffffff81180662>] try_to_release_page+0x32/0x50
> [435339.966128]  [<ffffffff811965e6>] shrink_active_list+0x3d6/0x3e0
> [435339.966133]  [<ffffffff811969e1>] shrink_lruvec+0x3f1/0x770
> [435339.966138]  [<ffffffff81196dd6>] shrink_zone+0x76/0x1a0
> [435339.966143]  [<ffffffff8119807c>] balance_pgdat+0x48c/0x5e0
> [435339.966147]  [<ffffffff81198343>] kswapd+0x173/0x450
> [435339.966155]  [<ffffffff810b17d0>] ? wake_up_atomic_t+0x30/0x30
> [435339.966158]  [<ffffffff811981d0>] ? balance_pgdat+0x5e0/0x5e0
> [435339.966161]  [<ffffffff810b06ff>] kthread+0xcf/0xe0
> [435339.966165]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
> [435339.966170]  [<ffffffff81696958>] ret_from_fork+0x58/0x90
> [435339.966173]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
> [435339.966175] ---[ end trace 58233bbca77fd5e2 ]---
>
> --
> Marcus Furlong

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com