On Tue, Jul 18, 2017 at 6:08 AM, Marcus Furlong <furlongm@xxxxxxxxx> wrote:
> On 22 March 2017 at 05:51, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong <furlongm@xxxxxxxxx>
>> wrote:
>>> Hi,
>>>
>>> I'm experiencing the same issue as outlined in this post:
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html
>>>
>>> I have also deployed this jewel cluster using ceph-deploy.
>>>
>>> This is the message I see at boot (it happens for all drives, on all
>>> OSD nodes):
>>>
>>> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
>>> [ 93.065393] XFS (sdi1): Ending clean mount
>>> [ 93.175299] attempt to access beyond end of device
>>> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>>>
>>> and again while the cluster is in operation:
>>>
>>> [429280.254400] attempt to access beyond end of device
>>> [429280.254412] sdi1: rw=0, want=19134412768, limit=19134412767
>>
>> We see these as well, and I'm also curious what's causing it. Perhaps
>> sgdisk is doing something wrong when creating the ceph-data partition?
>
> Apologies for reviving an old thread, but I figured out what happened
> and never documented it, so I thought an update might be useful.
>
> The disk layout I've ascertained is as follows:
>
> sector 0                           = protective MBR (or empty)
> sectors 1 to 33                    = GPT (33 sectors)
> sectors 34 to 2047                 = free (as confirmed by sgdisk -f -E)
> sectors 2048 to 19134414814        = Data Partition 1 (19134412767 sectors)
> sectors 19134414815 to 19134414847 = GPT backup data (33 sectors)
>
> And the error:
>
> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
> [ 93.065393] XFS (sdi1): Ending clean mount
> [ 93.175299] attempt to access beyond end of device
> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>
> This shows that the error occurs when trying to access sector
> 19134412768 of Partition 1, which, as we can see from the layout
> above, doesn't exist.
>
> I noticed that the filesystem size is 3.5 KiB less than the size of
> the partition, and that the XFS block size is 4 KiB:
>
> EMDS = 19134412767 * 512  = 9796819336704  <- actual partition size
> CDS  = 9567206383 * 1024  = 9796819336192  <- 512 bytes less than EMDS;
>        /proc/partitions reports 512 bytes less because it uses
>        1024-byte units and rounds down
> FSS  = 2391801595 * 4096  = 9796819333120  <- filesystem size, 3072
>        bytes less than CDS
>
> It turns out that if I create a partition whose size is a multiple of
> the 4 KiB XFS block size, the error does not occur, i.e. there is no
> error when the filesystem both starts _and_ ends exactly on the
> partition boundaries.
>
> When this is the case, e.g. as follows, there is no issue. This
> partition is 7 sectors smaller than the one referenced above:
>
> # sgdisk --new=0:2048:19134414807 -- /dev/sdi
> Creating new GPT entries.
> The operation has completed successfully.
>
> # sgdisk -p /dev/sdi
> Disk /dev/sdi: 19134414848 sectors, 8.9 TiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): 3E61A8BA-838A-4D7E-BB8E-293972EB45AE
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 19134414814
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 2021 sectors (1010.5 KiB)
>
> When the end of the partition is not aligned to the 4 KiB blocks used
> by XFS, the error occurs. This explains why the defaults from parted
> work correctly: the 1 MiB "padding" parted leaves is 4K-aligned.
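To make the alignment check above concrete, here is a minimal shell
sketch (not from the thread) that tests whether a partition's size is
a multiple of the XFS block size. The device name is an example, and
it assumes blockdev is available and that xfs_info can read the
filesystem geometry (older xfsprogs may require the filesystem to be
mounted):

#!/bin/bash
# Sketch: does the partition end land on an XFS block boundary?
DEV=${1:-/dev/sdi1}   # example device; pass yours as $1

SECTORS=$(blockdev --getsz "$DEV")   # partition size in 512-byte sectors
# Parse bsize= and blocks= from the "data" line of xfs_info output.
BSIZE=$(xfs_info "$DEV" | awk -F'bsize=' '/^data/ {print $2+0}')
BLOCKS=$(xfs_info "$DEV" | awk -F'blocks=' '/^data/ {print $2+0}')

PART_BYTES=$((SECTORS * 512))
FS_BYTES=$((BLOCKS * BSIZE))
TAIL=$((PART_BYTES % BSIZE))   # bytes left over past the last whole block

echo "partition: $PART_BYTES bytes, filesystem: $FS_BYTES bytes"
if [ "$TAIL" -ne 0 ]; then
    echo "end not 4K-aligned: $TAIL unused bytes ($((TAIL / 512)) sectors)"
fi

For the sdi1 numbers above this would report a 3584-byte (7-sector)
unused tail, matching the 3.5 KiB discrepancy Marcus describes.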
> This non-alignment happens because ceph-deploy uses sgdisk, and
> sgdisk seems to align the start of the partition on a 2048-sector
> boundary, but _not_ the end of the partition, when used with the -L
> parameter.
>
> The fix was to recreate the partition table, shrinking the partition
> so that it ends at the maximum filesystem size:
>
> https://gist.github.com/furlongm/292aefa930f40dc03f21693d1fc19f35
>
> In my testing, I could only reproduce this with XFS, not with other
> filesystems. It can be reproduced on smaller XFS filesystems but
> seems to take more time.

Great work. I've tested it (in print mode) and it seems to detect
things correctly here:

/dev/sdz1
OSD ID                                           : 88
Partition size in sectors                        : 11721043087
Sector size                                      : 512
Partition size in bytes                          : 6001174060544
XFS block size                                   : 4096
# of XFS blocks                                  : 1465130385
XFS filesystem size                              : 6001174056960
Unused sectors                                   : 7
Unused bytes (unused sector count * sector size) : 3584
Unused bytes (partition size - filesystem size)  : 3584
Filesystem is not correctly aligned to partition boundary :-(

systemctl stop ceph-osd@88
umount /dev/sdz1
sgdisk --delete=1 -- /dev/sdz
sgdisk --new=1:2048:11721045127 --change-name=1:"ceph data" --partition-guid=1:c0832f78-5d7c-49f7-a133-786424b8b491 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdz
partprobe /dev/sdz
xfs_repair /dev/sdz1
sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdz

But one thing is still unclear to me. sgdisk is not aligning the end
of the partition -- fine. But XFS creates a filesystem that fits
within that partition, i.e. the filesystem is smaller (by 7 sectors)
than the partition. So what exactly is trying to access outside the
partition?

sdz1: rw=0, want=11721043088, limit=11721043087

Are we sure that there is no filesystem data in those 7 sectors? The
attempted access (end-of-filesystem + 8 sectors) would be the first
sector of the GPT backup. Have you checked whether the backup is
uncorrupted?

(And those xfs_aops oopses, which you thought were unrelated -- did
you nevertheless see them disappear after you fixed your partition
alignments?)

Basically, I'm still wondering if this is all harmless, or if we
really do need to realign these partitions.

Cheers, Dan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
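For reference, the end sector in the sgdisk --new command above can be
derived from the XFS geometry. A minimal sketch of that arithmetic,
not from the thread, using the /dev/sdz1 figures reported above:

#!/bin/bash
# Sketch: compute the partition end sector that makes the partition
# exactly match the XFS filesystem size (values from /dev/sdz1 above).
START=2048          # first sector of the partition
BLOCKS=1465130385   # XFS block count ("# of XFS blocks")
BSIZE=4096          # XFS block size in bytes
SECTOR=512          # logical sector size in bytes

FS_SECTORS=$((BLOCKS * (BSIZE / SECTOR)))   # 11721043080 sectors
END=$((START + FS_SECTORS - 1))             # inclusive end: 11721045127

echo "sgdisk --new=1:${START}:${END} ... -- /dev/sdz"

This reproduces the 11721045127 end sector used in the recreation
step, i.e. the original 11721043087-sector partition minus its 7
unused sectors.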