Re: libceph: get_reply osd2 tid 1459933 data 3248128 > preallocated 131072, skipping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am So., 16. Mai 2021 um 15:36 Uhr schrieb Ilya Dryomov <idryomov@xxxxxxxxx>:

> On Sun, May 16, 2021 at 12:54 PM Markus Kienast <mark@xxxxxxxxxxxxx>
> wrote:
> >
> > Hi Ilya,
> >
> > unfortunately I can not find any "missing primary copy of ..." error in
> the logs of my 3 OSDs.
> > The NVME disks are also brand new and there is not much traffic on them.
> >
> > The only error keyword I find are those two messages in osd.0 and osd.1
> logs shown below.
> >
> > BTW the error posted before actually concerns osd1. The one I posted was
> copied from somebody elses bug report, which had similar errors. Here are
> my original error messages on LTSP boot:
>
> Hi Markus,
>
> Please don't ever paste log messages from other bug reports again.
> Your email said "I am seeing these messages ..." and I spent a fair
> amount of time staring at the code trying to understand how an issue
> that was fixed several releases ago could resurface.
>
> The numbers in the log message mean specific things.  For example it
> is immediately obvious that
>
>   get_reply osd1 tid 11 data 4164 > preallocated 4096, skipping
>
> is not related to
>
>   get_reply osd2 tid 1459933 data 3248128 > preallocated 131072, skipping
>
> even though they probably look the same to you.
>

Sorry, I was not aware of that.


> > [    10.331119] libceph: mon1 (1)10.101.0.27:6789 session established
> > [    10.331799] libceph: client175444 fsid
> b0f4a188-bd81-11ea-8849-97abe2843f29
> > [    10.336866] libceph: mon0 (1)10.101.0.25:6789 session established
> > [    10.337598] libceph: client175444 fsid
> b0f4a188-bd81-11ea-8849-97abe2843f29
> > [    10.349380] libceph: get_reply osd1 tid 11 data 4164 > preallocated
> > 4096, skipping
>
> Please paste the entire boot log and "rbd info" output for the affected
> image.
>

elias@maas:~$ rbd info squashfs/ltsp-01
rbd image 'ltsp-01':
size 3.5 GiB in 896 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 23faade1714
block_name_prefix: rbd_data.23faade1714
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Mon Jan 11 12:09:22 2021
access_timestamp: Wed Feb 24 10:55:17 2021
modify_timestamp: Mon Jan 11 12:09:22 2021

I don't have the boot log available right now, but you can watch a video of
the boot process right here: https://photos.app.goo.gl/S8PssYu2VAr4CSeg7

It seems to be consistently "tid 11" consistently, while in this video it
was "data 4288" not "data 4164" as above. But the image has been modified
in the meantime, as far as I can recall, so that might be due to that
reason.

>
> >
> > elias@maas:~$ juju ssh ceph-osd/2 sudo zgrep -i error
> /var/log/ceph/ceph-osd.0.log
> > 2021-05-16T08:52:56.872+0000 7f0b262c2d80  4 rocksdb:
>      Options.error_if_exists: 0
> > 2021-05-16T08:52:59.872+0000 7f0b262c2d80  4 rocksdb:
>      Options.error_if_exists: 0
> > 2021-05-16T08:53:00.884+0000 7f0b262c2d80  1 osd.0 8599 warning: got an
> error loading one or more classes: (1) Operation not permitted
> >
> > elias@maas:~$ juju ssh ceph-osd/0 sudo zgrep -i error
> /var/log/ceph/ceph-osd.1.log
> > 2021-05-16T08:49:52.971+0000 7fb6aa68ed80  4 rocksdb:
>      Options.error_if_exists: 0
> > 2021-05-16T08:49:55.979+0000 7fb6aa68ed80  4 rocksdb:
>      Options.error_if_exists: 0
> > 2021-05-16T08:49:56.828+0000 7fb6aa68ed80  1 osd.1 8589 warning: got an
> error loading one or more classes: (1) Operation not permitted
> >
> > How can I find our more about this bug? It keeps coming back every two
> weeks and I need to restart all OSDs to make it go away for another two
> weeks. Can I check "tid 11 data 4164" somehow. I find no documentation,
> what a tid actually is and how I could perform a read test on it.
>
> So *just* restarting the three OSDs you have makes it go away?
>
> What is meant by restarting?  Rebooting the node or simply restarting
> the OSD process?
>

I did reboot all OSD nodes and since the MON and FS nodes run as LXD/juju
instances on them, they were rebooted as well.


> >
> > Another interesting detail is, that the problem does only seem to affect
> booting up from this RBD but not operation per se. The thin clients already
> booted from this RBD continue working.
>
> I take it that the affected image is mapped on multiple nodes?  If so,
> on how many?
>

Currently "squashfs/ltsp-01" is mapped on 4 nodes.
As the pool name indicates, the FS was converted to squashfs and is
therefore mounted read-only, while the underlying dev might actually not be
mounted read-only, as there does not seem to be an option available to
mount RO via /sys/bus/rbd/add_single_major or /sys/bus/rbd/add.

As far as I can tell, the only way to force RO is to map a snapshot instead.


> >
> > All systems run:
> > Ubuntu 20.04.2 LTS
> > Kernel 5.8.0-53-generic
> > ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus
> (stable)
> >
> > The cluster has been setup with Ubuntu MAAS/juju, consists of
> > * 1 MAAS server
> > * with 1 virtual LXD juju controller
> > * 3 OSD servers with one 2 TB Nvme SSD each for ceph and a 256 SATA SSD
> for the operating system.
> > * each OSD contains a virtualized LXD MON and an LXD FS server (setup
> through juju, see juju yaml file attached).
>
> Can you describe the client side a bit more?  How many clients do you
> have?  How many of them are active at the same time?
>

Currently, there are only 4 active clients but the system is intended to
being able to sustain 100s of clients. We are using an RBD as boot device
for PXE booted thin clients, you might have heard of the Linux Terminal
Server Project (ltsp.org). We adapted the stack to support booting from RBD.

The clients do not write to the FS/dev, all necessary write operations
happen on a mapped overlayfs.

Currently we have one virtual machine with an ext4 FS on RBD "rbd/ltsp-01",
where we prepare the boot images for LTSP. We generate a squashfs from that
FS and then dd it onto a RBD "squashfs/ltsp-01" and back up the previous
version by moving it to another RBD name.

The initrd mapping process can be reviewed here:
https://github.com/trickkiste/ltsp/blob/feature-boot_method-rbd/debian/ltsp-rbd.initramfs-script


> What exactly is meant by "booting from RBD"?

Does the affected image serve as a golden image?

Or some other image is snapshotted and cloned from before booting?
>

The same RBD is mapped by all clients and the squashfs image on it is
mounted read-only.

>
> What do clients do after they boot?
>
Normal desktop operation, they serve as fat clients with the RBD as their
distributed network boot drive. As said above, all write access is done to
an overlayfs ramdisk.


>
> Thanks,
>
>                 Ilya
>

Thank you for reviewing this issue.

My best regards
Markus
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux