Re: Mon crash when client mounts CephFS

Phil Merricks <seffyroff@xxxxxxxxx> · Tue, 15 Jun 2021 17:32:02 -0700

Thanks for the replies folks.

This one is resolved. I wish I could tell you what I changed to fix it, but
several undocumented changes went into the deployment script I'm using
while I was distracted by something else. After tearing down and
redeploying today, the cluster no longer shows this particular issue.

I do have a new issue, though a less concerning one. I'll start a new thread.

On Tue, 8 Jun 2021 at 12:48, Robert W. Eckert <rob@xxxxxxxxxxxxxxx> wrote:

> When I had issues with the monitors, the problem was access to the monitor
> folder /var/lib/ceph/<guid of ceph installation>/mon.<servername>/store.db.
> Make sure it is owned by the ceph user.
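
A minimal sketch of the ownership check described above. The function name is hypothetical, and the angle-bracket placeholders are the ones from the message, to be filled in for your own cluster. It only lists paths with the wrong owner; the chown that fixes them is shown as a comment because it needs root.

```shell
# Hypothetical helper: print every path under a directory that is NOT owned
# by the given user (name or numeric uid). No output means ownership is fine.
list_not_owned_by() {
    local dir="$1" owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Usage (placeholders as in the message above):
#   list_not_owned_by "/var/lib/ceph/<guid>/mon.<servername>/store.db" ceph
# If anything is listed, fix it as root:
#   chown -R ceph:ceph "/var/lib/ceph/<guid>/mon.<servername>/store.db"
```

If the host has no user named ceph, a numeric uid can be passed instead of the name.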
>
> My issues originated from a hardware problem: the memory needed 1.3 V, but
> the motherboard was only reading 1.2 V (the memory was at fault: its
> firmware said 1.2 V required, the sticker on the side said 1.3 V). So I had
> a script that copied the store across and fixed the permissions.
>
> The other thing that helped a lot, compared to the crash logs, was to edit
> the unit.run file and remove the --rm parameter from the command. That lets
> you see the podman logs using podman logs <container>, which were a bit
> more detailed.
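
A rough sketch of that unit.run edit, assuming GNU sed and a podman command line in which the flag appears as " --rm " surrounded by spaces. The function names, the .bak suffix, and the unit.run path in the usage comments are assumptions, not something stated in the message.

```shell
# Hypothetical helpers for temporarily dropping podman's --rm flag.

# Remove the flag so the stopped container, and its logs, are kept.
strip_rm_flag() {
    local unit_run="$1"
    cp -p "$unit_run" "$unit_run.bak"   # backup used to restore the flag later
    sed -i 's/ --rm / /g' "$unit_run"
}

# Put the original unit.run back once debugging is done.
restore_unit_run() {
    local unit_run="$1"
    mv "$unit_run.bak" "$unit_run"
}

# Usage (path is an assumption, adjust to your deployment):
#   strip_rm_flag /var/lib/ceph/<guid>/mon.<server>/unit.run
#   systemctl restart ceph-<guid>@mon.<server>.service
#   podman logs <container>
#   restore_unit_run /var/lib/ceph/<guid>/mon.<server>/unit.run
```

Keeping a backup rather than re-adding the flag by hand makes the restore step harder to forget or get wrong.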
>
> When you do this, you will need to restore the parameter afterwards, and
> clean up the 'cid' and 'pid' files /run/ceph-<guid>@mon.<server>.service-cid
> and /run/ceph-<guid>@mon.<server>.service-pid
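
The cleanup of those two files could be sketched as follows. The function name and the optional third argument (a different run directory, handy for a dry run) are hypothetical; the file names follow the pattern given above.

```shell
# Hypothetical helper: remove the leftover cid/pid files for one monitor.
# $1 = cluster guid, $2 = server name, $3 = run directory (defaults to /run)
cleanup_cid_pid() {
    local guid="$1" server="$2" run_dir="${3:-/run}"
    rm -f "$run_dir/ceph-$guid@mon.$server.service-cid" \
          "$run_dir/ceph-$guid@mon.$server.service-pid"
}

# Usage, as root, with the monitor service stopped:
#   cleanup_cid_pid <guid> <server>
```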
>
> My reference is Red Hat Enterprise Linux 8, so things may be a bit
> different on Ubuntu.
>
> If you get a message about the store.db files being off, it's easiest to
> stop the working node, copy them over, set the user ID/group to ceph, and
> start things up.
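
A minimal sketch of the copy-and-chown step, assuming the store.db from the working node has already been staged on the broken host (for example with rsync or scp). The function name and the .broken suffix are hypothetical. The old store is moved aside rather than deleted, so it can still be inspected.

```shell
# Hypothetical helper: swap in a known-good store.db and fix its ownership.
# $1 = staged copy of the good store, $2 = target store.db, $3 = owner[:group]
replace_store() {
    local src="$1" dst="$2" owner="$3"
    if [ -e "$dst" ]; then
        mv "$dst" "$dst.broken"   # keep the suspect store for later inspection
    fi
    cp -a "$src" "$dst"
    chown -R "$owner" "$dst"
}

# Usage, as root, with both monitors stopped first, e.g.
#   systemctl stop ceph-<guid>@mon.<server>.service
#   replace_store /tmp/store.db.good \
#       "/var/lib/ceph/<guid>/mon.<servername>/store.db" ceph:ceph
#   systemctl start ceph-<guid>@mon.<server>.service
```

Remove any earlier store.db.broken before running this again, otherwise mv will move the old store inside it instead of renaming.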
>
> Rob
>
> -----Original Message-----
> From: Phil Merricks <seffyroff@xxxxxxxxx>
> Sent: Tuesday, June 8, 2021 3:18 PM
> To: ceph-users <ceph-users@xxxxxxx>
> Subject:  Mon crash when client mounts CephFS
>
> Hey folks,
>
> I have deployed a 3 node dev cluster using cephadm.  Deployment went
> smoothly and all seems well.
>
> If I try to mount a CephFS from a client node, however, 2/3 mons crash.
> I've begun picking through the logs, but so far, other than seeing the
> crash in the log itself, it's unclear what the cause of the crash is.
>
> Here's a log: <https://termbin.com/isaz>. You can see where the crash
> occurs, around the line that begins with "Jun 08 18:56:04 okcomputer
> podman[790987]:"
>
> I would welcome any advice on either what the cause may be, or how I can
> advance the analysis of what's wrong.
>
> Best regards
>
> Phil
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
> email to ceph-users-leave@xxxxxxx
>