Re: Error: journal specified but not allowed by osd backend

Thanks Eugen!
I was looking into running all the commands manually, following the docs for adding/removing OSDs, but I tried ceph-disk first.

I actually made it work by changing the id part of the ceph-disk command (it was checking the wrong journal device, which was owned by root:root). The next issue was that I had tried to reuse an old journal, so I had to create a new one (parted/sgdisk to set the ceph-journal parttype). Could I have just zapped the previous journal instead?
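
For reference, setting the parttype on the new journal partition went roughly like this (just a sketch; partition 8 on /dev/sda is a placeholder for our layout, and the GUID is the standard "Ceph Journal" partition type that ceph-disk's udev rules match on):

  # tag partition 8 on /dev/sda as a Ceph journal, then re-read the table
  sgdisk --typecode=8:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  partprobe /dev/sda
  # udev should then chown the partition to ceph:ceph
  ls -l /dev/sda8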

After that it prepared successfully and started peering. Unsetting nobackfill let it recover a 4TB HDD in approx 9 hours.
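
For completeness, the flag handling around the swap was just the usual (a sketch, nothing cluster-specific):

  ceph osd set nobackfill      # before swapping the disk
  ceph osd unset nobackfill    # once the new OSD is up and peering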

The best part was that, by reusing the OSD uuid, I didn't have to backfill twice.
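
In case it helps someone else, the uuid can be read from the osdmap before pulling the disk (a sketch, with osd.21 as a placeholder id):

  # the uuid is the last field on the osd's line
  ceph osd dump | grep '^osd.21 '
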
I'll see if I can add to the docs after we have updated to Luminous or Mimic and started using ceph-volume.

Kind Regards

David Majchrzak


On Aug 3, 2018, at 4:16 PM, Eugen Block <eblock@xxxxxx> wrote:

Hi,

we have a full bluestore cluster and had to deal with read errors on
the SSD holding the block.db. Something like this helped us recreate
a pre-existing OSD without rebalancing, just refilling the PGs. I
would zap the journal device and let it be recreated. It's very
similar to your ceph-disk output, but maybe you get more out of it if
you run it manually:

ceph-osd [--cluster-uuid <CLUSTER_UUID>] [--osd-objectstore filestore] \
  --mkfs -i <OSD_ID> --osd-journal <PATH_TO_SSD> \
  --osd-data /var/lib/ceph/osd/ceph-<OSD_ID>/ --mkjournal \
  --setuser ceph --setgroup ceph --osd-uuid <OSD_UUID>

Maybe after zapping the journal this will work. At least it would rule
out the old journal as the show-stopper.
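
To wipe only the journal partition rather than the whole SSD, something
like this should be enough (a sketch, /dev/sda8 as a placeholder; don't
zap the whole device if other OSDs keep their journals on it):

  # overwrite the start of the old journal partition
  dd if=/dev/zero of=/dev/sda8 bs=1M count=100 oflag=direct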

Regards,
Eugen


Quoting David Majchrzak <david@xxxxxxxxxxx>:

Hi!
Trying to replace an OSD on a Jewel cluster (filestore data on HDD +
journal device on SSD).
I've set noout and removed the flapping drive (read errors) and
replaced it with a new one.

I've noted down the osd UUID to be able to prepare the new disk with
the same osd ID. The journal device is the same as the previous one
(should I delete the partition and recreate it?).
However, running ceph-disk prepare returns:
# ceph-disk -v prepare --cluster-uuid c51a2683-55dc-4634-9d9d-f0fec9a6f389 \
    --osd-uuid dc49691a-2950-4028-91ea-742ffc9ed63f \
    --journal-dev --data-dev --fs-type xfs /dev/sdo /dev/sda8
command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5371, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5322, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1900, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1896, in factory
    return PrepareFilestore(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1909, in __init__
    self.journal = PrepareJournal(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2221, in __init__
    raise Error('journal specified but not allowed by osd backend')
ceph_disk.main.Error: Error: journal specified but not allowed by osd backend

I tried googling first, of course. It COULD be that we have set
setuser_match_path globally in ceph.conf (like in this bug report:
https://tracker.ceph.com/issues/19642), since the cluster was created
as dumpling a long time ago.
What is the best practice to fix it? Create [osd.X] sections and set
setuser_match_path in there instead for the old OSDs?
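
What I have in mind is roughly this in ceph.conf (a sketch; osd.21 is
just the affected id, and I'm not sure this is the right fix):

[osd.21]
setuser_match_path = /var/lib/ceph/osd/$cluster-$id
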
Should I do any other steps before this if I want to use the same
osd UUID? So far I've only stopped ceph-osd@21, removed the physical
disk, inserted the new one and tried running prepare.
Kind Regards,
David



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
