Re: Moving bluestore WAL and DB after bluestore creation

Hi,

I just wanted to make sure that our latest findings reach the OP of this thread. We posted them in a different thread [1] and hope this helps some of you. It is possible to migrate a journal from one partition to another with almost no downtime for the OSD. But it is *not* sufficient to dd the journal to the new partition and replace the symlink: the OSD will only restart successfully if the old partition still exists, and you'll find references to it in /proc/<PID>/fd. Removing the old partition will prevent the OSD from starting. You can find the details in the provided link [1].

We managed to replace the journals of six 1 TB OSDs residing on the same host within 25 minutes in our production environment.

Note: this only applies if the wal/db already reside on a separate partition.
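
For illustration only, the rough shape of the procedure looks like this (OSD id 5 and the device names are placeholders, and block.wal is handled the same way as block.db). As noted above, these steps alone are *not* sufficient; the complete, tested procedure is in [1]:

  ceph osd set noout                  # avoid rebalancing during the short downtime
  systemctl stop ceph-osd@5
  dd if=/dev/sdb2 of=/dev/nvme0n1p1 bs=1M oflag=direct      # copy the old wal/db partition to the new one
  ln -sf /dev/nvme0n1p1 /var/lib/ceph/osd/ceph-5/block.db   # point the symlink to the new partition
  chown -h ceph:ceph /var/lib/ceph/osd/ceph-5/block.db
  systemctl start ceph-osd@5
  ceph osd unset noout
  ls -l /proc/<PID>/fd                # the OSD may still hold the old partition open here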

Currently I'm looking for a way to extract the journal of an all-in-one (bluestore) OSD into a separate partition. I thought maybe "ceph-objectstore-tool --op dump-journal" could do the trick, but that command doesn't work for me. Does anyone have any insights on this?
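
My guess is that "--op dump-journal" only knows about the FileStore journal, which would explain why it fails on a bluestore OSD. ceph-bluestore-tool seems like the more natural place to look; whether it offers a subcommand for this probably depends on the release you are running, so check "ceph-bluestore-tool --help" first. Something along these lines is what I would expect (untested, purely an assumption on my part):

  ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-5 \
      --dev-target /dev/nvme0n1p2     # attach a new, separate block.db to an existing OSD (if your release supports it)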

Regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025930.html


----- Forwarded message from Ronny Aasen <ronny+ceph-users@xxxxxxxx> -----
   Date: Fri, 17 Nov 2017 17:04:36 +0100
   From: Ronny Aasen <ronny+ceph-users@xxxxxxxx>
Subject: Re: Moving bluestore WAL and DB after bluestore creation
     To: ceph-users@xxxxxxxxxxxxxx

On 16.11.2017 09:45, Loris Cuoghi wrote:
On Wed, 15 Nov 2017 19:46:48 +0000,
Shawn Edwards <lesser.evil@xxxxxxxxx> wrote:

On Wed, Nov 15, 2017, 11:07 David Turner <drakonstein@xxxxxxxxx>
wrote:

I'm not going to lie.  This makes me dislike Bluestore quite a
bit.  Putting the journals of multiple OSDs on one SSD allowed
you to monitor the write endurance of the SSD and replace it
without having to out and re-add all of the OSDs on the device.
Having to now out and backfill back onto the HDDs is awful; it
would have made the time I realized that 20 journal SSDs had all
run low on write endurance at the same time nearly impossible to
recover from.

Flushing journals, replacing SSDs, and bringing it all back online
was a slick process.  Formatting the HDDs and backfilling back onto
the same disks sounds like a big regression.  A process to migrate
the WAL and DB onto the HDD and then back off to a new device would
be very helpful.

On Wed, Nov 15, 2017 at 10:51 AM Mario Giammarco
<mgiammarco@xxxxxxxxx> wrote:

It seems it is not possible. I recreated the OSD.

2017-11-12 17:44 GMT+01:00 Shawn Edwards <lesser.evil@xxxxxxxxx>:

I've created some Bluestore OSDs with the wal, db, and data all
on the same rotating disk.  I would like to now move the wal and
db onto an nvme disk.  Is that possible without re-creating the
OSD?


This.  Exactly this.  Not being able to move the .db and .wal data on
and off the main storage disk on Bluestore is a regression.

Hello,

What stops you from dd'ing the DB/WAL partitions onto another disk and
updating the symlinks in the OSD's mount point under /var/lib/ceph/osd?


This probably works if you deployed bluestore with separate partitions, but if you did not create a partition for block.db at the original bluestore creation, there is no block.db symlink; the db and wal are mixed into the block partition and are not easy to extract. Also, just dd'ing the block device may not help if you want to change the size of the db partition; this needs more testing. Tools can probably be created in the future for resizing db and wal partitions, and for extracting db data from block into a separate block.db partition.
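
Checking whether an OSD was deployed with a separate block.db is straightforward, something along these lines (osd id 5 is just a placeholder):

  ls -l /var/lib/ceph/osd/ceph-5/                                  # a block.db/block.wal symlink only exists if deployed that way
  ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-5   # shows the labels of the attached bluestore devices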

dd'ing block.db would probably work when you need to replace a worn-out SSD, but not so much if you want to deploy a separate block.db for a bluestore OSD created without one.


kind regards
Ronny Aasen







--
Eugen Block                             voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG      fax     : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg                         e-mail  : eblock@xxxxxx

         Vorsitzende des Aufsichtsrates: Angelika Mozdzen
           Sitz und Registergericht: Hamburg, HRB 90934
                   Vorstand: Jens-U. Mozdzen
                    USt-IdNr. DE 814 013 983



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



