Re: consequence of losing WAL/DB device with bluestore?

On 01/29/2018 06:15 PM, David Turner wrote:
+1 for Gregory's response.  With filestore, if you lost a journal SSD and followed the steps you outlined, you were leaving yourself open to corrupt data.  Any write that was ack'd by the journal but not yet flushed to the disk would be lost, while the cluster would assume it was there.  With a failed journal SSD on filestore, you should have removed all affected OSDs and re-added them with a new journal device.  The same is true of Bluestore.


Long story short: if you value your data, never attempt a manual fix of OSDs that are giving you trouble.

Never attempt an XFS repair with FileStore, nor try anything that involves fiddling with the bits on the disk.

Wipe the OSD, re-add it, and let backfilling handle it for you.
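For reference, a minimal sketch of that wipe-and-re-add cycle with ceph-volume (Luminous or later); osd.12 and the device paths are placeholders, and older deployments would use ceph-disk instead:

    # stop the affected OSD and remove it from the cluster
    systemctl stop ceph-osd@12
    ceph osd purge 12 --yes-i-really-mean-it

    # wipe the data device (and the old journal/DB partition, if any)
    ceph-volume lvm zap /dev/sdb --destroy

    # recreate the OSD; backfilling repopulates it from the other replicas
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

Only do this while the cluster still has enough healthy copies of every PG to backfill from.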

I've just seen too many cases where people corrupted their cluster by trying to be smart.

Wido

Where Bluestore differs from Filestore is when your SSD stops accepting writes but can still be read (or any time you can still read from the SSD and are swapping it out).  With Filestore you would be able to flush the journal and create new journals for the OSDs on a new SSD.  This is not possible with Bluestore, as you cannot modify the WAL or RocksDB portions of a Bluestore OSD after creation.  If you started with your RocksDB and WAL on an SSD, you could not later add an NVMe device and move the WAL to it without removing and re-creating the OSDs with the new configuration.
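For the readable-SSD FileStore case, that journal move is roughly the following; a sketch only, with osd.12 and the journal paths as placeholders:

    systemctl stop ceph-osd@12
    ceph-osd -i 12 --flush-journal

    # point the OSD at the new journal partition and initialize it
    ln -sf /dev/disk/by-partuuid/<new-journal-uuid> /var/lib/ceph/osd/ceph-12/journal
    ceph-osd -i 12 --mkjournal

    systemctl start ceph-osd@12

With Bluestore there is no equivalent step: the WAL/DB placement is fixed when the OSD is created (e.g. ceph-volume lvm create --block.db / --block.wal), so changing it means redeploying the OSD as described above.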

On Mon, Jan 29, 2018 at 10:58 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

    On Mon, Jan 29, 2018 at 9:37 AM Vladimir Prokofev <v@xxxxxxxxxxx> wrote:

        Hello.

        In short: what are the consequences of losing an external WAL/DB
        device (assuming it's an SSD) in bluestore?

        In comparison with filestore: we used to have an external SSD
        for journaling multiple HDD OSDs. Hardware failure of such a
        device was not that big of a deal, as we could quickly run
        xfs_repair and initialize a new journal. You don't have to
        redeploy the OSDs; just provide them with a new journal device,
        remount the XFS filesystem, and restart the OSD process so it
        can quickly update its state. A healthy state can be restored
        in a matter of minutes.

        That was with filestore.
        Now what's the situation with bluestore?

        What will happen in different scenarios, like having only the
        WAL on an external device, or only the DB, or both WAL+DB?
        I kind of assume that losing the DB means losing the OSD, and
        it has to be redeployed?


    I'll let the BlueStore guys speak to this more directly, but I
    believe you lose the OSD.

    However, let's be clear: this is not really a different situation
    from FileStore. With FileStore you *can* fix the xfs filesystem
    and persuade the OSD to start up again by giving it a new journal.
    But this is a *lie* to the OSD about the state of its data and is
    very likely to introduce data loss or inconsistencies. You shouldn't
    do it unless the OSD hosts the only copy of a PG in your cluster.
    -Greg
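
    (As an aside, a quick way to sanity-check that last condition before
    giving up on an OSD; a sketch assuming a Luminous cluster, with
    osd.12 as a placeholder:

        # list the PGs that map to this OSD
        ceph pg ls-by-osd osd.12

        # reports whether the OSD can be removed without losing data
        ceph osd safe-to-destroy osd.12

    If safe-to-destroy reports the OSD can go, wipe and redeploy it
    rather than trying to trick it back to life.)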

        What about the WAL? Are there any specific commands to restore
        it, similar to xfs_repair?
        I didn't find any docs on this matter, but maybe I'm searching
        badly, so a link to such a doc would be great.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
