Re: consequence of losing WAL/DB device with bluestore?

On 01/29/2018 06:15 PM, David Turner wrote:
+1 for Gregory's response.  With filestore, if you lost a journal SSD and followed the steps you outlined, you were leaving yourself open to corrupt data.  Any write that was ack'd by the journal but not yet flushed to the disk would be lost, while the cluster would assume it was there.  With a failed journal SSD on filestore, you should have removed all affected OSDs and re-added them with a new journal device.  The same is true of Bluestore.


Long story short: if you value your data, never attempt a manual fix of OSDs that are giving you trouble.

Never attempt an XFS repair with FileStore, nor try anything that involves fiddling with the bits on the disk.

Wipe the OSD, re-add it, and let backfilling handle it for you.
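For reference, a minimal sketch of that wipe-and-re-add cycle with ceph-volume (Luminous or later); osd.12 and the device paths are placeholders, and older deployments would use ceph-disk instead:

    # stop the affected OSD and remove it from the cluster
    systemctl stop ceph-osd@12
    ceph osd purge 12 --yes-i-really-mean-it

    # wipe the data device (and the old journal/DB partition, if any)
    ceph-volume lvm zap /dev/sdb --destroy

    # recreate the OSD; backfilling repopulates it from the other replicas
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

Only do this while the cluster still has enough healthy copies of every PG to backfill from.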

I've just seen too many cases where people corrupted their cluster by trying to be smart.

Wido

Where Bluestore differs from Filestore is when your SSD stops accepting writes but can still be read (or any time you can still read from the SSD and are swapping it out).  With Filestore you would be able to flush the journal and create new journals for the OSDs on a new SSD.  This is not possible with Bluestore, as you cannot modify the WAL or RocksDB portions of a Bluestore OSD after creation.  If you started with your RocksDB and WAL on an SSD, you could not later add an NVMe device and move the WAL to it without removing and re-creating the OSDs with the new configuration.
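For the readable-SSD FileStore case, that journal move is roughly the following; a sketch only, with osd.12 and the journal paths as placeholders:

    systemctl stop ceph-osd@12
    ceph-osd -i 12 --flush-journal

    # point the OSD at the new journal partition and initialize it
    ln -sf /dev/disk/by-partuuid/<new-journal-uuid> /var/lib/ceph/osd/ceph-12/journal
    ceph-osd -i 12 --mkjournal

    systemctl start ceph-osd@12

With Bluestore there is no equivalent step: the WAL/DB placement is fixed when the OSD is created (e.g. ceph-volume lvm create --block.db / --block.wal), so changing it means redeploying the OSD as described above.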

On Mon, Jan 29, 2018 at 10:58 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

    On Mon, Jan 29, 2018 at 9:37 AM Vladimir Prokofev <v@xxxxxxxxxxx> wrote:

        Hello.

        In short: what are the consequences of losing an external WAL/DB
        device (assuming it's an SSD) in bluestore?

        In comparison with filestore: we used to have an external SSD
        for journaling multiple HDD OSDs. Hardware failure of such a
        device was not that big of a deal, as we could quickly run
        xfs_repair and initialize a new journal. You don't have to
        redeploy the OSDs; just provide them with a new journal device,
        remount the XFS filesystem, and restart the OSD process so it
        can quickly update its state. A healthy state can be restored
        in a matter of minutes.

        That was with filestore.
        Now what's the situation with bluestore?

        What will happen in different scenarios, like having only the
        WAL on an external device, or only the DB, or both WAL+DB?
        I kind of assume that losing the DB means losing the OSD, and
        it has to be redeployed?


    I'll let the BlueStore guys speak to this more directly, but I
    believe you lose the OSD.

    However, let's be clear: this is not really a different situation
    from FileStore. With FileStore you *can* fix the xfs filesystem
    and persuade the OSD to start up again by giving it a new journal.
    But this is a *lie* to the OSD about the state of its data and is
    very likely to introduce data loss or inconsistencies. You shouldn't
    do it unless the OSD hosts the only copy of a PG in your cluster.
    -Greg
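
    (As an aside, a quick way to sanity-check that last condition before
    giving up on an OSD; a sketch assuming a Luminous cluster, with
    osd.12 as a placeholder:

        # list the PGs that map to this OSD
        ceph pg ls-by-osd osd.12

        # reports whether the OSD can be removed without losing data
        ceph osd safe-to-destroy osd.12

    If safe-to-destroy reports the OSD can go, wipe and redeploy it
    rather than trying to trick it back to life.)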

        What about the WAL? Are there any specific commands to restore
        it, similar to xfs_repair?
        I didn't find any docs on this matter, but maybe I'm searching
        badly, so a link to such a doc would be great.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
