Re: is the rbd mirror journal replayed on primary after a crash?


 



Dear all,


replying to my own question ;-)


This document explains the rbd mirroring / journaling process in more detail: https://pad.ceph.com/p/I-rbd_mirroring


especially this part:
on startup, replay journal from flush position
Store journal metadata in journal header, to be more general
  • flush position
  • per-zone flush positions
pointers to positions in the journal (object, offset) 
- one for each reader so we can tell how far we can trim
- store trim pos in primary and secondary zones, so despite loss of primary dc we can tell who's most up to date
=> So apparently there is one journal position pointer for each secondary image (journal reader) and, importantly, one for the primary image (normally the journal writer, but also a reader during open / crash recovery).
This appears to confirm that clients on the primary not only write to the journal (to support replication to the secondary) but also read from it after a crash, to replay the latest IOs that are missing from the primary image.
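
To make this concrete, here is a minimal toy model of the behaviour as I understand it from the pad. It is only a sketch: ToyJournal, ToyImage, write() and replay_on_open() are hypothetical names made up for illustration, not librbd APIs, and the real journal stores (object, offset) positions rather than list indexes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Hypothetical in-memory stand-in for an image journal."""
    events: list = field(default_factory=list)      # appended write events
    commit_pos: dict = field(default_factory=dict)  # reader id -> next event index to replay

    def append(self, offset, data):
        self.events.append((offset, data))
        return len(self.events) - 1


@dataclass
class ToyImage:
    """Hypothetical image: maps a byte offset to the data written there."""
    blocks: dict = field(default_factory=dict)

    def apply(self, offset, data):
        self.blocks[offset] = data


def write(journal, image, offset, data, crash_before_image_update=False):
    # Append the event describing the update to the journal first.
    idx = journal.append(offset, data)
    if crash_before_image_update:
        return  # simulated client crash: event is journaled, image is not updated
    # Update the image, then record how far the primary has applied the journal.
    image.apply(offset, data)
    journal.commit_pos["primary"] = idx + 1


def replay_on_open(journal, image, reader="primary"):
    # On open, re-apply every event after this reader's recorded position.
    start = journal.commit_pos.get(reader, 0)
    for offset, data in journal.events[start:]:
        image.apply(offset, data)
    journal.commit_pos[reader] = len(journal.events)
    return len(journal.events) - start


journal, image = ToyJournal(), ToyImage()
write(journal, image, 0, b"A")
write(journal, image, 4096, b"B", crash_before_image_update=True)
assert 4096 not in image.blocks          # the second write never reached the image
replayed = replay_on_open(journal, image)
assert image.blocks[4096] == b"B"        # replay brought the primary back in line
```

In this model the write that was journaled but never applied is recovered when the image is reopened, so the primary ends up with the same content a secondary replaying the same journal would have.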


  • on open, replay recent journal operations
  • periodically update a journal position pointer in the rbd image header (to limit replays on open)

  • If a split-brain event is detected by the rbd-mirror daemon, it will not attempt to mirror the affected image until corrected.
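
The pointer bookkeeping quoted above (one position per reader so the journal can be trimmed, plus a periodically updated position to limit replay on open) can be illustrated with a few lines of arithmetic. This is a sketch under my own assumptions: the function names, the site names and the use of plain integers as positions are all hypothetical, chosen only to show the idea.

```python
def safe_trim_position(commit_positions):
    """The journal can only be trimmed up to the slowest reader's position."""
    return min(commit_positions.values())


def replay_length_after_crash(total_events, flush_interval):
    """Events to replay on open if the position pointer is only
    persisted once every flush_interval events (toy assumption)."""
    last_flushed = (total_events // flush_interval) * flush_interval
    return total_events - last_flushed


# Hypothetical positions: one for the primary and one per secondary site.
positions = {"primary": 120, "site-b": 95, "site-c": 110}

trim_to = safe_trim_position(positions)

# If the primary site is lost, the stored positions still tell us
# which surviving site is the most up to date.
survivors = {name: pos for name, pos in positions.items() if name != "primary"}
most_up_to_date_survivor = max(survivors, key=survivors.get)

# A more frequent pointer update means a shorter replay after a crash.
short_replay = replay_length_after_crash(1234, flush_interval=100)
long_replay = replay_length_after_crash(1234, flush_interval=1000)
```

The trade-off suggested by the second function is that updating the pointer more often costs extra header writes but shortens the replay on open.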

    cheers
    Francois Scheurer




    --


    EveryWare AG
    François Scheurer
    Senior Systems Engineer
    Zurlindenstrasse 52a
    CH-8003 Zürich

    tel: +41 44 466 60 00
    fax: +41 44 466 60 10
    mail: francois.scheurer@xxxxxxxxxxxx
    web: http://www.everyware.ch

    From: Scheurer François <francois.scheurer@xxxxxxxxxxxx>
    Sent: Tuesday, October 3, 2023 4:38:07 PM
    To: dillaman@xxxxxxxxxx; ceph-users@xxxxxxx
    Subject: [ceph-users] is the rbd mirror journal replayed on primary after a crash?
     

    Hello



    Short question regarding journal-based rbd mirroring.


    IO path with journaling w/o cache:

    a. Create an event to describe the update
    b. Asynchronously append event to journal object
    c. Asynchronously update image once event is safe
    d. Complete IO to client once update is safe


    [cf. https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring_0.pdf]


    If a client crashes between b. and c., is there a mechanism to replay the IO from the journal on the primary image?

    If not, then the primary and secondary images would get out-of-sync (because of the extra write(s) on secondary) and subsequent writes to the primary would corrupt the secondary. Is that correct?



    Cheers

    Francois Scheurer





