Re: flush devices explicitly on BlueFS::umount to fix #54019

Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx> · Fri, 25 Nov 2022 16:16:13 +0900

Hi Igor,

2022年11月24日(木) 1:08 Igor Fedotov <igor.fedotov@xxxxxxxx>:

    Hi Cephers,
    The following discussion is inspired by
      https://tracker.ceph.com/issues/54019 "OSD::mkfs:
      ObjectStore::mkfs failed with error (5) Input/output error". 

    We had brief discussion in a private chat with Adam a while back
      but IMO it worth wider audience.

    Generally given what we have in the ticket it looks like
      BlueStore's mkfs might not properly flush device(s) on completion.
      Points for that are:
    - one can see garbage/all zeros when reading from disk after such
      a failure. And apparently OS cache drop brings valid data to its
      place.

    - setting bluefs_buffered_io = false apparently fixes the issue

    Given the BlueFS::umount() implementation below shouldn't we just
    call flush_bdev() explicitly after _close_writer(log.writer) call?

      IIUC both sync_metadata and close_write() do not fully guarantee
      they've flushed everything so another flush_bdev call might be
      needed. Anyway it wouldn't harm IMO. 

    What do you think?

    @Satoru - I'm curious if you can try a custom patch with a
      potential fix as you're able to consistently reproduce the issue?

Thank you for your effort. I'll try it. I'll also ask rook users who hit this problem to try this fix.

Thanks,
Satoru

    void
      BlueFS::umount(bool avoid_compact)

      {

        dout(1) << __func__ << dendl;

        sync_metadata(avoid_compact);

        if (cct->_conf->bluefs_check_volume_selector_on_umount) {

          _check_vselector_LNF();

        }

        _close_writer(log.writer);

        log.writer = NULL;

        log.t.cleair();

        vselector.reset(nullptr);

        _stop_alloc();

        nodes.file_map.clear();

        nodes.dir_map.clear();

        super = bluefs_super_t();

        _shutdown_logger();

      }

    -- 

      Igor Fedotov

      Ceph Lead Developer

      --

      croit GmbH, Freseniusstr. 31h, 81247 Munich

      CEO: Martin Verges - VAT-ID: DE310638492

      Com. register: Amtsgericht Munich HRB 231263

      Web | LinkedIn
      | Youtube | Twitter

      Meet us at the SC22 Conference! Learn more

      Technology
        Fast50 Award Winner by Deloitte!

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx