Re: flush devices explicitly on BlueFS::umount to fix #54019

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Satoru,

sounds good, thanks a in advance!

Just in case below is how updates umount() function should look like:

void BlueFS::umount(bool avoid_compact)
{
  dout(1) << __func__ << dendl;

  sync_metadata(avoid_compact);
  if (cct->_conf->bluefs_check_volume_selector_on_umount) {    
    _check_vselector_LNF();
  }
  _close_writer(log.writer);
  log.writer = NULL;
  log.t.clear();
  _flush_bdev();

  vselector.reset(nullptr);

  _stop_alloc();
  nodes.file_map.clear();
  nodes.dir_map.clear();
  super = bluefs_super_t();
  _shutdown_logger();
}


On 11/25/2022 10:16 AM, Satoru Takeuchi wrote:
Hi Igor,

2022年11月24日(木) 1:08 Igor Fedotov <igor.fedotov@xxxxxxxx>:

Hi Cephers,

The following discussion is inspired by https://tracker.ceph.com/issues/54019 "OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error".

We had brief discussion in a private chat with Adam a while back but IMO it worth wider audience.

Generally given what we have in the ticket it looks like BlueStore's mkfs might not properly flush device(s) on completion. Points for that are:

- one can see garbage/all zeros when reading from disk after such a failure. And apparently OS cache drop brings valid data to its place.

- setting bluefs_buffered_io = false apparently fixes the issue


Given the BlueFS::umount() implementation below shouldn't we just call flush_bdev() explicitly after _close_writer(log.writer) call?

IIUC both sync_metadata and close_write() do not fully guarantee they've flushed everything so another flush_bdev call might be needed. Anyway it wouldn't harm IMO.

What do you think?

@Satoru - I'm curious if you can try a custom patch with a potential fix as you're able to consistently reproduce the issue?


Thank you for your effort. I'll try it. I'll also ask rook users who hit this problem to try this fix.

Thanks,
Satoru
 


void BlueFS::umount(bool avoid_compact)
{
  dout(1) << __func__ << dendl;

  sync_metadata(avoid_compact);
  if (cct->_conf->bluefs_check_volume_selector_on_umount) {    
    _check_vselector_LNF();
  }
  _close_writer(log.writer);
  log.writer = NULL;
  log.t.cleair();

  vselector.reset(nullptr);
  _stop_alloc();
  nodes.file_map.clear();
  nodes.dir_map.clear();
  super = bluefs_super_t();
  _shutdown_logger();
}


--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web | LinkedIn | Youtube | Twitter

Meet us at the SC22 Conference! Learn more
Technology Fast50 Award Winner by Deloitte!


--
Igor Fedotov
Ceph Lead Developer
--
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web | LinkedIn | Youtube | Twitter

Meet us at the SC22 Conference! Learn more
Technology Fast50 Award Winner by Deloitte!


_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx

[Index of Archives]     [CEPH Users]     [Ceph Devel]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux