Hi Cephers,
The following discussion is inspired by https://tracker.ceph.com/issues/54019 "OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error".
We had a brief discussion in a private chat with Adam a while back, but IMO it's worth a wider audience.
Generally, given what we have in the ticket, it looks like BlueStore's mkfs might not properly flush the device(s) on completion. Points in favor of that:
- one can see garbage/all zeros when reading from disk after such a failure, and apparently dropping the OS cache brings the valid data back into place.
- setting bluefs_buffered_io = false apparently fixes the issue.
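As a general POSIX illustration (not BlueFS code) of what "flushing" means with buffered I/O: write() may return as soon as the data sits in the page cache, and an explicit fdatasync() is what asks the kernel to push it to stable storage. The sketch below assumes a temporary regular file rather than a raw block device, and the helper names write_and_sync(), read_all() and demo_roundtrip() are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Writes `data` through the page cache, then forces it to stable storage.
// Returns true only if every step (open, write, fdatasync, close) succeeded.
bool write_and_sync(const std::string& path, const std::string& data) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return false;
  std::size_t off = 0;
  while (off < data.size()) {
    // Buffered write: may complete once the bytes are in the page cache.
    ssize_t n = ::write(fd, data.data() + off, data.size() - off);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted, retry
      ::close(fd);
      return false;
    }
    off += static_cast<std::size_t>(n);
  }
  // Without this call the dirty pages may not have reached the device yet.
  bool ok = ::fdatasync(fd) == 0;
  if (::close(fd) != 0) ok = false;
  return ok;
}

// Reads the whole file back (this read itself goes through the page cache).
std::string read_all(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::ostringstream ss;
  ss << in.rdbuf();
  return ss.str();
}

// Usage: round-trips a payload through a freshly created temporary file.
bool demo_roundtrip(const std::string& payload) {
  char tmpl[] = "/tmp/flush_demoXXXXXX";
  int fd = ::mkstemp(tmpl);
  if (fd < 0) return false;
  ::close(fd);
  const bool ok = write_and_sync(tmpl, payload) && read_all(tmpl) == payload;
  ::unlink(tmpl);
  return ok;
}
```

Note that the read-back only checks the round trip through the cache; durability on the device is what the fdatasync() step is for.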
Given the BlueFS::umount() implementation below, shouldn't we just call flush_bdev() explicitly after the _close_writer(log.writer) call?
IIUC neither sync_metadata() nor _close_writer() fully guarantees that everything has been flushed, so another flush_bdev() call might be needed. Either way, it wouldn't hurt IMO.
What do you think?
@Satoru, since you're able to reproduce the issue consistently, could you try a custom patch with a potential fix?
void BlueFS::umount(bool avoid_compact)
{
  dout(1) << __func__ << dendl;
  sync_metadata(avoid_compact);
  if (cct->_conf->bluefs_check_volume_selector_on_umount) {
    _check_vselector_LNF();
  }
  _close_writer(log.writer);
  log.writer = NULL;
  log.t.clear();
  vselector.reset(nullptr);
  _stop_alloc();
  nodes.file_map.clear();
  nodes.dir_map.clear();
  super = bluefs_super_t();
  _shutdown_logger();
}
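To make the ordering question concrete, here is a hypothetical toy model (not BlueFS code; ToyDevice, ToyWriter and toy_umount() are invented for illustration). It assumes a device with a volatile write cache and a writer whose close() hands data to the device without flushing it, which is exactly the uncertainty raised above, not a claim about how _close_writer() actually behaves. It also shows why an extra flush is harmless: with nothing pending it changes nothing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device: writes land in a volatile cache and only become
// durable after flush().
class ToyDevice {
 public:
  void write(const std::string& rec) { volatile_.push_back(rec); }
  // Moves pending records to durable media; a no-op when nothing is pending.
  void flush() {
    durable_.insert(durable_.end(), volatile_.begin(), volatile_.end());
    volatile_.clear();
  }
  // Simulates power loss: only durable records survive.
  std::vector<std::string> after_crash() const { return durable_; }

 private:
  std::vector<std::string> volatile_;
  std::vector<std::string> durable_;
};

// Hypothetical writer: close() submits buffered records to the device but,
// in this model, does not flush the device itself.
class ToyWriter {
 public:
  explicit ToyWriter(ToyDevice* dev) : dev_(dev) {}
  void append(const std::string& rec) {
    assert(dev_ != nullptr && "append after close");
    buf_.push_back(rec);
  }
  void close() {
    for (const auto& r : buf_) dev_->write(r);
    buf_.clear();
    dev_ = nullptr;
  }

 private:
  ToyDevice* dev_;
  std::vector<std::string> buf_;
};

// Mirrors the umount() order: close the log writer, then (optionally)
// flush the device as proposed in this thread.
std::vector<std::string> toy_umount(bool flush_after_close) {
  ToyDevice dev;
  ToyWriter log_writer(&dev);
  log_writer.append("superblock");
  log_writer.append("log tail");
  log_writer.close();
  if (flush_after_close) {
    dev.flush();
    dev.flush();  // second flush finds nothing pending and changes nothing
  }
  return dev.after_crash();
}
```

In this model toy_umount(false) leaves nothing durable, while toy_umount(true) persists both records.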
Dev mailing list -- dev@xxxxxxx