On Sat, Nov 3, 2018 at 7:28 PM Bryan Henderson <bryanh@xxxxxxxxxxxxxxxx> wrote:
>
> I had a filesystem rank get damaged when the MDS had an error writing the log
> to the OSD.  Is damage expected when a log write fails?
>
> According to log messages, an OSD write failed because the MDS attempted
> to write a bigger chunk than the OSD's maximum write size.  I can probably
> figure out why that happened and fix it, but OSD write failures can happen for
> lots of reasons, and I would have expected the MDS just to discard the recent
> filesystem updates, issue a log message, and keep going.  The user had
> presumably not been told those updates were committed.

The MDS will go into a damaged state when it sees an unexpected error from an OSD (the key word there is "unexpected"; this does not apply to ordinary behaviour such as an OSD going down).  In this case, it doesn't mean that the metadata is literally damaged, just that the MDS has encountered a situation it can't handle, and needs to be stopped until a human being can intervene to sort the situation out.

OSD write errors are not usual events: any issues with the underlying storage are expected to be handled by RADOS, and write operations to an unhealthy cluster should block rather than return an error.  It would not be correct for CephFS to throw away metadata updates in the case of unexpected write errors -- this is a strongly consistent system, so when we can't make progress consistently (i.e. respecting all the ops we've seen, in order), we have to stop.

Assuming that the only problem was indeed that the MDS's journaler was attempting to exceed the OSD's maximum write size, then you should find that doing a "ceph mds repaired..." to clear the damaged flag will allow the MDS to start again.

I'm guessing that you changed some related settings (like mds_log_segment_size) to get into this situation?  Otherwise, an error like this would definitely be a bug.

John

>
> And how do I repair this now?  Is this a job for
>
>    cephfs-journal-tool event recover_dentries
>    cephfs-journal-tool journal reset
>
> ?
>
> This is Jewel.
>
> --
> Bryan Henderson                                   San Jose, California
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
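
For anyone hitting the same situation later, here is a minimal sketch of the checks and the recovery step John describes, assuming the only problem really was an oversized journal write.  The daemon ids (mds.a, osd.0), the damaged rank 0, and the use of osd_max_write_size as the relevant OSD write limit are assumptions; substitute the names from your own cluster.

    # Compare the MDS journal segment size against the OSD write limit
    # (osd_max_write_size is assumed to be the limit that was exceeded)
    ceph daemon mds.a config get mds_log_segment_size
    ceph daemon osd.0 config get osd_max_write_size

    # Confirm which rank is marked damaged
    ceph health detail
    ceph status

    # Clear the damaged flag so the MDS for that rank can start again
    # (Jewel-era syntax takes the rank; newer releases also accept <fs_name>:<rank>)
    ceph mds repaired 0

If the journal itself turns out to be unreadable rather than just blocked on an oversized write, that is the point at which the cephfs-journal-tool recover_dentries / journal reset steps quoted above would come into play, not before.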