Understood! I really appreciate your explanation.

Yan, Zheng <ukernel@xxxxxxxxx> wrote on Fri, Apr 17, 2020 at 3:11 PM:
>
> On Fri, Apr 17, 2020 at 10:23 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> >
> > Hi, Yan:
> > I agree with the idea that log events can be used to reconstruct the
> > cache when a crash happens. But the master can reconstruct its cache by
> > replaying its EUpdate log event. The ESlaveUpdate::OP_COMMIT log event
> > seems to have nothing to do with the master's cache; it's on the slave.
> > Besides, that log event on the slave cannot always help to reconstruct
> > the cache after a crash.
> >
> > Suppose a scenario where the slave submits an ESlaveUpdate::OP_COMMIT
> > log event and sends an OP_COMMITTED message to the master. Because there
> > is no mechanism to prevent the slave from trimming the
> > ESlaveUpdate::OP_COMMIT log event, it is possible that both master and
> > slave crash in a situation where the master hasn't received the
> > OP_COMMITTED message and the slave has trimmed its log. After both MDSs
> > are restarted, in the resolve stage, the slave doesn't know it had an
> > "uncommitted slave op" before crashing, because the log event has been
> > trimmed. Meanwhile, the master knows there is an "uncommitted master op"
> > from replaying its EUpdate log event. In the current implementation, the
> > master will not resend an OP_FINISH to the slave for this op; it just
> > waits for a message from the slave. However, the slave will never send
> > an OP_COMMITTED message to the master. What surprised me is that the op
> > on the master will finally be committed! With more investigation, I
> > find maybe that is achieved by a "coincidence".
>
> Not a coincidence. It's by design.
>
> > Because although the
> > slave has no information about uncommit_slave ops, it will always send
> > an MMDSResolve to the master in `MDCache::send_subtree_resolves()`, and
> > the master will always clean up the proper ops (e.g. the op we are
> > talking about) when receiving MMDSResolve.
> > To be a little more specific, since the
> > master has removed the slave from its umaster.slaves in
> > `MDCache::handle_mds_failure()`, when it receives the MMDSResolve
> > message, the condition for starting the cleanup process is always
> > satisfied. (I haven't found out why the master can always trigger
> > `handle_mds_failure`, but I think the current information is enough
> > for this discussion.)
> >
> > So from the supposed scenario, we can see that even with an
> > ESlaveUpdate::OP_COMMIT log event, the slave sometimes still cannot
> > reconstruct its cache after a crash. Discarding that log event seems
> > OK, because the master has a log for the uncommitted request, so the
> > request won't be lost even after a crash.
>
> It's not about reconstructing the cache. It's for the two-phase commit
> protocol. For example, suppose both master and slave crash after an
> operation finishes. At the time the master crashes, it has already
> trimmed the corresponding EUpdate event, but at the time the slave
> crashes, it hasn't trimmed any log events. When the slave restarts, it
> does not know whether the slave updates in its log were committed or
> rolled back (because there would be no slave commit log event). So the
> slave asks the master what it should do. But the master has lost the
> information about these updates. (In the current implementation, the
> master will ask the slave to roll back.)
>
> >
> > If this is the right direction, I'd like to do more work on the
> > ECommitted log event on the master, since I think it's unnecessary
> > too. If not, I genuinely hope you can give more explanation about it.
> > Thanks!
> >
> ECommitted for the master is less important, but it's still useful.
> 1. Without ECommitted, when an MDS restarts, it needs to track all
>    EUpdate events with slaves until the cluster is fully healthy.
> 2. It helps debugging.
>
> > Sincerely
> > -Xinying
> >
> > Yan, Zheng <ukernel@xxxxxxxxx> wrote on Wed, Apr 15, 2020 at 10:08 AM:
> > >
> > > On Wed, Apr 15, 2020 at 9:40 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > > >
> > > > Hi, Greg:
> > > > Thanks for your reply!
> > > > I think the master can always know whether a request has finished,
> > > > whether or not there is a commit log event, because it has written
> > > > an EUpdate log event that records the unfinished request.
> > > >
> > > > Of course, we need to do the commit, in which we clean up the
> > > > mdcache and trigger journal trimming, but it seems we don't need to
> > > > write a log event. We can do the commit just in memory.
> > > >
> > > > For example, if we remove writing an ESlaveUpdate::OP_COMMIT log
> > > > event on the slave, when a crash happens, the master will know
> > > > there is an unfinished request either by replaying its earlier
> > > > logged EUpdate or by reading from its cache, so it resends
> > >
> > > Log events can be trimmed, and the cache is lost if the master crashed.
> > >
> > > > OP_FINISH to the slave,
> > > > then everything will go on.
> > > > Similarly, if we remove writing an ECommitted log event on the
> > > > master, when a crash happens, the master still knows there is an
> > > > unfinished request and it will restart the process from the step of
> > > > sending OP_FINISH to the slave.
> > > >
> > > > What do you think?
> > > >
> > > > Sincerely
> > > > -Xinying
> > > >
> > > > Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Wed, Apr 15, 2020 at 2:16 AM:
> > > > >
> > > > > On Sun, Apr 12, 2020 at 5:19 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi, cephers:
> > > > > > What's the purpose of using a LogEvent with an empty metablob?
> > > > > > For example, in a link/unlink operation across two active MDSs,
> > > > > > when the slave receives OP_FINISH it will write an
> > > > > > ESlaveUpdate::OP_COMMIT to the journal, then send OP_COMMITTED
> > > > > > to the master. When the master receives OP_COMMITTED it will
> > > > > > write an ECommitted to the journal, then allow the previously
> > > > > > logged journal entries to be trimmed.
> > > > > >
> > > > > > Why are these two log events necessary?
> > > > > > I guess they were originally meant for the case where a crash
> > > > > > happens, but in my opinion they don't seem necessary. For
> > > > > > example, if a crash happens, after the failed MDSs are brought
> > > > > > up again, in the resolve stage, the master will resend
> > > > > > OP_FINISH to the slave, then things will continue as expected.
> > > > >
> > > > > I don't remember the details of these transactions off the top of my
> > > > > head, but it sounds like you just answered your question: how would
> > > > > the master know whether it needs to tell the slave things are over,
> > > > > if it doesn't commit that it told the slave things are over?
> > > > > If we don't commit something indicating a finish, we'd need to
> > > > > remember the transaction forever, which would be bad.
> > > > > -Greg
> > > > >
> > > > > >
> > > > > > Could anyone give some tips on this question?
> > > > > >
> > > > > > Sincerely thanks!
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
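
Yan's crash scenario from the thread (the master has already trimmed its EUpdate, the slave has not trimmed anything) can be sketched as a toy model. This is not Ceph code: the tuple-based journals and the functions `replay_master`, `replay_slave`, and `resolve` are hypothetical names invented for illustration, and the rule that the master answers "commit" only for requests it still tracks is a simplifying assumption. The "rollback" answer for requests the master no longer knows follows Yan's description of the current implementation.

```python
# Toy model of the master/slave two-phase commit journals discussed above.
# All names and data shapes here are illustrative assumptions, not Ceph APIs.

def replay_master(journal):
    """Return request ids the master logged (EUpdate) but never finished (ECommitted)."""
    pending = set()
    for event, reqid in journal:
        if event == "EUpdate":
            pending.add(reqid)
        elif event == "ECommitted":
            pending.discard(reqid)
    return pending


def replay_slave(journal):
    """Return request ids the slave prepared but has no recorded outcome for."""
    pending = set()
    for op, reqid in journal:
        if op == "PREPARE":
            pending.add(reqid)
        elif op in ("COMMIT", "ROLLBACK"):
            pending.discard(reqid)
    return pending


def resolve(master_journal, slave_journal):
    """After both sides restart, decide the fate of each undecided slave update.

    Assumption: the master says "commit" only if it still tracks the request;
    otherwise it tells the slave to roll back.
    """
    master_pending = replay_master(master_journal)
    decisions = {}
    for reqid in sorted(replay_slave(slave_journal)):
        decisions[reqid] = "commit" if reqid in master_pending else "rollback"
    return decisions


# The operation finished and the master trimmed its journal before crashing.
trimmed_master = []

# With a commit event in the slave journal, nothing is left to ask about.
with_commit = [("PREPARE", "req1"), ("COMMIT", "req1")]
print(resolve(trimmed_master, with_commit))

# Without it, the slave must ask, and the master (having lost the record)
# answers "rollback" for an operation that had in fact completed.
without_commit = [("PREPARE", "req1")]
print(resolve(trimmed_master, without_commit))
```

In this model the slave-side commit event is what lets the slave settle a finished operation on its own, which is the point Yan makes about the two-phase commit protocol.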