Re: MDS: what's the purpose of using LogEvent with empty metablob?

Xinying Song <songxinying.ftd@xxxxxxxxx> · Fri, 17 Apr 2020 20:15:56 +0800

Understood! I really appreciate your explanation.
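
A minimal sketch of the slave-side scenario explained in the quoted
reply below, as a toy model. The journal tuples and the function names
(unresolved_slave_ops, resolve) are hypothetical and are not Ceph's
actual classes or functions; the only behaviour taken from the thread
is that a master with no record of a request answers "rollback".

```python
# Toy model only: names are hypothetical, not Ceph's actual code.

def unresolved_slave_ops(slave_journal):
    """Replay a slave journal; return request ids prepared but never resolved."""
    pending = []
    for op, reqid in slave_journal:
        if op == "prepare":
            pending.append(reqid)
        elif op in ("commit", "rollback") and reqid in pending:
            pending.remove(reqid)
    return pending


def resolve(slave_journal, master_journal):
    """Decide the fate of each unresolved slave op by asking the master."""
    master_known = {reqid for op, reqid in master_journal if op == "update"}
    decisions = {}
    for reqid in unresolved_slave_ops(slave_journal):
        # Master has no record (e.g. its EUpdate was trimmed): it says "rollback".
        decisions[reqid] = "commit" if reqid in master_known else "rollback"
    return decisions


# The operation finished and the master already trimmed its EUpdate.
master_journal = []

# Without a commit record, the slave rolls back an op that actually finished.
without_commit = resolve([("prepare", "req1")], master_journal)

# With the commit record, nothing is left to ask the master about.
with_commit = resolve([("prepare", "req1"), ("commit", "req1")], master_journal)

print(without_commit)  # {'req1': 'rollback'}
print(with_commit)     # {}
```

In this model the commit record carries no metadata of its own; it
only marks the prepared update as resolved, so replay never has to ask
a master that may already have forgotten the request.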

Yan, Zheng <ukernel@xxxxxxxxx> wrote on Fri, Apr 17, 2020 at 3:11 PM:
>
> On Fri, Apr 17, 2020 at 10:23 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> >
> > Hi, Yan:
> > I agree with the idea that log events can be used to reconstruct the
> > cache when a crash happens. But the master can reconstruct its cache
> > by replaying its EUpdate log event. The ESlaveUpdate::OP_COMMIT log
> > event seems to have nothing to do with the master's cache; it is on
> > the slave. Besides, that log event on the slave cannot always help to
> > reconstruct the cache after a crash.
> >
> > Suppose the slave submits an ESlaveUpdate::OP_COMMIT log event and
> > sends an OP_COMMITTED message to the master. Because there is no
> > mechanism to prevent the slave from trimming the
> > ESlaveUpdate::OP_COMMIT log event, it is possible that both master and
> > slave crash when the master hasn't received the OP_COMMITTED message
> > and the slave has already trimmed its log. After both MDSs are
> > restarted, in the resolve stage, the slave doesn't know it had an
> > "uncommitted slave op" before crashing, because the log event has been
> > trimmed. Meanwhile, the master knows there is an "uncommitted master
> > op" from replaying its EUpdate log event. In the current
> > implementation, the master will not resend an OP_FINISH to the slave
> > for this op; it just waits for a message from the slave. However, the
> > slave will never send an OP_COMMITTED message to the master. What
> > surprised me is that the op on the master will finally be committed!
> > After more investigation, I find that this may be achieved by a
> > "coincidence".
>
> Not a coincidence. It's by design.
>
> > Although the slave has no information about uncommit_slave ops, it
> > will always send an MMDSResolve to the master in
> > `MDCache::send_subtree_resolves()`, and the master will always clean up
> > the proper ops (e.g. the op we are talking about) when receiving
> > MMDSResolve. To be a little more specific: since the master has removed
> > the slave from its umaster.slaves in `MDCache::handle_mds_failure()`,
> > the condition for starting the cleanup process is always satisfied
> > when it receives the MMDSResolve message. (I haven't found out why the
> > master can always trigger `handle_mds_failure`, but I think the
> > current information is enough for this discussion.)
> >
> > So from this scenario, we can see that even with an
> > ESlaveUpdate::OP_COMMIT log event, the slave sometimes still cannot
> > reconstruct its cache after a crash. Discarding that log event seems
> > OK, because the master has a log entry for the uncommitted request, so
> > the request won't be lost even after a crash.
>
> It's not about reconstructing the cache. It's for the two-phase commit
> protocol. For example, suppose both master and slave crash after an
> operation finishes. At the time the master crashes, it has already
> trimmed the corresponding EUpdate event, but at the time the slave
> crashes, it hasn't trimmed any log events. When the slave restarts, it
> does not know whether the slave updates in its log were committed or
> rolled back (because there would be no slave commit log event). So the
> slave asks the master what it should do. But the master has lost the
> information about these updates. (In the current implementation, the
> master will ask the slave to roll back.)
>
> >
> > If this is the right direction, I'd like to do more work on the
> > ECommitted log event on the master, since I think it's unnecessary
> > too. If not, I genuinely hope you can explain more about it. Thanks!
> >
> ECommitted for the master is less important, but it's still useful.
> 1. Without ECommitted, when an MDS restarts, it needs to track all
> EUpdate events with slaves until the cluster is fully healthy.
> 2. It helps debugging.
>
> > Sincerely
> > -Xinying
> >
> > Yan, Zheng <ukernel@xxxxxxxxx> wrote on Wed, Apr 15, 2020 at 10:08 AM:
> > >
> > > On Wed, Apr 15, 2020 at 9:40 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > > >
> > > > Hi, Greg:
> > > > Thanks for your reply!
> > > > I think the master can always know whether a request has finished,
> > > > regardless of whether there is a commit log event, because it has
> > > > written an EUpdate log event that records the unfinished request.
> > > >
> > > > Of course, we need to do the commit, in which we clean up the
> > > > MDCache and trigger journal trimming, but it seems we don't need to
> > > > write a log event. We can do the commit just in memory.
> > > >
> > > > For example, if we stop writing an ESlaveUpdate::OP_COMMIT log
> > > > event on the slave, then when a crash happens, the master will know
> > > > there is an unfinished request either by replaying its earlier
> > > > logged EUpdate or by reading from its cache, so it resends
> > >
> > > The log event can be trimmed, and the cache is lost if the master crashes.
> > >
> > >
> > > > OP_FINISH to the slave, and then everything will go on.
> > > > Similarly, if we stop writing an ECommitted log event on the
> > > > master, then when a crash happens, the master still knows there is
> > > > an unfinished request and will restart the process from the step
> > > > of sending OP_FINISH to the slave.
> > > >
> > > > What do you think?
> > > >
> > > > Sincerely
> > > > -Xinying
> > > >
> > > > Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Wed, Apr 15, 2020 at 2:16 AM:
> > > > >
> > > > > On Sun, Apr 12, 2020 at 5:19 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi, cephers:
> > > > > > What's the purpose of using LogEvent with empty metablob?
> > > > > > For example, in a link/unlink operation across two active MDSs,
> > > > > > when the slave receives OP_FINISH it will write an
> > > > > > ESlaveUpdate::OP_COMMIT to the journal, then send OP_COMMITTED to
> > > > > > the master. When the master receives OP_COMMITTED it will write an
> > > > > > ECommitted to the journal, then allow the previously logged
> > > > > > journal entries to be trimmed.
> > > > > >
> > > > > > Why are these two log events necessary?
> > > > > > I guess they were originally intended for the case where a crash
> > > > > > happens, but in my opinion they don't seem necessary. For example,
> > > > > > if a crash happens, after the failed MDSs are brought up again, in
> > > > > > the resolve stage the master will resend OP_FINISH to the slave,
> > > > > > and then things will continue as expected.
> > > > >
> > > > > I don't remember the details of these transactions off the top of my
> > > > > head, but it sounds like you just answered your question: how would
> > > > > the master know it needs to tell the slave things are over or not, if
> > > > > it doesn't commit that it told the slave things are over?
> > > > > If we don't commit something indicating a finish, we'd need to
> > > > > remember the transaction forever, which would be bad.
> > > > > -Greg
> > > > >
> > > > > >
> > > > > > Could anyone give some pointers on this question?
> > > > > >
> > > > > > Sincerely thanks!
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > > >
> > > > >