Re: MDS: what's the purpose of using LogEvent with empty metablob?

"Yan, Zheng" <ukernel@xxxxxxxxx> · Fri, 17 Apr 2020 15:10:55 +0800

On Fri, Apr 17, 2020 at 10:23 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
>
> Hi, Yan:
> I agree with the idea that log events can be used to reconstruct the cache
> when a crash happens. But the master can reconstruct its cache by replaying
> its EUpdate log event. The ESlaveUpdate::OP_COMMIT log event seems to
> have nothing to do with the master's cache; it's on the slave. Besides, that
> log event on the slave cannot always help reconstruct the cache after a crash.
>
> Suppose the slave submits an ESlaveUpdate::OP_COMMIT log
> event and sends an OP_COMMITTED message to the master. Because there is no
> mechanism to prevent the slave from trimming the ESlaveUpdate::OP_COMMIT log
> event, it is possible that both master and slave crash at a point where the
> master hasn't received the OP_COMMITTED message and the slave has
> trimmed its log. After both MDSs are restarted, in the resolve stage,
> the slave doesn't know it had an "uncommitted slave op" before crashing,
> because the log event has been trimmed. Meanwhile the master knows there is
> an "uncommitted master op" from replaying its EUpdate log event. In the
> current implementation, the master will not resend an OP_FINISH to the slave
> for this op; it just waits for a message from the slave. However, the slave will
> never send an OP_COMMITTED message to the master. What surprised me is that
> the op on the master will finally be committed! With more investigation,
> I find maybe that is achieved by a "coincidence".

Not a coincidence. It's by design.

> Because although
> the slave has no information about uncommitted slave ops, it will always send an
> MMDSResolve to the master in `MDCache::send_subtree_resolves()`, and
> the master will always clean up the proper ops (e.g. the op we are talking
> about) when receiving MMDSResolve. To be a little more specific, since
> the master has removed the slave from its umaster.slaves in
> `MDCache::handle_mds_failure()`, when it receives the MMDSResolve
> message, the condition for starting the cleanup process is always
> satisfied. (I haven't found out why the master can always trigger
> `handle_mds_failure`, but I think the current information is enough for this
> discussion.)
>
> So from this scenario, we can see that even with an
> ESlaveUpdate::OP_COMMIT log event, the slave sometimes still cannot
> reconstruct its cache after a crash. Discarding that log event seems OK,
> because the master has a log entry for the uncommitted request, so the request won't
> be lost even after a crash.

It's not about reconstructing the cache. It's for the two-phase commit
protocol. For example, suppose both master and slave crash after an
operation finishes. At the time the master crashes, it has already trimmed
the corresponding EUpdate event, but at the time the slave crashes, it hasn't
trimmed any log events. When the slave restarts, it does not know whether the slave
updates in its log were committed or rolled back (because there is no
slave commit log event). So the slave asks the master what it should do. But
the master has lost the information about these updates. (In the current
implementation, the master will ask the slave to roll back.)
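
To make the failure mode concrete, here is a minimal sketch of the resolve
decision in Python. It is a toy model, not the MDS code: the function names
(`replay_slave`, `replay_master`, `resolve`) and the tuple-based journal are
assumptions made for illustration, and the only behavior taken from this thread
is "the master tells the slave to roll back anything it no longer knows about".

```python
def replay_slave(events):
    """Replay a slave journal; return reqids whose outcome is still unknown.

    events: list of (op, reqid) with op in {"PREPARE", "COMMIT", "ROLLBACK"}.
    This is a simplified stand-in for ESlaveUpdate events.
    """
    unknown = set()
    for op, reqid in events:
        if op == "PREPARE":
            unknown.add(reqid)
        elif op in ("COMMIT", "ROLLBACK"):
            unknown.discard(reqid)
    return unknown


def replay_master(events):
    """Replay a master journal; return reqids the master still remembers.

    events: list of (event, reqid) with event in {"EUpdate", "ECommitted"}.
    """
    known = set()
    for event, reqid in events:
        if event == "EUpdate":
            known.add(reqid)
        elif event == "ECommitted":
            known.discard(reqid)
    return known


def resolve(slave_events, master_events):
    """For each op the slave is unsure about, ask the master what to do.

    Assumption: the master answers "commit" only if it still has a record of
    the op, and "rollback" otherwise.
    """
    known = replay_master(master_events)
    return {
        reqid: ("commit" if reqid in known else "rollback")
        for reqid in replay_slave(slave_events)
    }


# The operation finished, the master already trimmed its EUpdate, then both crash.
trimmed_master_journal = []

# No slave commit event in the journal: the slave must ask, and the master,
# having forgotten the op, tells it to roll back a finished operation.
print(resolve([("PREPARE", "req1")], trimmed_master_journal))

# With the slave commit event journaled, replay already settles the outcome
# and there is nothing left to ask the master about.
print(resolve([("PREPARE", "req1"), ("COMMIT", "req1")], trimmed_master_journal))
```

In this model the first call yields a rollback for `req1` even though the
operation had completed, which is the inconsistency the slave commit log event
is there to prevent.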

>
> If this is the right direction, I'd like to do more work on the
> ECommitted log event on the master, since I think it's unnecessary too. If
> not, I genuinely hope you can give more explanation about it. Thanks!
>
ECommitted for the master is less important, but it's still useful.
1. Without ECommitted, when an MDS restarts, it needs to track all EUpdate
events with slaves until the cluster is fully healthy.
2. It helps debugging.
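
Point 1 can be illustrated with a small Python sketch. It is a simplified
model under assumptions: `tracked_after_replay` and the tuple-based journal
are hypothetical names for illustration, not MDS code. It only shows how a
journaled commit marker lets replay drop an update from the set of updates
that still need tracking.

```python
def tracked_after_replay(journal):
    """Return the reqids a restarted master must keep tracking.

    journal: list of (event, reqid) with event in {"EUpdate", "ECommitted"}.
    Every EUpdate that involves slaves is tracked until a matching
    ECommitted is seen during replay.
    """
    tracked = set()
    for event, reqid in journal:
        if event == "EUpdate":
            tracked.add(reqid)
        elif event == "ECommitted":
            tracked.discard(reqid)
    return sorted(tracked)


# With ECommitted journaled, replay clears the finished update "a" at once.
with_marker = [("EUpdate", "a"), ("ECommitted", "a"), ("EUpdate", "b")]
print(tracked_after_replay(with_marker))

# Without it, both updates stay tracked until the slaves confirm their state.
without_marker = [("EUpdate", "a"), ("EUpdate", "b")]
print(tracked_after_replay(without_marker))
```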

> Sincerely
> -Xinying
>
> Yan, Zheng <ukernel@xxxxxxxxx> wrote on Wed, Apr 15, 2020 at 10:08 AM:
> >
> > On Wed, Apr 15, 2020 at 9:40 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > >
> > > Hi, Greg:
> > > Thanks for your reply!
> > > I think the master can always know whether a request has finished or not,
> > > no matter whether there is a commit log event, because it has written an
> > > EUpdate log event that records the unfinished request.
> > >
> > > Of course, we need to do commit, in which we clean up mdcache and
> > > trigger journal trim,
> > > but it seems we don't need to write a logevent. We can do commit just in memory.
> > >
> > > For example, if we remove writing an ESlaveUpdate::OP_COMMIT log event on the slave,
> > > when a crash happens, the master will know there is an unfinished request
> > > either by replaying
> > > its earlier logged EUpdate or by reading from its cache, so it resends
> >
> > Log events can be trimmed, and the cache is lost if the master crashes.
> >
> >
> > > OP_FINISH to slave,
> > > then everything will go on.
> > > Similarly, if we remove writing an ECommitted log event on the master, when
> > > a crash happens,
> > > the master still knows there is an unfinished request and it will restart
> > > the process from the
> > > step of sending OP_FINISH to the slave.
> > >
> > > What do you think?
> > >
> > > Sincerely
> > > -Xinying
> > >
> > > Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Wed, Apr 15, 2020 at 2:16 AM:
> > > >
> > > > On Sun, Apr 12, 2020 at 5:19 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi, cephers:
> > > > > What's the purpose of using LogEvent with empty metablob?
> > > > > For example, in a link/unlink operation across two active MDSs,
> > > > > when the slave receives OP_FINISH it will write an ESlaveUpdate::OP_COMMIT
> > > > > to the journal, then
> > > > > send OP_COMMITTED to the master. When the master receives OP_COMMITTED it will
> > > > > write an ECommitted to the journal, then allow previously logged
> > > > > journal entries to be trimmed.
> > > > >
> > > > > Why are these two logevents necessary?
> > > > > I guess they were originally meant for the case where a crash happens,
> > > > > but in my opinion they don't seem necessary. For example,
> > > > > if a crash happens, after the failed MDSs are brought up again, in the resolve
> > > > > stage, the master will resend OP_FINISH to the slave, then things will
> > > > > continue as expected.
> > > >
> > > > I don't remember the details of these transactions off the top of my
> > > > head, but it sounds like you just answered your question: how would
> > > > the master know it needs to tell the slave things are over or not, if
> > > > it doesn't commit that it told the slave things are over?
> > > > If we don't commit something indicating a finish, we'd need to
> > > > remember the transaction forever, which would be bad.
> > > > -Greg
> > > >
> > > > >
> > > > > Could anyone shed some light on this?
> > > > >
> > > > > Sincerely thanks!
> > > > > _______________________________________________
> > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > >
> > > >