Re: MDS: what's the purpose of using LogEvent with empty metablob?

Hi, Yan:
I agree that log events can be used to reconstruct the cache after a
crash. But the master can reconstruct its cache by replaying its own
EUpdate log event. The ESlaveUpdate::OP_COMMIT log event seems to have
nothing to do with the master's cache; it lives on the slave. Besides,
that log event on the slave cannot always help reconstruct the cache
after a crash.

Consider a scenario in which the slave submits an
ESlaveUpdate::OP_COMMIT log event and sends an OP_COMMITTED message to
the master. Since there is no mechanism to prevent the slave from
trimming the ESlaveUpdate::OP_COMMIT log event, it is possible that
both master and slave crash when the master hasn't received the
OP_COMMITTED message and the slave has already trimmed its log. After
both MDSs are restarted, in the resolve stage, the slave doesn't know
it had an "uncommitted slave op" before crashing, because the log
event has been trimmed. Meanwhile, the master knows there is an
"uncommitted master op" from replaying its EUpdate log event. In the
current implementation, the master will not resend an OP_FINISH to the
slave for this op; it just waits for a message from the slave.
However, the slave will never send an OP_COMMITTED message to the
master. What surprised me is that the op on the master will eventually
be committed anyway! After more investigation, I think this is
achieved by a "coincidence": although the slave has no information
about uncommitted slave ops, it will always send an MMDSResolve to the
master in `MDCache::send_subtree_resolves()`, and the master will
always clean up the appropriate ops (e.g. the op we are talking about)
when it receives an MMDSResolve. To be a little more specific, since
the master has removed the slave from its umaster.slaves in
`MDCache::handle_mds_failure()`, the condition for starting the
cleanup process is always satisfied when it receives the MMDSResolve
message. (I haven't found out why the master can always trigger
`handle_mds_failure`, but I think the current information is enough
for this discussion.)
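
The sequence described above can be sketched as a toy Python model.
This is only an illustration of the described behaviour, not Ceph
code: the ToyMDS class, its fields, the tuple-based journal, the rank
names "mds.a"/"mds.b" and the request id "req1" are all hypothetical,
and it is assumed that the slave journals an ESlaveUpdate::OP_PREPARE
event before its OP_COMMIT event.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMDS:
    """Toy stand-in for one MDS rank (hypothetical, not Ceph's types)."""
    name: str
    journal: list = field(default_factory=list)   # (kind, reqid, slaves)
    uncommitted_masters: dict = field(default_factory=dict)
    uncommitted_slaves: set = field(default_factory=set)

    def replay(self):
        """Rebuild in-memory state from what is left in the journal."""
        self.uncommitted_masters = {}
        self.uncommitted_slaves = set()
        for kind, reqid, slaves in self.journal:
            if kind == "EUpdate":
                self.uncommitted_masters[reqid] = set(slaves)
            elif kind == "ECommitted":
                self.uncommitted_masters.pop(reqid, None)
            elif kind == "ESlaveUpdate.OP_PREPARE":
                self.uncommitted_slaves.add(reqid)
            elif kind == "ESlaveUpdate.OP_COMMIT":
                self.uncommitted_slaves.discard(reqid)

    def handle_mds_failure(self, who):
        """Master side: stop waiting for the failed rank."""
        for waiting in self.uncommitted_masters.values():
            waiting.discard(who)

    def handle_resolve(self):
        """Master side: finish every op that no slave is awaited for."""
        done = [r for r, w in self.uncommitted_masters.items() if not w]
        for reqid in done:
            self.journal.append(("ECommitted", reqid, ()))
            del self.uncommitted_masters[reqid]
        return done


def run_scenario():
    master, slave = ToyMDS("mds.a"), ToyMDS("mds.b")
    master.journal.append(("EUpdate", "req1", ("mds.b",)))
    slave.journal.append(("ESlaveUpdate.OP_PREPARE", "req1", ()))
    slave.journal.append(("ESlaveUpdate.OP_COMMIT", "req1", ()))
    slave.journal.clear()   # slave trims its log; OP_COMMITTED is lost
    master.replay()         # both ranks crash, restart and replay
    slave.replay()
    waiting = set(master.uncommitted_masters["req1"])
    master.handle_mds_failure(slave.name)
    return {
        "slave_remembers_op": "req1" in slave.uncommitted_slaves,
        "master_waiting_before_failure_handling": waiting,
        "committed_on_resolve": master.handle_resolve(),
    }
```

In this model the slave comes back with no memory of the op, yet the
master still finishes it, purely because the failure handling emptied
the set of slaves it was waiting for before the resolve arrived.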

So from this scenario we can see that even with an
ESlaveUpdate::OP_COMMIT log event, the slave sometimes still cannot
reconstruct its cache after a crash. Discarding that log event seems
OK, because the master has a log entry for the uncommitted request, so
the request won't be lost even after a crash.

If this is the right direction, I'd like to do more work on the
ECommitted log event on the master, since I think it's unnecessary
too. If not, I genuinely hope you can explain further. Thanks!

Sincerely
-Xinying

Yan, Zheng <ukernel@xxxxxxxxx> wrote on Wed, Apr 15, 2020 at 10:08 AM:
>
> On Wed, Apr 15, 2020 at 9:40 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> >
> > Hi, Greg:
> > Thanks for your reply!
> > I think the master can always know whether a request has finished,
> > whether or not there is a commit log event, because it has written
> > an EUpdate log event that records the unfinished request.
> >
> > Of course, we need to do the commit, in which we clean up mdcache
> > and trigger journal trimming, but it seems we don't need to write a
> > log event. We can do the commit just in memory.
> >
> > For example, if we remove writing an ESlaveUpdate::OP_COMMIT log
> > event on the slave, then when a crash happens, the master will know
> > there is an unfinished request, either by replaying its earlier
> > logged EUpdate or by reading from its cache, so it resends
>
> The log event can be trimmed, and the cache is lost if the master crashes.
>
>
> > OP_FINISH to the slave,
> > and everything will proceed.
> > Similarly, if we remove writing an ECommitted log event on the
> > master, then when a crash happens, the master still knows there is
> > an unfinished request and will restart the process from the step of
> > sending OP_FINISH to the slave.
> >
> > What do you think?
> >
> > Sincerely
> > -Xinying
> >
> > Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Wed, Apr 15, 2020 at 2:16 AM:
> > >
> > > On Sun, Apr 12, 2020 at 5:19 AM Xinying Song <songxinying.ftd@xxxxxxxxx> wrote:
> > > >
> > > > Hi, cephers:
> > > > What's the purpose of using a LogEvent with an empty metablob?
> > > > For example, in a link/unlink operation across two active MDSs,
> > > > when the slave receives OP_FINISH it writes an
> > > > ESlaveUpdate::OP_COMMIT to the journal, then sends OP_COMMITTED
> > > > to the master. When the master receives OP_COMMITTED, it writes
> > > > an ECommitted to the journal and then allows previously logged
> > > > journal entries to be trimmed.
> > > >
> > > > Why are these two log events necessary?
> > > > I guess they were originally meant for the case where crashes
> > > > happen, but in my opinion they do not seem necessary. For
> > > > example, if a crash happens, after the failed MDSs are brought
> > > > up again, in the resolve stage, the master will resend OP_FINISH
> > > > to the slave, and things will continue as expected.
> > >
> > > I don't remember the details of these transactions off the top of my
> > > head, but it sounds like you just answered your question: how would
> > > the master know it needs to tell the slave things are over or not, if
> > > it doesn't commit that it told the slave things are over?
> > > If we don't commit something indicating a finish, we'd need to
> > > remember the transaction forever, which would be bad.
> > > -Greg
> > >
> > > >
> > > > Could anyone shed some light on this?
> > > >
> > > > Sincerely thanks!
> > > > _______________________________________________
> > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > >
> > >