On 11/25/2012 01:51 AM, Sage Weil wrote: > On Sun, 25 Nov 2012, Yan, Zheng wrote: >> On 11/25/2012 12:21 AM, Sage Weil wrote: >>> On Sat, 24 Nov 2012, Yan, Zheng wrote: >>>> Hi, Sage >>>> >>>> I found this fix is still not enough because we don't know if a dentry has been >>>> successfully deleted from the directory fragment after MDS restart. There are several >>>> options to fix this, which one do you like? >>>> >>>> 1. Make OSD ignore the TMAP_RM commands if it can't the find items. >>> >>> This is certainly the simplest. Since TMAP is effectively deprecated and >>> everyone should be using the omap stuff anyway, it gets my vote. >>> >>> Speaking of, we should transition the MDS to use the omap operations soon >>> as well, since TMAP is horribly inefficient for large directories. Also, >>> the rmkey method in that case does not have this problem. >> >> I can do this, it should be easy. >> >>> >>> Let's ignore enoent on TMAP_RM for bobtail, and transition to omap in the >>> next release... do you want to test the patch that drops the !key_exists >>> check? >> >> I will modify the code and test it. but I think it is hiding bugs ;) >> >> Another issue of tmap is that we may get different results when comparing two dentry_key_t >> and their string representation. So the dentries in tmap can be out of order, setkey method >> may add duplicated dentry and rmkey method may fail to find existing dentry. Since tmap is >> so buggy, let's switch the MDS to use the omap ASAP. > > Yeah. I think the question is what is the minimal set of fixes we should > do to the TMAP implementation for bobtail before changing to omap. The > dentry_key_t ordering fix you posted the other day and ignoring the rm > failures should catch these, right? Yes, that bug was caught by the !key_exists check. Actually the patch I previously sent is buggy too, it breaks CDir::lookup. I have a patch that sorts dentries in CDir::_commit_full and CDir::_commit_partial. It survives over night fsstress test, but change is ugly and relatively large. I will send it out tomorrow. Regards Yan, Zheng > > sage > > >> >> Regards >> Yan, Zheng >> >> >>> >>> Thanks! >>> sage >>> >>> >>>> 2. Add a new item deletion command for tmap, the new command doesn't make whole tmap >>>> update operation fail if it can't the item. When committing directory fragments, >>>> use the new command for null dentries that were added by MDS log replay. >>>> >>>> 3. When committing a directory fragment, Re-fetch it if it contains dirty+null dentries >>>> that were added by MDS log replay. Find already deleted items and mark corresponding >>>> dentries as new. >>>> >>>> Regards >>>> Yan, Zheng >>>> >>>> On 11/23/2012 12:52 AM, Yan, Zheng wrote: >>>>> From: "Yan, Zheng" <zheng.z.yan@xxxxxxxxx> >>>>> >>>>> When a null dentry is encountered, CDir::_commit_partial() adds >>>>> a OSD_TMAP_RM command to delete the dentry. But if the dentry is >>>>> new, the osd will not find the dentry when handling the command >>>>> and the tmap update operation will fail totally. >>>>> >>>>> This patch also makes sure dentries are properly marked as new >>>>> when preparing new dentries and exporting dentries. >>>>> >>>>> Signed-off-by: Yan, Zheng <zheng.z.yan@xxxxxxxxx> >>>>> --- >>>>> src/mds/CDentry.h | 2 ++ >>>>> src/mds/CDir.cc | 11 ++++++++--- >>>>> src/mds/CDir.h | 2 +- >>>>> src/mds/MDCache.cc | 9 ++++++--- >>>>> src/mds/Server.cc | 3 +++ >>>>> 5 files changed, 20 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/src/mds/CDentry.h b/src/mds/CDentry.h >>>>> index 480e562..5755c55 100644 >>>>> --- a/src/mds/CDentry.h >>>>> +++ b/src/mds/CDentry.h >>>>> @@ -347,6 +347,8 @@ public: >>>>> // twiddle >>>>> state = 0; >>>>> state_set(CDentry::STATE_AUTH); >>>>> + if (nstate & STATE_NEW) >>>>> + mark_new(); >>>>> if (nstate & STATE_DIRTY) >>>>> _mark_dirty(ls); >>>>> if (!replica_map.empty()) >>>>> diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc >>>>> index c5220ed..411d864 100644 >>>>> --- a/src/mds/CDir.cc >>>>> +++ b/src/mds/CDir.cc >>>>> @@ -1696,7 +1696,7 @@ class C_Dir_Committed : public Context { >>>>> public: >>>>> C_Dir_Committed(CDir *d, version_t v, version_t lrv) : dir(d), version(v), last_renamed_version(lrv) { } >>>>> void finish(int r) { >>>>> - dir->_committed(version, last_renamed_version); >>>>> + dir->_committed(version, last_renamed_version, r); >>>>> } >>>>> }; >>>>> >>>>> @@ -1802,6 +1802,10 @@ CDir::map_t::iterator CDir::_commit_partial(ObjectOperation& m, >>>>> continue; // skip clean dentries >>>>> >>>>> if (dn->get_linkage()->is_null()) { >>>>> + if (dn->is_new()) { >>>>> + dn->mark_clean(); >>>>> + continue; >>>>> + } >>>>> dout(10) << " rm " << dn->name << " " << *dn << dendl; >>>>> finalbl.append(CEPH_OSD_TMAP_RM); >>>>> dn->key().encode(finalbl); >>>>> @@ -1997,10 +2001,11 @@ void CDir::_commit(version_t want) >>>>> * >>>>> * @param v version i just committed >>>>> */ >>>>> -void CDir::_committed(version_t v, version_t lrv) >>>>> +void CDir::_committed(version_t v, version_t lrv, int ret) >>>>> { >>>>> - dout(10) << "_committed v " << v << " (last renamed " << lrv << ") on " << *this << dendl; >>>>> + dout(10) << "_committed ret " << ret << " v " << v << " (last renamed " << lrv << ") on " << *this << dendl; >>>>> assert(is_auth()); >>>>> + assert(ret == 0); >>>>> >>>>> bool stray = inode->is_stray(); >>>>> >>>>> diff --git a/src/mds/CDir.h b/src/mds/CDir.h >>>>> index 2222418..274e38b 100644 >>>>> --- a/src/mds/CDir.h >>>>> +++ b/src/mds/CDir.h >>>>> @@ -487,7 +487,7 @@ private: >>>>> unsigned max_write_size=-1, >>>>> map_t::iterator last_committed_dn=map_t::iterator()); >>>>> void _encode_dentry(CDentry *dn, bufferlist& bl, const set<snapid_t> *snaps); >>>>> - void _committed(version_t v, version_t last_renamed_version); >>>>> + void _committed(version_t v, version_t last_renamed_version, int ret); >>>>> void wait_for_commit(Context *c, version_t v=0); >>>>> >>>>> // -- dirtyness -- >>>>> diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc >>>>> index f8b1c8f..e69a49f 100644 >>>>> --- a/src/mds/MDCache.cc >>>>> +++ b/src/mds/MDCache.cc >>>>> @@ -657,12 +657,15 @@ CDentry *MDCache::get_or_create_stray_dentry(CInode *in) >>>>> CDir *straydir = strayi->get_dirfrag(fg); >>>>> assert(straydir); >>>>> CDentry *straydn = straydir->lookup(straydname); >>>>> - if (!straydn) { >>>>> + >>>>> + if (!straydn) >>>>> straydn = straydir->add_null_dentry(straydname); >>>>> - straydn->mark_new(); >>>>> - } else >>>>> + else >>>>> assert(straydn->get_projected_linkage()->is_null()); >>>>> >>>>> + if (!straydn->is_dirty()) >>>>> + straydn->mark_new(); >>>>> + >>>>> return straydn; >>>>> } >>>>> >>>>> diff --git a/src/mds/Server.cc b/src/mds/Server.cc >>>>> index ec0d5d5..228fede 100644 >>>>> --- a/src/mds/Server.cc >>>>> +++ b/src/mds/Server.cc >>>>> @@ -1685,6 +1685,9 @@ CDentry* Server::prepare_null_dentry(MDRequest *mdr, CDir *dir, const string& dn >>>>> } >>>>> } >>>>> >>>>> + if (!dn->is_dirty()) >>>>> + dn->mark_new(); >>>>> + >>>>> return dn; >>>>> } >>>>> >>>>> >>>> >>>> >> >> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html