Re: [PATCH 13/39] mds: don't send resolve message between active MDS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 20, 2013 at 7:55 PM, Yan, Zheng <zheng.z.yan@xxxxxxxxx> wrote:
> On 03/21/2013 05:56 AM, Gregory Farnum wrote:
>> On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng <zheng.z.yan@xxxxxxxxx> wrote:
>>> From: "Yan, Zheng" <zheng.z.yan@xxxxxxxxx>
>>>
>>> When MDS cluster is resolving, current behavior is sending subtree resolve
>>> message to all other MDS and waiting for all other MDS' resolve message.
>>> The problem is that active MDS can have diffent subtree map due to rename.
>>> Besides gathering active MDS's resolve messages are also racy. The only
>>> function for these messages is disambiguate other MDS' import. We can
>>> replace it by import finish notification.
>>>
>>> Signed-off-by: Yan, Zheng <zheng.z.yan@xxxxxxxxx>
>>> ---
>>>  src/mds/MDCache.cc  | 12 +++++++++---
>>>  src/mds/Migrator.cc | 25 +++++++++++++++++++++++--
>>>  src/mds/Migrator.h  |  3 ++-
>>>  3 files changed, 34 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
>>> index c455a20..73c1d59 100644
>>> --- a/src/mds/MDCache.cc
>>> +++ b/src/mds/MDCache.cc
>>> @@ -2517,7 +2517,8 @@ void MDCache::send_subtree_resolves()
>>>         ++p) {
>>>      if (*p == mds->whoami)
>>>        continue;
>>> -    resolves[*p] = new MMDSResolve;
>>> +    if (mds->is_resolve() || mds->mdsmap->is_resolve(*p))
>>> +      resolves[*p] = new MMDSResolve;
>>>    }
>>>
>>>    // known
>>> @@ -2837,7 +2838,7 @@ void MDCache::handle_resolve(MMDSResolve *m)
>>>           migrator->import_reverse(dir);
>>>         } else {
>>>           dout(7) << "ambiguous import succeeded on " << *dir << dendl;
>>> -         migrator->import_finish(dir);
>>> +         migrator->import_finish(dir, true);
>>>         }
>>>         my_ambiguous_imports.erase(p);  // no longer ambiguous.
>>>        }
>>> @@ -3432,7 +3433,12 @@ void MDCache::rejoin_send_rejoins()
>>>         ++p) {
>>>      CDir *dir = p->first;
>>>      assert(dir->is_subtree_root());
>>> -    assert(!dir->is_ambiguous_dir_auth());
>>> +    if (dir->is_ambiguous_dir_auth()) {
>>> +      // exporter is recovering, importer is survivor.
>>
>> The importer has to be the MDS this code is running on, right?
>
> This code is for bystanders. The exporter is recovering, and its resolve message didn't claim
> the subtree. So the export must succeed.

Ah, yep. That's what I get for eyeing just the diff.

>
>>
>>> +      assert(rejoins.count(dir->authority().first));
>>> +      assert(!rejoins.count(dir->authority().second));
>>> +      continue;
>>> +    }
>>>
>>>      // my subtree?
>>>      if (dir->is_auth())
>>> diff --git a/src/mds/Migrator.cc b/src/mds/Migrator.cc
>>> index 5e53803..833df12 100644
>>> --- a/src/mds/Migrator.cc
>>> +++ b/src/mds/Migrator.cc
>>> @@ -2088,6 +2088,23 @@ void Migrator::import_reverse(CDir *dir)
>>>    }
>>>  }
>>>
>>> +void Migrator::import_notify_finish(CDir *dir, set<CDir*>& bounds)
>>> +{
>>> +  dout(7) << "import_notify_finish " << *dir << dendl;
>>> +
>>> +  for (set<int>::iterator p = import_bystanders[dir].begin();
>>> +       p != import_bystanders[dir].end();
>>> +       ++p) {
>>> +    MExportDirNotify *notify =
>>> +      new MExportDirNotify(dir->dirfrag(), false,
>>> +                          pair<int,int>(import_peer[dir->dirfrag()], mds->get_nodeid()),
>>> +                          pair<int,int>(mds->get_nodeid(), CDIR_AUTH_UNKNOWN));
>>
>> I don't think this is quite right — we're notifying them that we've
>> just finished importing data from somebody, right? And so we know that
>> we're the auth node...
>
> Yes. In normal case, exporter notifies the bystanders. But if exporter crashes, the importer notifies
> the bystanders after it confirms ambiguous import succeeds.

Never mind — I had the semantic meaning of these pairs wrong.

Reviewed-by: Greg Farnum <greg@xxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux