Re: Standby-replay mds: 10.2.2

On Mon, Nov 14, 2016 at 11:35 PM, Goncalo Borges
<goncalo.borges@xxxxxxxxxxxxx> wrote:
> Hi John...
>
> Thanks for replying.
>
> Some of the requested input is inline.
>
> Cheers
>
> Goncalo
>
>
>>>
>>>
>>> We are currently undergoing an infrastructure migration. One of the first
>>> machines to go through this migration process is our standby-replay mds.
>>> We are running 10.2.2. My plan is to:
>>
>> Is the 10.2.2 here a typo?  What's the current version that you're
>> upgrading to 10.2.2 from?
>
>
> There is no typo. We are not planning to upgrade (for now) but simply
> redeploy the standby-replay mds server with the same version we currently
> have everywhere (10.2.2). So, it is not an upgrade but a simple redeployment
> on a different infrastructure.
>
>
>>
>>> - Shutdown the standby-replay mds
>>> - Reinstall it with 10.2.2 on a different host, reusing the same IP, keys
>>> and configurations.
>>
>> Any particular reason for keeping the same IP?  In general you don't
>> need to worry about that at all: I'd usually just delete the old MDS
>> entirely and create a new one, only keeping the ceph.conf section that
>> configures your standby replay options.
>
>
> It is just easier for us to reuse the same hostname and IP.
>
>
>>
>>> - Start the mds service
>>>
>>> I wasn't thinking this was problematic until I read:
>>> http://tracker.ceph.com/issues/17466
>>>
>>> The issue mentioned above started when the site admin added a new mds.
>>> He also did an (unintended) upgrade of the mds(es) from 10.2.1 to 10.2.3,
>>> but I am not sure if this is the reason for the problem. His mons started
>>> to fail because they got an invalid fscid, and the reason is some
>>> incoherent ordering of rank and fscid between the constructor and the
>>> struct.
>>
>> The actual issue (we think) was that the message decode was getting
>> junk value for fscid when the beacon was sent by an older MDS due to a
>> missing default initialisation, and then that the MDSMonitor was
>> failing to validate that.
>>
>> This code path was only hit in cases where standby_for_rank was set,
>> so for that particular symptom you should be okay if you just don't
>> set standby_for_rank at all (if you have one MDS, your standby replay
>> daemon will always pick up that rank).
>
>
> This is our configuration for mds(es):
>
> [mds.rccephmds]
> host = rccephmds
> mds standby replay = true
>
> [mds.rccephmds2]
> host = rccephmds2
> mds standby_for_rank = rccephmds
> mds standby replay = true
>
> At the time we deployed these servers, I set up 'standby_for_rank' because
> my understanding was that we had to specify the mds rank we would like the
> standby-replay mds to follow (replay its journal and keep a warm cache).
>
> From your comment, I understand that:
> - My current config has the potential to trigger the issue mentioned above;
> - However, since I only have one active mds, this config is unnecessary: the
> standby-replay mds will start replaying the journal of that (single) active
> mds rank. So, if I simply comment out the 'standby_for_rank' config, I would
> be safe and out of the problematic code path.
>
> Can you give a final confirmation that my conclusions are correct?

Actually your current standby_for_rank is probably being ignored,
because that setting has to be an integer.  To pass the name of
another mds you'd use "standby_for_name".

If you're not upgrading anything and just moving an MDS daemon then I
don't think you have anything to worry about.  I'd remove that
standby_for_rank line anyway though.
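
As a rough sketch only, the standby section could end up looking something
like the fragment below. The section and host names are taken from the
config quoted above; the option spellings assume the usual ceph.conf
convention that spaces and underscores in option names are interchangeable,
and the commented-out lines are illustrative alternatives, not something
this setup needs.

```ini
[mds.rccephmds2]
host = rccephmds2
mds standby replay = true

# With a single active MDS, no standby_for_* line is needed at all.
# If you ever want to pin the standby, use one of the following
# (illustrative only):
#
# Follow a specific daemon by name:
# mds standby for name = rccephmds
#
# Follow a rank; the value must be an integer (0 is the only rank
# when there is a single active MDS):
# mds standby for rank = 0
```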

John

> Thanks for the help (as always)
>
> Cheers
> Goncalo
>
>
>
>
>
>> John
>>
>>> I just want to be sure that I won't hit a similar issue:
>>> - In what exact circumstances is this problem triggered?
>>> - Is it triggered when you add a brand new standby-replay mds (new IP,
>>> new key)? I am hoping that in my case, I shouldn't be affected.
>>>
>>> TIA
>>> Goncalo
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937
>


