Re: Fwd: Re: Fwd: Latest firefly: osd not joining cluster after re-creation

Let me re-CC the list, as this may be worth keeping for the archives.

On 10/23/2014 04:19 PM, Andrey Korolyov wrote:
Posting off-list again.

So I was inaccurate in my initial bug description:
- mkfs goes just fine
- on first start the OSD crashes with ABRT and the trace from the
previous message, having first changed the fsid in the mon store
- on the next start it refuses to join due to the fsid mismatch, and no
longer crashes.

On Thu, Oct 23, 2014 at 5:56 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
It is not so easy. When I added fsid under the selected osd section and
reformatted the store/journal, it aborted at start in
FileStore::_do_transaction (see attachment). On the next launch, the
fsid in the mon store for this OSD magically changes to something else
and I end up at the same doorstep again (if I shut down the osd process
and recreate the journal with the new fsid inserted in fsid, or recreate
the entire filestore too, it aborts; otherwise it simply does not join
due to the *next* mismatch). As far as I can see, the problem is in the
behavior of legacy clusters that inherited their fsid from a filesystem
created by a third party rather than by ceph-deploy, so it is not fixed
at all by such an update. Any suggestions?

I'm not sure what you mean by 'changing fsid in the mon store', but I suspect you have a few misconceptions about 'fsid' and the 'osd uuid'.

The error you have below, regarding the osd fsid, refers to the osd's uuid, which is passed to '--mkfs' using '--osd-uuid X'. 'X' is also the uuid you would pass when adding the osd to the monitors using 'ceph osd create <uuid>'.

Then there's the cluster 'fsid', which refers to the cluster. This 'fsid' is kept in the monmap and is used to identify the cluster the monitors belong to and to allow clients (such as the osd) to correctly contact the monitors of the cluster they too belong to.

Changing the 'fsid' option in ceph.conf results in changing the perceived value the clients and daemons have of the cluster fsid. If this value is different from the monmap's you're bound to have trouble. If you only change the 'fsid' option in the 'osd' section of ceph.conf, you're basically telling the osds that they belong to a different cluster, which will probably cause issues when they contact the monitors to obtain the monmap during mkfs.

What you clearly want is to remove the contents of the osd data directory, generate a uuid 'X', run 'ceph osd create X', save the value it will return (it will be used as the OSD's id) and then run ceph-osd --mkfs with --osd-uuid X.
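
As a rough sketch, the steps above could be scripted along these lines. The function name, the dry-run switch and the data directory argument are illustrative assumptions, not part of any Ceph tooling; the sketch also assumes ceph.conf already maps the returned id to that data directory (otherwise --osd-data would have to be passed explicitly). By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the OSD re-creation steps. By default it only prints the
# commands (DRY_RUN=1); set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recreate_osd DATA_DIR UUID   (hypothetical helper)
#   DATA_DIR: the OSD data directory (adjust to the local layout)
#   UUID:     a freshly generated uuid 'X', e.g. from uuidgen
recreate_osd() {
    osd_data="$1"
    osd_uuid="$2"

    # 1. Remove the stale contents of the OSD data directory.
    run find "$osd_data" -mindepth 1 -delete || return 1

    # 2. Register the uuid with the monitors; the id printed by the
    #    command becomes the OSD's id.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ ceph osd create $osd_uuid"
        osd_id="<id>"   # placeholder: the real id is only known after the call
    else
        osd_id="$(ceph osd create "$osd_uuid")" || return 1
    fi

    # 3. Initialise the store with the same uuid. Assumes ceph.conf maps
    #    this id to DATA_DIR; otherwise pass --osd-data explicitly.
    run ceph-osd -i "$osd_id" --mkfs --osd-uuid "$osd_uuid"
}
```

For example, 'recreate_osd /var/lib/ceph/osd/ceph-0 "$(uuidgen)"' would print the three commands for review; the path shown is only the usual default and may differ on your cluster.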

Also, I don't believe that the 'clashing' message is a bug. IMO we should assume that it's the operator's responsibility to remove the data if it's no longer of any use, instead of just assuming what the operator may have meant when running mkfs repeatedly over a given osd store.

Hope this helps.

  -Joao


Trace is attached if someone is interested in it.

On Thu, Oct 23, 2014 at 5:25 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
Sorry, I see the problem.

osd.0 10.6.0.1:6800/32051 clashes with existing osd: different fsid
(ours: d0aec02e-8513-40f1-bf34-22ec44f68466 ; theirs:
16cbb1f8-e896-42cd-863c-bcbad710b4ea). Anyway, it is clearly a bug, and
the fsid should be silently discarded there if the OSD contains no
epochs itself.


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



