Let me re-CC the list, as this may be worthwhile for the archives.
On 10/23/2014 04:19 PM, Andrey Korolyov wrote:
Posting off-list again.
So I was inaccurate in my initial bug description:
- mkfs completes without problems
- on first start the OSD crashes with SIGABRT and the trace from my previous
message, after changing the fsid in the mon store
- on the next start it refuses to join due to an fsid mismatch, but no longer crashes.
On Thu, Oct 23, 2014 at 5:56 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
It is not that easy. When I added fsid under the selected osd section and
reformatted the store/journal, it aborted at start in
FileStore::_do_transaction (see attachment). On the next launch, the fsid in
the mon store for this OSD magically changes to something else and I
am back at the same doorstep (if I shut down the osd process and recreate
the journal with the new value inserted in 'fsid', or recreate the entire
filestore too, it aborts; otherwise it simply does not join due to the *next* mismatch).
As far as I can see, the problem lies in the behavior of legacy clusters
which inherited their fsid from a filesystem created by third-party tooling
rather than by ceph-deploy, so it is not fixed at all after such an
update. Any suggestions?
I'm not sure what you mean by 'changing fsid in the mon store', but I
suspect you have a few misconceptions about 'fsid' and the 'osd uuid'.
The error you have below, regarding the osd fsid, refers to the osd's
uuid, which is passed to '--mkfs' using '--osd-uuid X'. 'X' is also the
uuid you would pass when adding the osd to the monitors using 'ceph osd
create <uuid>'.
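To make the osd uuid side concrete, here is a toy Python model of the per-OSD check that the "clashes with existing osd: different fsid" message refers to. The class and method names are hypothetical and this is a heavy simplification, not Ceph's implementation:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyOsdRegistry:
    # Hypothetical, heavily simplified stand-in for what the monitors track
    # per OSD: numeric id -> osd uuid. This is not Ceph's actual code.
    uuids: Dict[int, str] = field(default_factory=dict)

    def create(self, osd_uuid: str) -> int:
        # Toy analogue of 'ceph osd create <uuid>': return the id already
        # bound to this uuid, else bind the lowest unused id to it.
        for osd_id, known in self.uuids.items():
            if known == osd_uuid:
                return osd_id
        osd_id = 0
        while osd_id in self.uuids:
            osd_id += 1
        self.uuids[osd_id] = osd_uuid
        return osd_id

    def boot(self, osd_id: int, osd_uuid: str) -> str:
        # A booting OSD presents the uuid it was given at --mkfs time via
        # --osd-uuid. A different uuid for an already-registered id is a clash.
        known = self.uuids.get(osd_id)
        if known is not None and known != osd_uuid:
            return f"osd.{osd_id} clashes: registered {known}, booting {osd_uuid}"
        return f"osd.{osd_id} accepted"
```

The cluster fsid plays no part in this model. Reformatting a store with a fresh uuid while keeping the old OSD id is enough to trigger the clash.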
Then there's the cluster 'fsid', which refers to the cluster. This
'fsid' is kept in the monmap and is used to identify the cluster the
monitors belong to and to allow clients (such as the osd) to correctly
contact the monitors of the cluster they too belong to.
Changing the 'fsid' option in ceph.conf changes the value of the cluster
fsid as perceived by clients and daemons. If this value differs from the
monmap's, you're bound to have trouble.
If you only change the 'fsid' option in the 'osd' section of
ceph.conf, you're basically telling the osds that they belong to a
different cluster, which will probably cause issues when they contact
the monitors to obtain the monmap during mkfs.
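The following sketch shows the ceph.conf side. It assumes the usual rule that a daemon-type section such as [osd] takes precedence over [global]. The helper names are hypothetical, and the real comparison against the monmap involves more than this:

```python
import configparser
from typing import Optional


def effective_fsid(conf_text: str, daemon_section: str) -> Optional[str]:
    # Hypothetical helper: look in the daemon-type section first, then in
    # [global]. An 'fsid' set only under [osd] therefore changes what the
    # OSDs believe the cluster fsid is, while the monmap keeps its own value.
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    for section in (daemon_section, "global"):
        if parser.has_option(section, "fsid"):
            return parser.get(section, "fsid")
    return None


def fsid_matches_monmap(conf_text: str, daemon_section: str, monmap_fsid: str) -> bool:
    # Simplified: the daemon's notion of the cluster fsid must equal the
    # fsid stored in the monmap.
    return effective_fsid(conf_text, daemon_section) == monmap_fsid
```

With `[global] fsid = aaa` and `[osd] fsid = bbb`, the monitors and other daemons see `aaa` while the OSDs see `bbb`. That is exactly the mismatch described above.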
What you clearly want is to remove the contents of the osd data
directory, generate a uuid 'X', run 'ceph osd create X', save the value
it will return (it will be used as the OSD's id) and then run ceph-osd
--mkfs with --osd-uuid X.
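Those steps can be sketched in Python as a hypothetical helper that builds the command sequence. It assumes the operator has already emptied the OSD data directory, and it does not touch the filesystem itself. In dry-run mode it only returns the commands, using a placeholder for the id that `ceph osd create` would print:

```python
import subprocess
import uuid
from typing import List, Optional


def osd_create_cmd(osd_uuid: str) -> List[str]:
    # 'ceph osd create <uuid>' registers the uuid with the monitors and
    # prints the OSD id it allocated.
    return ["ceph", "osd", "create", osd_uuid]


def osd_mkfs_cmd(osd_id: str, osd_uuid: str) -> List[str]:
    # The same uuid goes to --mkfs via --osd-uuid, so the uuid stored in
    # the new OSD store matches the one the monitors have on record.
    return ["ceph-osd", "-i", osd_id, "--mkfs", "--osd-uuid", osd_uuid]


def reprovision_osd(osd_uuid: Optional[str] = None, dry_run: bool = True) -> List[List[str]]:
    # Hypothetical helper: generate a uuid if none is given, then either
    # just return the two commands (dry run) or run them in order.
    osd_uuid = osd_uuid or str(uuid.uuid4())
    create = osd_create_cmd(osd_uuid)
    if dry_run:
        osd_id = "<id>"  # placeholder for the id 'ceph osd create' prints
    else:
        result = subprocess.run(create, check=True, capture_output=True, text=True)
        osd_id = result.stdout.strip()
    mkfs = osd_mkfs_cmd(osd_id, osd_uuid)
    if not dry_run:
        subprocess.run(mkfs, check=True)
    return [create, mkfs]


if __name__ == "__main__":
    for cmd in reprovision_osd():
        print(" ".join(cmd))
```

The key design point is that the uuid is generated exactly once and threaded through both commands. Reusing it is what prevents the "different fsid" clash.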
Also, I don't believe that the 'clashing' message is a bug. IMO we
should assume that it's the operator's responsibility to remove the data
if it's no longer of any use, instead of just assuming what the operator
may have meant when running mkfs repeatedly over a given osd store.
Hope this helps.
-Joao
The trace is attached in case anyone is interested.
On Thu, Oct 23, 2014 at 5:25 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
Sorry, I see the problem.
osd.0 10.6.0.1:6800/32051 clashes with existing osd: different fsid
(ours: d0aec02e-8513-40f1-bf34-22ec44f68466 ; theirs:
16cbb1f8-e896-42cd-863c-bcbad710b4ea). Anyway, this is clearly a bug: the
fsid should be silently discarded there if the OSD itself contains no
epochs.
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com