Re: Ceph User Teething Problems

Thank you all for such wonderful feedback.

Thank you to John Spray for putting me on the right track. I now see
that the cephfs aspect of the project is being de-emphasised, so that
the manual deployment instructions tell how to set up the object store,
and cephfs is a separate component that needs to be explicitly set up
and configured in its own right. That explains why the cephfs pools are
not created by default, and why the required cephfs pools are now
referred to, not as 'data' and 'metadata', but as 'cephfs_data' and
'cephfs_metadata'. I have created these pools, created a new cephfs
filesystem, and I can mount it without problem.
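
For anyone else picking up the same thread, this is roughly the
sequence involved (the pg counts are just what suited my three osds, so
treat them as placeholders rather than recommendations, and the mount
options assume the kernel client with a cephx secret in a file):

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new cephfs cephfs_metadata cephfs_data
mount -t ceph {mon-address}:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret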

This confirms my suspicion that the manual deployment pages are in need
of review and revision. They still refer to three default pools. I am
happy that this section should deal with the object store setup only,
but I still think that the osd part is a bit confused and confusing,
particularly with respect to what is done on which machine. It would
then be useful to say something like "this completes the configuration
of the basic store. If you wish to use cephfs, you must set up a
metadata server, appropriate pools, and a cephfs filesystem. (See
http://...)".

I was not trying to be smart or obscure when I made a brief and
apparently dismissive reference to ceph-deploy. I railed against it,
and against the demise of mkcephfs, on this list at the point that
mkcephfs was dropped from the releases. That drew a few supportive
responses at the time, so I know that I'm not alone. I did not wish to
trawl over those arguments again unnecessarily.

There is a principle that is being missed. The 'ceph' code contains
everything required to set up and operate a ceph cluster. There should
be documentation detailing how this is done.

'Ceph-deploy' is a separate thing. It is one of several tools that
promise to make setting things up easy. However, my resistance is based
on two factors. Firstly, if I recall correctly, it is one of those
projects in which the configuration needs to know what 'distribution'
is being used (presumably to try to deduce where various things are
located), so if one is not using one of these 'distributions', one is
stuffed right from the start. Secondly, the challenge that we are
trying to overcome is learning what the various ceph components need,
and how they need to be set up and configured. I don't think that the
"don't worry your pretty little head about that, we have a natty tool
to do it for you" approach is particularly useful.

So I am not knocking ceph-deploy, Travis; it is just that I do not
believe it is relevant or useful to me at this point in time.

I see that Lionel Bouton seems to share my views here.

In general, the ceph documentation (in my humble opinion) needs to be
drafted with a keen eye on the required scope. Deal with ceph; don't
let it get contaminated with 'ceph-deploy', 'upstart', 'systemd', or
anything else that is not actually part of ceph.

As an example, once you have configured your osd, you start it with:

ceph-osd -i {osd-number}

It is as simple as that! 

If it is required to start the osd automatically, then that will be
done using sysvinit, upstart, systemd, or whatever else is being used
to bring the system up in the first place. It is unnecessary and
confusing to try to second-guess the environment in which ceph is
running, and to contaminate the documentation with such details.
(Having said that, I see no problem with adding separate, helpful
sections such as "Suggestions for starting using 'upstart'" or
"Suggestions for starting using 'systemd'".)

So I would reiterate the point that the really important documentation
is probably quite simple for an expert to produce. Just spell out what
each component needs in terms of keys, access to keys, files, and so
on. Spell out how to set everything up, and also how to change things
after the event, so that 'trial and error' does not have to involve
really expensive errors. Once we understand the fundamentals, getting
fancy and efficient is a completely separate further goal, and is not
really a responsibility of core ceph development.
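
Purely as an illustration of the level of detail I mean: for an osd,
the documentation should state plainly that the daemon needs a keyring
in its data directory, registered with the monitor, something along
the lines of (the path follows the defaults I have been using, so
adjust to your own layout):

ceph auth get-or-create osd.0 mon 'allow profile osd' osd 'allow *' \
     -o /var/lib/ceph/osd/ceph-0/keyring

with equivalent statements for the mon and mds keys.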

I have an inexplicable emotional desire to see ceph working well with
btrfs, which I like very much and have been using since the very early
days. Despite all the 'not ready for production' warnings, I adopted it
with enthusiasm, and have never had cause to regret it, and only once
or twice experienced a failure that was painful to me. However, as I
have experimented with ceph over the years, it has been very clear that
ceph is the most ruthless stress test for btrfs, and btrfs has always
broken quite quickly (I also used xfs for comparison). I have seen
evidence of much work going into btrfs in the kernel development now
that the lead developer has moved from Oracle to, I think, Facebook.

I now share the view that I think Robert LeBlanc holds: that btrfs may
at last stand up to the ceph test.

Thanks, Lincoln Bryant, for confirming that I can increase the
placement group count of pools in line with increasing osd numbers. I
felt that this had to be the case, otherwise the 'scalable' claim
becomes a bit limited.
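
For the archive, the commands involved appear to be the following (the
figure of 256 is only an example; both values need to be raised, as I
understand it is pgp_num that actually triggers the rebalancing):

ceph osd pool set cephfs_data pg_num 256
ceph osd pool set cephfs_data pgp_num 256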

Returning from these digressions to my own experience: I set up my
cephfs filesystem as illuminated by John Spray. I mounted it and
started to rsync a multi-terabyte filesystem to it. This is my test: if
cephfs handles this without grinding to a snail's pace or failing, I
will be ready to start committing my data to it. My osd disk lights
started to flash and flicker, and a comforting sound of drive activity
issued forth. I checked the osd logs, and to my dismay, there were
crash reports in them all. However, a closer look revealed that I am
getting the "too many open files" messages that precede the failures.

I can see that this is not an osd failure, but a resource limit issue.

I completely acknowledge that I must now RTFM, but I will ask whether
anybody can give any guidance, based on experience, with respect to
this issue.
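
In case it helps anyone searching the archive later, the remedies I am
about to investigate are all ways of raising the file descriptor limit
for the osd processes: via the shell when starting them by hand, via
'max open files' in ceph.conf (which I understand the sysvinit script
honours), or via the unit file when systemd does the starting. The
figure below is plucked from the air as an example, not a
recommendation:

# shell, before starting ceph-osd by hand
ulimit -n 131072

# ceph.conf
[global]
    max open files = 131072

# systemd, in the [Service] section of the unit
LimitNOFILE=131072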

Thank you all again for the previous prompt and invaluable advice and
information.

David


On Wed, 4 Mar 2015 20:27:51 +0000
Datatone Lists <lists@xxxxxxxxxxxxxx> wrote:

> I have been following ceph for a long time. I have yet to put it into
> service, and I keep coming back as btrfs improves and ceph reaches
> higher version numbers.
> 
> I am now trying ceph 0.93 and kernel 4.0-rc1.
> 
> Q1) Is it still considered that btrfs is not robust enough, and that
> xfs should be used instead? [I am trying with btrfs].
> 
> I followed the manual deployment instructions on the web site 
> (http://ceph.com/docs/master/install/manual-deployment/) and I managed
> to get a monitor and several osds running and apparently working. The
> instructions fizzle out without explaining how to set up mds. I went
> back to mkcephfs and got things set up that way. The mds starts.
> 
> [Please don't mention ceph-deploy]
> 
> The first thing that I noticed is that (whether I set up mon and osds
> by following the manual deployment, or using mkcephfs), the correct
> default pools were not created.
> 
> bash-4.3# ceph osd lspools
> 0 rbd,
> bash-4.3# 
> 
>  I get only 'rbd' created automatically. I deleted this pool, and
>  re-created data, metadata and rbd manually. When doing this, I had to
>  juggle with the pg- num in order to avoid the 'too many pgs for osd'.
>  I have three osds running at the moment, but intend to add to these
>  when I have some experience of things working reliably. I am puzzled,
>  because I seem to have to set the pg-num for the pool to a number
> that makes (N-pools x pg-num)/N-osds come to the right kind of
> number. So this implies that I can't really expand a set of pools by
> adding osds at a later date. 
> 
> Q2) Is there any obvious reason why my default pools are not getting
> created automatically as expected?
> 
> Q3) Can pg-num be modified for a pool later? (If the number of osds
> is increased dramatically).
> 
> Finally, when I try to mount cephfs, I get a mount 5 error.
> 
> "A mount 5 error typically occurs if a MDS server is laggy or if it
> crashed. Ensure at least one MDS is up and running, and the cluster is
> active + healthy".
> 
> My mds is running, but its log is not terribly active:
> 
> 2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 
> (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
> 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors 
> {default=true}
> 
> (This is all there is in the log).
> 
> I think that a key indicator of the problem must be this from the
> monitor log:
> 
> 2015-03-04 16:53:20.715132 7f3cd0014700  1
> mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.?
> [2001:8b0:xxxx:5fb3:xxxx:1fff:xxxx:9054]:6800/4036 up but filesystem
> disabled
> 
> (I have added the 'xxxx' sections to obscure my ip address)
> 
> Q4) Can you give me an idea of what is wrong that causes the mds to
> not play properly?
> 
> I think that there are some typos on the manual deployment pages, for
> example:
> 
> ceph-osd id={osd-num}
> 
> This is not right. As far as I am aware it should be:
> 
> ceph-osd -i {osd-num}
> 
> An observation. In principle, setting things up manually is not all
> that complicated, provided that clear and unambiguous instructions are
> provided. This simple piece of documentation is very important. My
> view is that the existing manual deployment instructions get a bit
> confused and confusing when they get to the osd setup, and the mds
> setup is completely absent.
> 
> For someone who knows, this would be a fairly simple and fairly quick 
> operation to review and revise this part of the documentation. I
> suspect that this part suffers from being really obvious stuff to the
> well initiated. For those of us closer to the start, this forms the
> ends of the threads that have to be picked up before the journey can
> be made.
> 
> Very best regards,
> David
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



