On Thu, 5 Mar 2015 07:46:50 -0700 Robert LeBlanc wrote:

> David,
>
> You will need to raise the limit of open files in the Linux system. Check /etc/security/limits.conf. It is explained somewhere in the docs, and the autostart scripts 'fix' the issue for most people. When I did a manual deploy for the same reasons as you, I ran into this too.
>

It fixes this for most "normal" use cases indeed. The folks at CERN and other huge installations would probably not concur with that sentiment. ^o^

Aside from /etc/security/limits.conf, I found an initscript like this:

---
# cat /etc/initscript
#
ulimit -Hn 131072
ulimit -Sn 65536

# Execute the program.
eval exec "$4"
---

quite helpful, as it gets parsed before the PAM stuff.

Christian

> Robert LeBlanc
>
> Sent from a mobile device, please excuse any typos.
> On Mar 5, 2015 3:14 AM, "Datatone Lists" <lists@xxxxxxxxxxxxxx> wrote:
>
> > Thank you all for such wonderful feedback.
> >
> > Thank you to John Spray for putting me on the right track. I now see that the cephfs aspect of the project is being de-emphasised, so that the manual deployment instructions explain how to set up the object store, and cephfs is a separate issue that needs to be explicitly set up and configured in its own right. So that explains why the cephfs pools are not created by default, and why the required cephfs pools are now referred to not as 'data' and 'metadata' but as 'cephfs_data' and 'cephfs_metadata'. I have created these pools, and created a new cephfs filesystem, and I can mount it without problem.
> >
> > This confirms my suspicion that the manual deployment pages are in need of review and revision. They still refer to three default pools. I am happy that this section should deal with the object store setup only, but I still think that the osd part is a bit confused and confusing, particularly with respect to what is done on which machine. It would then be useful to say something like "this completes the configuration of the basic store. If you wish to use cephfs, you must set up a metadata server, appropriate pools, and a cephfs filesystem. (See http://...)".
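(For anyone finding this thread in the archives: the cephfs-specific steps described above boil down to roughly the following on a 0.93-era cluster. This is only a sketch; the pool names are the ones mentioned above, while the filesystem name, the pg counts, the monitor address and the mount options are placeholders that need adjusting for the cluster at hand.)

---
# create the two pools that cephfs needs (pg counts are examples only)
ceph osd pool create cephfs_metadata 64
ceph osd pool create cephfs_data 64

# create the filesystem on top of them (metadata pool first, then data pool)
ceph fs new cephfs cephfs_metadata cephfs_data

# with an mds running, the kernel client can then mount it, e.g.
mount -t ceph mon-address:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
---

As far as I can tell, the 'ceph fs new' step is also what clears the "up but filesystem disabled" state that shows up in the monitor log further down.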
> > I was not trying to be smart or obscure when I made a brief and apparently dismissive reference to ceph-deploy. I railed against it and the demise of mkcephfs on this list at the point that mkcephfs was discontinued in the releases. That caused a few supportive responses at the time, so I know that I'm not alone. I did not wish to trawl over those arguments again unnecessarily.
> >
> > There is a principle that is being missed. The 'ceph' code contains everything required to set up and operate a ceph cluster. There should be documentation detailing how this is done.
> >
> > 'Ceph-deploy' is a separate thing. It is one of several tools that promise to make setting things up easy. However, my resistance is based on two factors. If I recall correctly, it is one of those projects in which the configuration needs to know what 'distribution' is being used. (Presumably, this is to try to deduce where various things are located.) So if one is not using one of these 'distributions', one is stuffed right from the start. Secondly, the challenge that we are trying to overcome is learning what the various ceph components need, and how they need to be set up and configured. I don't think that the "don't worry your pretty little head about that, we have a natty tool to do it for you" approach is particularly useful.
> >
> > So I am not knocking ceph-deploy, Travis; it is just that I do not believe that it is relevant or useful to me at this point in time.
> >
> > I see that Lionel Bouton seems to share my views here.
> >
> > In general, the ceph documentation (in my humble opinion) needs to be draughted with a keen eye on the required scope. Deal with ceph; don't let it get contaminated with 'ceph-deploy', 'upstart', 'systemd', or anything else that is not actually part of ceph.
> >
> > As an example, once you have configured your osd, you start it with:
> >
> > ceph-osd -i {osd-number}
> >
> > It is as simple as that!
> >
> > If it is required to start the osd automatically, then that will be done using sysvinit, upstart, systemd, or whatever else is being used to bring the system up in the first place. It is unnecessary and confusing to try to second-guess the environment in which ceph may be used, and to contaminate the documentation with such details. (Having said that, I see no problem with adding separate, helpful sections such as "Suggestions for starting using 'upstart'" or "Suggestions for starting using 'systemd'".)
> >
> > So I would reiterate the point that the really important documentation is probably quite simple for an expert to produce. Just spell out what each component needs in terms of keys, access to keys, files, and so on. Spell out how to set everything up. Also how to change things after the event, so that 'trial and error' does not have to contain really expensive errors. Once we understand the fundamentals, getting fancy and efficient is a completely separate further goal, and is not really a responsibility of core ceph development.
> >
> > I have an inexplicable emotional desire to see ceph working well with btrfs, which I like very much and have been using since the very early days. Despite all the 'not ready for production' warnings, I adopted it with enthusiasm, and have never had cause to regret it, and only once or twice experienced a failure that was painful to me. However, as I have experimented with ceph over the years, it has been very clear that ceph seems to be the most ruthless stress test for it, and it has always broken quite quickly (I also used xfs for comparison). I have seen evidence of much work going into btrfs in the kernel development now that the lead developer has moved from Oracle to, I think, Facebook.
> >
> > I now share the view that I think Robert LeBlanc has, that maybe btrfs will now stand the ceph test.
> >
> > Thanks, Lincoln Bryant, for confirming that I can increase the size of pools in line with increasing osd numbers. I felt that this had to be the case, otherwise the 'scalable' claim becomes a bit limited.
> >
> > Returning from these digressions to my own experience: I set up my cephfs filesystem as illuminated by John Spray. I mounted it and started to rsync a multi-terabyte filesystem to it. This is my test: if cephfs handles this without grinding to a snail's pace or failing, I will be ready to start committing my data to it. My osd disk lights started to flash and flicker, and a comforting sound of drive activity issued forth. I checked the osd logs and, to my dismay, there were crash reports in them all. However, a closer look revealed that I am getting the "too many open files" messages that precede the failures.
> >
> > I can see that this is not an osd failure, but a resource limit issue.
> >
> > I completely acknowledge that I must now RTFM, but I will ask whether anybody can give any guidance, based on experience, with respect to this issue.
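(That is exactly the limit discussed at the top of this mail: /etc/security/limits.conf covers PAM sessions, and an /etc/initscript covers daemons spawned by init outside of PAM. A rough sketch, assuming the ceph daemons run as root and reusing the numbers from the initscript above; adjust the domain if your daemons run under a dedicated user:)

---
# /etc/security/limits.conf (excerpt)
root    soft    nofile    65536
root    hard    nofile    131072

# check what a running osd actually got
grep "open files" /proc/$(pidof -s ceph-osd)/limits
---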
> > Thank you again to all for the previous prompt and invaluable advice and information.
> >
> > David
> >
> > On Wed, 4 Mar 2015 20:27:51 +0000
> > Datatone Lists <lists@xxxxxxxxxxxxxx> wrote:
> >
> > > I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers.
> > >
> > > I am now trying ceph 0.93 and kernel 4.0-rc1.
> > >
> > > Q1) Is it still considered that btrfs is not robust enough, and that xfs should be used instead? [I am trying with btrfs].
> > >
> > > I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several osds running and apparently working. The instructions fizzle out without explaining how to set up the mds. I went back to mkcephfs and got things set up that way. The mds starts.
> > >
> > > [Please don't mention ceph-deploy]
> > >
> > > The first thing that I noticed is that (whether I set up the mon and osds by following the manual deployment, or using mkcephfs), the correct default pools were not created.
> > >
> > > bash-4.3# ceph osd lspools
> > > 0 rbd,
> > > bash-4.3#
> > >
> > > I get only 'rbd' created automatically. I deleted this pool, and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg-num in order to avoid the 'too many pgs for osd' warning. I have three osds running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg-num for the pool to a number that makes (N-pools x pg-num)/N-osds come to the right kind of number. So this implies that I can't really expand a set of pools by adding osds at a later date.
> > >
> > > Q2) Is there any obvious reason why my default pools are not getting created automatically as expected?
> > >
> > > Q3) Can pg-num be modified for a pool later? (If the number of osds is increased dramatically.)
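(Regarding Q3, and as Lincoln confirmed further up: the pg_num of an existing pool can be increased later; it just cannot be decreased again, so growing the cluster and then growing the pools is fine. Roughly like this, with the pool name and the numbers only as an example:)

---
# raise the placement group count of an existing pool
ceph osd pool set rbd pg_num 128
# then raise pgp_num to match, so the data is actually rebalanced over the new PGs
ceph osd pool set rbd pgp_num 128
---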
> > > Finally, when I try to mount cephfs, I get a mount 5 error.
> > >
> > > "A mount 5 error typically occurs if a MDS server is laggy or if it crashed. Ensure at least one MDS is up and running, and the cluster is active + healthy".
> > >
> > > My mds is running, but its log is not terribly active:
> > >
> > > 2015-03-04 17:47:43.177349 7f42da2c47c0 0 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
> > > 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors {default=true}
> > >
> > > (This is all there is in the log.)
> > >
> > > I think that a key indicator of the problem must be this from the monitor log:
> > >
> > > 2015-03-04 16:53:20.715132 7f3cd0014700 1 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.? [2001:8b0:xxxx:5fb3:xxxx:1fff:xxxx:9054]:6800/4036 up but filesystem disabled
> > >
> > > (I have added the 'xxxx' sections to obscure my IP address.)
> > >
> > > Q4) Can you give me an idea of what is wrong that causes the mds to not play properly?
> > >
> > > I think that there are some typos on the manual deployment pages, for example:
> > >
> > > ceph-osd id={osd-num}
> > >
> > > This is not right. As far as I am aware, it should be:
> > >
> > > ceph-osd -i {osd-num}
> > >
> > > An observation. In principle, setting things up manually is not all that complicated, provided that clear and unambiguous instructions are available. This simple piece of documentation is very important. My view is that the existing manual deployment instructions get a bit confused and confusing when they reach the osd setup, and the mds setup is completely absent.
> > >
> > > For someone who knows, reviewing and revising this part of the documentation would be a fairly simple and quick operation. I suspect that this part suffers from being really obvious stuff to the well-initiated. For those of us closer to the start, this forms the ends of the threads that have to be picked up before the journey can be made.
> > >
> > > Very best regards,
> > > David

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com