On Thu, 5 Mar 2015 07:46:50 -0700 Robert LeBlanc wrote:

> David,
>
> You will need to raise the limit of open files in the Linux system. Check /etc/security/limits.conf. It is explained somewhere in the docs, and the autostart scripts 'fix' the issue for most people. When I did a manual deploy for the same reasons as you, I ran into this too.
>

It fixes this for most "normal" use cases indeed. The folks at CERN and other huge installations would probably not concur with that sentiment. ^o^

Aside from /etc/security/limits.conf, I found an initscript like this:

---
# cat /etc/initscript
#
ulimit -Hn 131072
ulimit -Sn 65536

# Execute the program.
eval exec "$4"
---

quite helpful, as it gets parsed before the PAM stuff.

Christian

> Robert LeBlanc
>
> Sent from a mobile device, please excuse any typos.
> On Mar 5, 2015 3:14 AM, "Datatone Lists" <lists@xxxxxxxxxxxxxx> wrote:
>
> > Thank you all for such wonderful feedback.
> >
> > Thank you to John Spray for putting me on the right track. I now see that the cephfs aspect of the project is being de-emphasised, so that the manual deployment instructions explain how to set up the object store, and cephfs is a separate issue that needs to be explicitly set up and configured in its own right. So that explains why the cephfs pools are not created by default, and why the required cephfs pools are now referred to not as 'data' and 'metadata' but as 'cephfs_data' and 'cephfs_metadata'. I have created these pools, and created a new cephfs filesystem, and I can mount it without problem.
> >
> > This confirms my suspicion that the manual deployment pages are in need of review and revision. They still refer to three default pools. I am happy that this section should deal with the object store setup only, but I still think that the osd part is a bit confused and confusing, particularly with respect to what is done on which machine. It would then be useful to say something like "this completes the configuration of the basic store. If you wish to use cephfs, you must set up a metadata server, appropriate pools, and a cephfs filesystem. (See http://...)".
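(For anyone finding this thread in the archives: the cephfs-specific steps described above boil down to roughly the following on a 0.93-era cluster. This is only a sketch; the pool names are the ones mentioned above, while the filesystem name, the pg counts, the monitor address and the mount options are placeholders that need adjusting for the cluster at hand.)

---
# create the two pools that cephfs needs (pg counts are examples only)
ceph osd pool create cephfs_metadata 64
ceph osd pool create cephfs_data 64

# create the filesystem on top of them (metadata pool first, then data pool)
ceph fs new cephfs cephfs_metadata cephfs_data

# with an mds running, the kernel client can then mount it, e.g.
mount -t ceph mon-address:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
---

As far as I can tell, the 'ceph fs new' step is also what clears the "up but filesystem disabled" state that shows up in the monitor log further down.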
> > I was not trying to be smart or obscure when I made a brief and apparently dismissive reference to ceph-deploy. I railed against it and the demise of mkcephfs on this list at the point that mkcephfs was discontinued in the releases. That caused a few supportive responses at the time, so I know that I'm not alone. I did not wish to trawl over those arguments again unnecessarily.
> >
> > There is a principle that is being missed. The 'ceph' code contains everything required to set up and operate a ceph cluster. There should be documentation detailing how this is done.
> >
> > 'Ceph-deploy' is a separate thing. It is one of several tools that promise to make setting things up easy. However, my resistance is based on two factors. If I recall correctly, it is one of those projects in which the configuration needs to know what 'distribution' is being used. (Presumably, this is to try to deduce where various things are located.) So if one is not using one of these 'distributions', one is stuffed right from the start. Secondly, the challenge that we are trying to overcome is learning what the various ceph components need, and how they need to be set up and configured. I don't think that the "don't worry your pretty little head about that, we have a natty tool to do it for you" approach is particularly useful.
> >
> > So I am not knocking ceph-deploy, Travis; it is just that I do not believe that it is relevant or useful to me at this point in time.
> >
> > I see that Lionel Bouton seems to share my views here.
> >
> > In general, the ceph documentation (in my humble opinion) needs to be draughted with a keen eye on the required scope. Deal with ceph; don't let it get contaminated with 'ceph-deploy', 'upstart', 'systemd', or anything else that is not actually part of ceph.
> >
> > As an example, once you have configured your osd, you start it with:
> >
> > ceph-osd -i {osd-number}
> >
> > It is as simple as that!
> >
> > If it is required to start the osd automatically, then that will be done using sysvinit, upstart, systemd, or whatever else is being used to bring the system up in the first place. It is unnecessary and confusing to try to second-guess the environment in which ceph may be used, and to contaminate the documentation with such details. (Having said that, I see no problem with adding separate, helpful sections such as "Suggestions for starting using 'upstart'" or "Suggestions for starting using 'systemd'".)
> >
> > So I would reiterate the point that the really important documentation is probably quite simple for an expert to produce. Just spell out what each component needs in terms of keys, access to keys, files, and so on. Spell out how to set everything up. Also how to change things after the event, so that 'trial and error' does not have to contain really expensive errors. Once we understand the fundamentals, getting fancy and efficient is a completely separate further goal, and is not really a responsibility of core ceph development.
> >
> > I have an inexplicable emotional desire to see ceph working well with btrfs, which I like very much and have been using since the very early days. Despite all the 'not ready for production' warnings, I adopted it with enthusiasm, and have never had cause to regret it, and only once or twice experienced a failure that was painful to me. However, as I have experimented with ceph over the years, it has been very clear that ceph seems to be the most ruthless stress test for it, and it has always broken quite quickly (I also used xfs for comparison). I have seen evidence of much work going into btrfs in the kernel development now that the lead developer has moved from Oracle to, I think, Facebook.
> >
> > I now share the view that I think Robert LeBlanc has, that maybe btrfs will now stand the ceph test.
> >
> > Thanks, Lincoln Bryant, for confirming that I can increase the size of pools in line with increasing osd numbers. I felt that this had to be the case, otherwise the 'scalable' claim becomes a bit limited.
> >
> > Returning from these digressions to my own experience: I set up my cephfs filesystem as illuminated by John Spray. I mounted it and started to rsync a multi-terabyte filesystem to it. This is my test: if cephfs handles this without grinding to a snail's pace or failing, I will be ready to start committing my data to it. My osd disk lights started to flash and flicker, and a comforting sound of drive activity issued forth. I checked the osd logs and, to my dismay, there were crash reports in them all. However, a closer look revealed that I am getting the "too many open files" messages that precede the failures.
> >
> > I can see that this is not an osd failure, but a resource limit issue.
> >
> > I completely acknowledge that I must now RTFM, but I will ask whether anybody can give any guidance, based on experience, with respect to this issue.
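(That is exactly the limit discussed at the top of this mail: /etc/security/limits.conf covers PAM sessions, and an /etc/initscript covers daemons spawned by init outside of PAM. A rough sketch, assuming the ceph daemons run as root and reusing the numbers from the initscript above; adjust the domain if your daemons run under a dedicated user:)

---
# /etc/security/limits.conf (excerpt)
root    soft    nofile    65536
root    hard    nofile    131072

# check what a running osd actually got
grep "open files" /proc/$(pidof -s ceph-osd)/limits
---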
> > Thank you again to all for the previous prompt and invaluable advice and information.
> >
> > David
> >
> > On Wed, 4 Mar 2015 20:27:51 +0000
> > Datatone Lists <lists@xxxxxxxxxxxxxx> wrote:
> >
> > > I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers.
> > >
> > > I am now trying ceph 0.93 and kernel 4.0-rc1.
> > >
> > > Q1) Is it still considered that btrfs is not robust enough, and that xfs should be used instead? [I am trying with btrfs].
> > >
> > > I followed the manual deployment instructions on the web site (http://ceph.com/docs/master/install/manual-deployment/) and I managed to get a monitor and several osds running and apparently working. The instructions fizzle out without explaining how to set up the mds. I went back to mkcephfs and got things set up that way. The mds starts.
> > >
> > > [Please don't mention ceph-deploy]
> > >
> > > The first thing that I noticed is that (whether I set up the mon and osds by following the manual deployment, or using mkcephfs), the correct default pools were not created.
> > >
> > > bash-4.3# ceph osd lspools
> > > 0 rbd,
> > > bash-4.3#
> > >
> > > I get only 'rbd' created automatically. I deleted this pool, and re-created data, metadata and rbd manually. When doing this, I had to juggle with the pg-num in order to avoid the 'too many pgs for osd' warning. I have three osds running at the moment, but intend to add to these when I have some experience of things working reliably. I am puzzled, because I seem to have to set the pg-num for the pool to a number that makes (N-pools x pg-num)/N-osds come to the right kind of number. So this implies that I can't really expand a set of pools by adding osds at a later date.
> > >
> > > Q2) Is there any obvious reason why my default pools are not getting created automatically as expected?
> > >
> > > Q3) Can pg-num be modified for a pool later? (If the number of osds is increased dramatically.)
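(Regarding Q3, and as Lincoln confirmed further up: the pg_num of an existing pool can be increased later; it just cannot be decreased again, so growing the cluster and then growing the pools is fine. Roughly like this, with the pool name and the numbers only as an example:)

---
# raise the placement group count of an existing pool
ceph osd pool set rbd pg_num 128
# then raise pgp_num to match, so the data is actually rebalanced over the new PGs
ceph osd pool set rbd pgp_num 128
---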
> > > Finally, when I try to mount cephfs, I get a mount 5 error.
> > >
> > > "A mount 5 error typically occurs if a MDS server is laggy or if it crashed. Ensure at least one MDS is up and running, and the cluster is active + healthy".
> > >
> > > My mds is running, but its log is not terribly active:
> > >
> > > 2015-03-04 17:47:43.177349 7f42da2c47c0 0 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
> > > 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors {default=true}
> > >
> > > (This is all there is in the log.)
> > >
> > > I think that a key indicator of the problem must be this from the monitor log:
> > >
> > > 2015-03-04 16:53:20.715132 7f3cd0014700 1 mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.? [2001:8b0:xxxx:5fb3:xxxx:1fff:xxxx:9054]:6800/4036 up but filesystem disabled
> > >
> > > (I have added the 'xxxx' sections to obscure my IP address.)
> > >
> > > Q4) Can you give me an idea of what is wrong that causes the mds to not play properly?
> > >
> > > I think that there are some typos on the manual deployment pages, for example:
> > >
> > > ceph-osd id={osd-num}
> > >
> > > This is not right. As far as I am aware, it should be:
> > >
> > > ceph-osd -i {osd-num}
> > >
> > > An observation. In principle, setting things up manually is not all that complicated, provided that clear and unambiguous instructions are available. This simple piece of documentation is very important. My view is that the existing manual deployment instructions get a bit confused and confusing when they reach the osd setup, and the mds setup is completely absent.
> > >
> > > For someone who knows, reviewing and revising this part of the documentation would be a fairly simple and quick operation. I suspect that this part suffers from being really obvious stuff to the well-initiated. For those of us closer to the start, this forms the ends of the threads that have to be picked up before the journey can be made.
> > >
> > > Very best regards,
> > > David

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com