On Tue, Jul 5, 2011 at 07:34, Mark Nigh <mnigh@xxxxxxxxxxxxxxx> wrote:
> I have created a cluster with 2 nodes and 6 osds each.
>
> I would like to verify that my pgs are being placed on the correct nodes based on my crushmap. I would like to make sure that my replication (x2, default) is not placed on the same host. osd.0 through osd.5 are on host0 and osd.6 through osd.11 are on host1.
...
> The end of "ceph pg dump -o -" shows the following, which doesn't look correct:
>
> osdstat  kbused   kbavail      kb           hb in        hb out
> 0        0        0            0            []           []
> 1        393884   2927750484   2930265540   [0,2,3,4,5]  [0,2,3,4,5]
> 2        304580   2927838884   2930265540   [0,1,3,4,5]  [0,1,3,4,5]
> 3        0        0            0            []           []
> 4        0        0            0            []           []
> 5        158908   2927983588   2930265540   [0,1,2,3,4]  [0,1,2,3,4]
> 6        496      2892788952   2894914980   []           []
> 7        504      2928139544   2930265540   []           []
> 8        504      2928139544   2930265540   []           []
> 9        504      2928139544   2930265540   []           []
> 10       504      2928139544   2930265540   []           []
> 11       504      2928139544   2930265540   []           []
> sum      860388   26317059628  26337039300

Yes, that looks like a bunch of your osds are not doing much. Some of them seem to be failing (6-11; see "hb in" below), and some seem to simply be getting no objects assigned to them (0, 3, 4). I'm not sure whether kbavail==0 is a symptom of an actual problem, or just a consequence of no pgs being assigned to those osds by your crushmap.

You can test a crush config by simulating the data placement:

ceph osd getcrushmap -o crushmap
crushtool -i crushmap --test

Here's example output from my two-osd test cluster with perfect balancing:

devices weights (hex): [10000,10000]
rule 0 (data), x = 0..9999
  device 0:       10000
  device 1:       10000
  num results 2:  10000
rule 1 (metadata), x = 0..9999
  device 0:       10000
  device 1:       10000
  num results 2:  10000
rule 2 (rbd), x = 0..9999
  device 0:       10000
  device 1:       10000
  num results 2:  10000

Note that this doesn't consider any of the operational aspects of the cluster, such as nodes being unreachable, full, or overloaded. (I've sketched an example map and the edit/test round trip at the end of this mail.)

> Which brings me to a couple of questions.
>
> 1. What is "hb in" and "hb out"?

It's the status of the heartbeat: whether the OSDs see each other (both incoming and outgoing, as it's not necessarily symmetric). In your output, OSDs 0-5 see each other; nobody sees osds 6-11, and they don't report seeing any other OSDs. You said that osds 6-11 are on the second machine; it seems that machine is not able to talk to the monitors. (See the status commands at the end of this mail for a quick way to confirm.)

> 2. The original crush map examples show "type device", but the new versions of ceph show the first type as osd? I changed mine back to device, but how do you define an osd, or is this automatically done for you? By "define" I mean the section in the crush map that lists all the devices, e.g. "device 0 device0".

As far as I know, the bucket type strings are pretty much arbitrary, and you use whatever makes sense for your deployment. For example, if you had only one osd per host, then having both bucket types host and osd would be unnecessary. The hierarchy just describes your physical deployment, e.g. row/rack/machine/osd, so the crush rules can control placement, e.g. across racks.

> 3. In the new default crushmap, there is a domain bucket. What is that intended for? Host?

The default crushmap, being automatically generated, has no knowledge of the physical layout. For example, a lot of test clusters live entirely on a single host. Hence it uses the deliberately vague word "domain" as the type of the upper-level bucket. When you customize the map, you know your layout better and can use a more descriptive word.
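Since your main goal is making sure the two replicas never land on the same host, here is a rough sketch of what the relevant parts of a decompiled crushmap could look like for your two-host layout. The bucket names, ids and weights are made up for illustration; the important part is the rule that first steps across hosts and then picks one device inside each chosen host:

# devices
device 0 device0
device 1 device1
# ... and so on up to device 11

# types
type 0 device
type 1 host
type 2 root

# buckets
host host0 {
        id -1
        alg straw
        hash 0
        item device0 weight 1.000
        item device1 weight 1.000
        # ... device2 through device5
}
host host1 {
        id -2
        alg straw
        hash 0
        item device6 weight 1.000
        # ... device7 through device11
}
root root {
        id -3
        alg straw
        hash 0
        item host0 weight 6.000
        item host1 weight 6.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take root
        step choose firstn 0 type host
        step choose firstn 1 type device
        step emit
}

With a rule like that, the two replicas of each pg should end up on different hosts, and the crushtool --test output should show all twelve devices getting a share of the placements.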
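The round trip for getting an edited map like that into the cluster is roughly the following (file names are just placeholders):

ceph osd getcrushmap -o crushmap            # grab the current compiled map
crushtool -d crushmap -o crushmap.txt       # decompile it to editable text
# edit crushmap.txt
crushtool -c crushmap.txt -o crushmap.new   # compile it back
crushtool -i crushmap.new --test            # re-run the placement simulation
ceph osd setcrushmap -i crushmap.new        # inject the new map into the cluster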
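And to confirm whether the osds on host1 are even registering as up with the monitors (as opposed to just getting no data), the osd map is the place to look. Assuming the same CLI style as the "ceph pg dump -o -" you already ran:

ceph -s              # overall cluster status, including how many osds are up/in
ceph osd dump -o -   # per-osd state; I'd expect osds 6-11 to show as down here

If they are marked down there, this is a connectivity or daemon problem on host1 rather than anything to do with the crushmap.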