I probably should have condensed my findings over the course of the day into one post, but I guess that's just not how I'm built...
Another data point. I ran `ceph daemon mds.cephmds02 perf dump` in a while loop with a 1-second sleep, grepping out the stats John mentioned, and at times (roughly every 10-15 seconds) I see some large objecter.op_active values. After the high values hit, there are 5-10 seconds of zero values.
"handle_client_request": 5785438,
"op_active": 2375,
"handle_client_request": 5785438,
"op_active": 2444,
"handle_client_request": 5785438,
"op_active": 2239,
"handle_client_request": 5785438,
"op_active": 1648,
"handle_client_request": 5785438,
"op_active": 1121,
"handle_client_request": 5785438,
"op_active": 709,
"handle_client_request": 5785438,
"op_active": 235,
"handle_client_request": 5785572,
"op_active": 0,
...............
Should I be concerned about these "op_active" values? I see that in my narrow slice of output, "handle_client_request" does not increment. What is happening there?
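For anyone who wants to repeat this measurement, here is a minimal Python sketch of the polling loop described above. It assumes the JSON printed by `perf dump` nests the counters as an "mds_server" section containing "handle_client_request" and an "objecter" section containing "op_active", which is what the dotted names in this thread suggest. The function names extract_counters and poll are made up for illustration.

```python
import json
import subprocess
import time


def extract_counters(perf_dump_text):
    """Pull the two counters out of the JSON text printed by 'perf dump'.

    Assumes the nesting mds_server -> handle_client_request and
    objecter -> op_active; adjust if your output is laid out differently.
    """
    dump = json.loads(perf_dump_text)
    return (
        dump["mds_server"]["handle_client_request"],
        dump["objecter"]["op_active"],
    )


def poll(mds_id, interval=1.0):
    """Print both counters once per interval until interrupted.

    Also prints how far handle_client_request advanced since the last sample,
    which makes the stalls easier to spot than the raw cumulative count.
    """
    previous = None
    while True:
        text = subprocess.check_output(
            ["ceph", "daemon", "mds." + mds_id, "perf", "dump"]
        ).decode()
        requests, op_active = extract_counters(text)
        delta = 0 if previous is None else requests - previous
        print(f"handle_client_request={requests} (+{delta}) op_active={op_active}")
        previous = requests
        time.sleep(interval)
```

Calling poll("cephmds02") on the MDS host (with enough privileges to reach the admin socket) would produce one line per second; Ctrl-C stops it.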
thanks,
Bob
On Wed, Aug 5, 2015 at 11:43 PM, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:
I found a way to get the stats you mentioned: mds_server.handle_client_request & objecter.op_active. I can see these values when I run:

    ceph daemon mds.<id> perf dump

I recently restarted the mds server so my stats reset, but I still have something to share:

    "mds_server.handle_client_request": 4406055
    "objecter.op_active": 0

Should I assume that op_active might be operations in writes or reads that are queued? I haven't been able to find anything describing what these stats actually mean, so if anyone knows where to find them, please advise.

On Wed, Aug 5, 2015 at 4:59 PM, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:

I have installed diamond (built by ksingh, found at https://github.com/ksingh7/ceph-calamari-packages) on the MDS node and I am not seeing the mds_server.handle_client_request OR objecter.op_active metrics being sent to graphite. Mind you, this is not the graphite that is part of the calamari install but our own internal graphite cluster. Perhaps that is the reason? I could not get calamari working correctly on Hammer/CentOS 7.1, so I put it on pause for now to concentrate on the cluster itself.

Ultimately, I need to find a way to get hold of these metrics to determine the health of my MDS so I can justify moving forward on an SSD-based cephfs metadata pool.

On Wed, Aug 5, 2015 at 4:05 PM, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:

Hi John,

You are correct in that my expectations may be incongruent with what is possible with ceph(fs).

I'm currently copying many small files (images) from a netapp to the cluster...~35k sized files to be exact, and the number of objects/files copied thus far is fairly significant (see the objects column below):

    [bababurko@cephmon01 ceph]$ sudo rados df
    pool name                 KB     objects  clones  degraded  unfound       rd       rd KB         wr       wr KB
    cephfs_data       3289284749   163993660       0         0        0        0           0  328097038  3369847354
    cephfs_metadata       133364      524363       0         0        0  3600023  5264453980   95600004  1361554516
    rbd                        0           0       0         0        0        0           0          0           0
    total used        9297615196   164518023
    total avail      19990923044
    total space      29288538240

Yes, that looks like ~164 million objects copied to the cluster. I would assume this will potentially be a burden to the MDS, but I have yet to confirm with ceph daemonperf mds.<id>. I cannot seem to run it on the mds host as it doesn't seem to know about that command:

    [bababurko@cephmds01]$ sudo ceph daemonperf mds.cephmds01
    no valid command found; 10 closest matches:
    osd lost <int[0-]> {--yes-i-really-mean-it}
    osd create {<uuid>}
    osd primary-temp <pgid> <id>
    osd primary-affinity <osdname (id|osd.id)> <float[0.0-1.0]>
    osd reweight <int[0-]> <float[0.0-1.0]>
    osd pg-temp <pgid> {<id> [<id>...]}
    osd in <ids> [<ids>...]
    osd rm <ids> [<ids>...]
    osd down <ids> [<ids>...]
    osd out <ids> [<ids>...]
    Error EINVAL: invalid command

This fails in a similar manner on all the hosts in the cluster. I'm very green w/ ceph and I'm probably missing something obvious. Is there something I need to install to get access to the 'ceph daemonperf' command in Hammer?

thanks,
Bob

On Wed, Aug 5, 2015 at 2:43 AM, John Spray <jspray@xxxxxxxxxx> wrote:
On Tue, Aug 4, 2015 at 10:36 PM, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:
> My writes are not going as I would expect wrt IOPS (50-1000 IOPS) & write
> throughput (~25 MB/s max). I'm interested in understanding what it takes to
> create an SSD pool that I can then migrate the current cephfs_metadata pool
> to. I suspect that the spinning-disk metadata pool is a bottleneck and I
> want to get the max performance out of this cluster to make the case for
> building out a larger version. One caveat is that I have copied about 4
> TB of data to the cluster via cephfs and don't want to lose the data, so I
> obviously need to keep the metadata intact.
I'm a bit suspicious of this: your IOPS expectations sort of imply
doing big files, but you're then suggesting that metadata is the
bottleneck (i.e. small file workload).
There are lots of statistics that come out of the MDS; you may be
particularly interested in mds_server.handle_client_request and
objecter.op_active, to work out if there really are lots of RADOS
operations getting backed up on the MDS (which would be the symptom of
a too-slow metadata pool). "ceph daemonperf mds.<id>" may be of some
help if you don't already have graphite or similar set up.
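One rough way to turn those two counters into a yes/no signal is to look for sampling intervals in which handle_client_request did not advance while op_active was still above zero. The Python sketch below does that for the eight one-second samples posted at the top of this thread. Treating such an interval as a "stall" is only a heuristic assumed here for illustration, not an official Ceph diagnostic, and the function name summarize is made up.

```python
def summarize(samples):
    """Summarize (handle_client_request, op_active) samples taken at a fixed interval.

    Returns (peak_op_active, stalled_intervals). An interval counts as stalled
    when the request counter is unchanged from the previous sample while
    RADOS operations are still outstanding.
    """
    peak = max(op_active for _, op_active in samples)
    stalled = 0
    for (prev_requests, _), (requests, op_active) in zip(samples, samples[1:]):
        if requests == prev_requests and op_active > 0:
            stalled += 1
    return peak, stalled


# The eight samples from the top of this thread, in order.
samples = [
    (5785438, 2375),
    (5785438, 2444),
    (5785438, 2239),
    (5785438, 1648),
    (5785438, 1121),
    (5785438, 709),
    (5785438, 235),
    (5785572, 0),
]

peak, stalled = summarize(samples)
print(peak, stalled)  # prints: 2444 6
```

In that slice the MDS handled no client requests for six consecutive seconds while the objecter queue drained from 2444 to 235, and the counter moved again (by 134) only once op_active reached zero. That is at least consistent with requests waiting on metadata pool I/O, though a longer capture would be needed to say more.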
> If anyone has done this OR understands how this can be done, I would
> appreciate the advice.
You could potentially do this in a two-phase process where you
initially set a crush rule that includes both SSDs and spinners, and
then finally set a crush rule that just points to SSDs. Obviously
that'll do lots of data movement, but your metadata is probably a fair
bit smaller than your data so that might be acceptable.
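As a sketch of what the two phases might look like on the command line, the Python below only builds the two commands and prints them; it does not run anything. It assumes both CRUSH rules already exist in the CRUSH map, that the ruleset ids 1 (SSDs plus spinners) and 2 (SSDs only) are placeholders for whatever ids your map actually uses, and that the pool option is named crush_ruleset on this Ceph release. Check all three against your own cluster and the documentation for your version before running anything.

```python
def two_phase_commands(pool, mixed_ruleset, ssd_ruleset):
    """Return the two 'ceph osd pool set' invocations, in the order to run them.

    mixed_ruleset: id of a rule that includes both SSDs and spinners (placeholder).
    ssd_ruleset:   id of a rule that points only at SSDs (placeholder).
    """
    def set_ruleset(ruleset):
        return ["ceph", "osd", "pool", "set", pool, "crush_ruleset", str(ruleset)]

    return [set_ruleset(mixed_ruleset), set_ruleset(ssd_ruleset)]


# Hypothetical ruleset ids; substitute the ones from your own CRUSH map.
for argv in two_phase_commands("cephfs_metadata", 1, 2):
    print(" ".join(argv))
```

Between the two commands you would wait for the resulting data movement to finish and the cluster to report healthy again before applying the SSD-only rule.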
John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com