Hello Christian,
Thanks again for all of your help! I started a bonnie test using the
following:
bonnie -d /mnt/rbd/scratch2/ -m $(hostname) -f -b
Hopefully it completes in the next hour or so. A reboot of the slow OSDs
clears the slow marker for now.
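Rather than a full reboot, a single slow OSD can usually be bounced and
poked at with something like the following (osd.12 is just a placeholder;
the restart syntax assumes Upstart-managed daemons, adjust for your init
system):
ceph health detail                            # lists which OSDs have blocked/slow requests
sudo ceph daemon osd.12 dump_ops_in_flight    # run on that OSD's host, shows what it is stuck on
sudo restart ceph-osd id=12                   # Upstart; "service ceph restart osd.12" under sysvinit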
kh10-9$ ceph -w
cluster 9ea4d9d9-04e4-42fe-835a-34e4259cf8ec
health HEALTH_OK
monmap e1: 3 mons at
{kh08-8=10.64.64.108:6789/0,kh09-8=10.64.64.117:6789/0,kh10-8=10.64.64.125:6789/0},
election epoch 338, quorum 0,1,2 kh08-8,kh09-8,kh10-8
osdmap e15356: 1256 osds: 1256 up, 1256 in
pgmap v788798: 87560 pgs, 18 pools, 187 TB data, 47919 kobjects
566 TB used, 4001 TB / 4567 TB avail
87560 active+clean
client io 542 MB/s rd, 1548 MB/s wr, 7552 op/s
2014-12-19 01:27:28.547884 mon.0 [INF] pgmap v788797: 87560 pgs: 87560
active+clean; 187 TB data, 566 TB used, 4001 TB / 4567 TB avail; 433
MB/s rd, 1090 MB/s wr, 5774 op/s
2014-12-19 01:27:29.581955 mon.0 [INF] pgmap v788798: 87560 pgs: 87560
active+clean; 187 TB data, 566 TB used, 4001 TB / 4567 TB avail; 542
MB/s rd, 1548 MB/s wr, 7552 op/s
2014-12-19 01:27:30.638744 mon.0 [INF] pgmap v788799: 87560 pgs: 87560
active+clean; 187 TB data, 566 TB used, 4001 TB / 4567 TB avail; 726
MB/s rd, 2284 MB/s wr, 10451 op/s
Once the next slow OSD comes up I guess I can tell it to bump its log
level up to 5 and see what may be going on.
That said, I didn't see much last time.
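At runtime that should be doable with injectargs, something along these
lines (osd.12 is a placeholder again; debug-ms is optional but often
helps):
ceph tell osd.12 injectargs '--debug-osd 5 --debug-ms 1'
# and back to the defaults afterwards:
ceph tell osd.12 injectargs '--debug-osd 0/5 --debug-ms 0/5'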
On 12/19/2014 12:17 AM, Christian Balzer wrote:
Hello,
On Thu, 18 Dec 2014 23:45:57 -0600 Sean Sullivan wrote:
Wow Christian,
Sorry I missed these inline replies. Give me a minute to gather some
data. Thanks a million for the in-depth responses!
No worries.
I thought about RAIDing it but unfortunately I needed the space. I had a
three-node test cluster with 60 OSDs per node that we tried before this,
and it didn't have this flapping issue or the RGW issue I am seeing.
I think I remember that...
I hope not. I don't think I posted about it at all. I only had it for a
short period before it was repurposed. I did post about a cluster
before that with 32 OSDs per node though. That one had tons of issues
but now seems to be running relatively smoothly.
You do realize that the RAID6 configuration option I mentioned would
actually give you MORE space (replication of 2 is sufficient with reliable
OSDs) than what you have now?
Albeit probably at reduced performance; how much also depends on the
controllers used, but at worst the RAID6 OSD performance would be
equivalent to that of a single disk.
So, performance-wise, a cluster with 21 nodes and 8 disks each.
Ah, I must have misread; I thought you said RAID10, which would halve the
storage and add a small write penalty. For a RAID6 of 4 drives I would get
something like 160 IOPS (assuming each drive is 75), which may be worth
it. I would just hate to have 2+ failures and lose 4-5 drives as opposed
to 2, and the rebuild for a RAID6 has always left a sour taste in my mouth.
Still, 4 slow drives are better than 4 TB of data over the network slowing
down the whole cluster.
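For a rough sanity check with the usual write-penalty rule of thumb
(assuming ~75 IOPS per spinner and the standard RAID6 write penalty of 6):
reads:  4 drives x 75 IOPS    ~= 300 IOPS
writes: (4 x 75 IOPS) / 6     ~=  50 IOPS
so a mixed read/write workload lands somewhere between those two,
depending on the read/write ratio.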
I knew the 40 cores were on the low side, but I thought at 2.7 GHz we might
be fine, as the docs recommend 1x 1 GHz Xeon core per OSD. The cluster load
hovers around 15-18, but with the constantly flapping disks I am seeing it
spike as high as 120 when a disk is marked out of the cluster.
kh10-3$ cat /proc/loadavg
14.35 29.50 66.06 14/109434 724476
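As a back-of-the-envelope check against that rule of thumb (taking roughly
60 OSDs per node as an estimate, i.e. 1256 OSDs spread over 21 nodes):
recommended: 60 OSDs x 1 GHz     ~=  60 GHz per node
available:   40 cores x 2.7 GHz  ~= 108 GHz per node
so steady state fits on paper; it is the recovery/backfill after a disk
gets marked out that eats the headroom.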
No need; now that strange monitor configuration makes sense. You (or
whoever spec'ed this) went for the Supermicro Ceph solution, right?
indeed.
In my not so humble opinion, this is the worst storage chassis ever designed
by a long shot and totally unsuitable for Ceph.
I told the Supermicro GM for Japan as much. ^o^
Well, it looks like I done goofed. I thought it was odd that they went
against most of what the Ceph documentation says about recommended hardware.
I read/heard from them that they worked with Inktank on this though, so I
was swayed. Besides that, we really needed the density per rack due to
limited floor space. As I said, in capable hands this cluster would work,
but by a stroke of luck...
Every time an HDD dies, you will have to go and shut down the other OSD
that resides on the same tray (and set the cluster to noout).
Even worse, of course, if an SSD should fail.
And if somebody should just go and hotswap things without that step first,
hello data movement storm (2 or 10 OSDs instead of 1 or 5 respectively).
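In practice that step would look something like this per tray swap (the
OSD ID is a placeholder; the stop/start syntax assumes Upstart-managed
daemons):
ceph osd set noout           # keep the stopped OSDs from being marked out
sudo stop ceph-osd id=13     # the healthy OSD sharing the tray
# ...swap the tray, redeploy the failed OSD on the new disk...
sudo start ceph-osd id=13
ceph osd unset noout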
Christian
Thanks for your help and insight on this! I am going to take a nap and
hope the cluster doesn't catch fire before I wake up o_o
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com