On Mon, 22 Jul 2013, Gaylord Holder wrote:
> If I understand what the #tunables page is saying, changing the tunables
> kicks the OSD re-balancing mechanism a bit and resets it to try again.
>
> I'll see about getting a 3.9 kernel in for my RBD machines, and reset
> everything to optimal.

Keep in mind this is only needed if you are using the kernel rbd client
(rbd map ...), not librbd + qemu or similar.

sage

> Thanks again.
>
> -Gaylord
>
> On 07/22/2013 04:51 PM, Sage Weil wrote:
> > On Mon, 22 Jul 2013, Gaylord Holder wrote:
> > > Sage,
> > >
> > > The crush tunables did the trick.
> > >
> > > Why? Could you explain what was causing the problem?
> >
> > This has a good explanation, I think:
> >
> >     http://ceph.com/docs/master/rados/operations/crush-map/#tunables
> >
> > > I haven't installed 3.9 on my RBD servers yet. Will setting crush
> > > tunables back to default or legacy cause me similar problems in the
> > > future?
> >
> > Yeah. For 3.6+ kernels, you can set slightly different tunables and it
> > will be very close to optimal...
> >
> > sage
> >
> > > Thank you again Sage!
> > >
> > > -Gaylord
> > >
> > > On 07/22/2013 02:27 PM, Sage Weil wrote:
> > > > On Mon, 22 Jul 2013, Gaylord Holder wrote:
> > > > >
> > > > > I have a 12 OSD / 3 host setup, and am stuck with a bunch of
> > > > > stuck pgs.
> > > > >
> > > > > I've verified the OSDs are all up and in. The crushmap looks fine.
> > > > > I've tried restarting all the daemons.
> > > > >
> > > > > root@never:/var/lib/ceph/mon# ceph status
> > > > >    health HEALTH_WARN 139 pgs degraded; 461 pgs stuck unclean;
> > > > >      recovery 216/6213 degraded (3.477%)
> > > > >    monmap e4: 2 mons at {a=192.168.225.9:6789/0,b=192.168.225.10:6789/0},
> > > > >      election epoch 14, quorum 0,1 a,b
> > > >
> > > > Add another monitor; right now if 1 fails the cluster is unavailable.
> > > >
> > > > >    osdmap e238: 12 osds: 12 up, 12 in
> > > > >     pgmap v7396: 2528 pgs: 2067 active+clean, 322 active+remapped,
> > > > >      139 active+degraded; 8218 MB data, 103 GB used,
> > > > >      22241 GB / 22345 GB avail; 216/6213 degraded (3.477%)
> > > > >    mdsmap e1: 0/0/1 up
> > > >
> > > > My guess is crush tunables. Try
> > > >
> > > >     ceph osd crush tunables optimal
> > > >
> > > > unless you are using a pre-3.8(ish) kernel or other very old
> > > > (pre-bobtail) clients.
> > > >
> > > > sage
> > > >
> > > > > I have one non-default pool with 3x replication. Fewer than half
> > > > > of the pgs have expanded to 3x (278/400 pgs still have acting 2x
> > > > > sets).
> > > > >
> > > > > Where can I go look for the trouble?
> > > > >
> > > > > Thank you for any light someone can shed on this.
> > > > >
> > > > > Cheers,
> > > > > -Gaylord

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
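
For anyone following the same troubleshooting path, the commands discussed
in the thread look roughly like the sketch below. This is a minimal example,
not taken verbatim from the thread: the "bobtail" profile name (as the
"slightly different tunables" for 3.6+ kernels), the third monitor's name
"c" and its address are assumptions/placeholders, and exact subcommand
availability depends on the Ceph release in use.

    # Show which CRUSH tunables the cluster is currently running with
    ceph osd crush show-tunables

    # Switch to the optimal tunables profile (what resolved the stuck PGs
    # above); only safe if all clients, including kernel rbd clients
    # (rbd map ...), are new enough to support it
    ceph osd crush tunables optimal

    # For 3.6+ kernel clients, a slightly older profile is very close to
    # optimal (profile name assumed here)
    ceph osd crush tunables bobtail

    # List the PGs that are stuck unclean to see what recovery is waiting on
    ceph pg dump_stuck unclean

    # Add a third monitor so losing a single mon does not cost quorum
    # (mon name "c" and address are placeholders; the new mon's data
    # directory still needs to be prepared and its daemon started)
    ceph mon add c 192.168.225.11:6789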