On Sun, 14 Jul 2013, Stefan Priebe wrote:
> Hello list,
>
> might this be a problem due to having too many PGs?  I have 370 per OSD
> instead of 33 per OSD (OSDs*100/3).

That might exacerbate it.  Can you try setting

 osd min pg log entries = 50
 osd max pg log entries = 100

across your cluster, restarting your osds, and seeing if that makes a
difference?  I'm wondering if this is a problem with pg log rewrites
after peering.

Note that adding those options and restarting isn't enough to trigger
the trim; you have to hit the cluster with some IO too, and (if this is
the source of your problem) the trim itself might be expensive.  So add
them, restart, do a bunch of IO (to all pools/PGs if you can), and then
see whether the problem is still present.  (Sketches of these steps
follow at the end of this message.)

Also note that the lower osd min pg log entries value means that an OSD
cannot be down as long without requiring a backfill (50 IOs per PG).
These probably aren't the values we ultimately want, but I'd like to
find out whether the pg log rewrites after peering in cuttlefish are the
culprit here.

Thanks!

> Is there any plan for PG merging?

Not right now. :(  I'll talk to Sam, though, to see how difficult it
would be given the split approach we settled on.

Thanks!
sage

>
> Stefan
> > Hello list,
> >
> > anyone else here who always has problems bringing back an offline OSD?
> > Since cuttlefish I'm seeing slow requests for the first 2-5 minutes
> > after bringing an OSD online again, but that's so long that the VMs
> > crash because they think their disk is offline...
> >
> > Under bobtail I never had any problems with that.
> >
> > Please HELP!
> >
> > Greets,
> > Stefan
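
For concreteness, a minimal sketch of how the settings above could be
added to ceph.conf (the [osd] section and option names are standard;
the values are the temporary debugging values from this thread, not a
long-term recommendation):

 [osd]
     # temporary values to test whether pg log rewrites after peering
     # are the problem; revert once the test is done
     osd min pg log entries = 50
     osd max pg log entries = 100

Distribute the file to every node, then restart the osds so the new
limits take effect.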
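
To generate the IO that actually triggers the trim, one option is the
built-in 'rados bench' tool; <poolname> is a placeholder here, and the
run would be repeated for each pool so that every PG sees some writes:

 # 60-second write benchmark against one pool
 rados -p <poolname> bench 60 write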
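
To verify that the trim happened, the log_size column in 'ceph pg dump'
should drop toward the new 100-entry cap once the PGs have seen IO
(assuming your build includes log_size in the pg dump output):

 # per-pg stats, including a log_size column
 ceph pg dump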