Was this warning inserted for a particular reason we know of? It was part of
https://github.com/ceph/ceph/pull/625, next to one about PG/OSD ratio skews,
but there's no ticket or justification other than "suggests this particular
pool may have too few PGs."
-Greg

On Tue, Jul 3, 2018 at 5:12 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 3 Jul 2018, John Spray wrote:
>> On Tue, Jul 3, 2018 at 12:46 PM John Spray <jspray@xxxxxxxxxx> wrote:
>> >
>> > On Tue, Jul 3, 2018 at 12:24 PM Jesus Cea <jcea@xxxxxxx> wrote:
>> > >
>> > > On 03/07/18 13:08, John Spray wrote:
>> > > > Right: as you've noticed, they're not spurious; they're where we keep
>> > > > a "backtrace" xattr for a file.
>> > > >
>> > > > Backtraces are lazily updated paths that enable CephFS to map an
>> > > > inode number to a file's metadata, which is needed when resolving
>> > > > hard links or NFS file handles. The trouble with storing the
>> > > > backtrace in the individual data pools is that the MDS would have to
>> > > > scan through the pools to find it, so instead all the files get a
>> > > > backtrace in the root pool.
>> > >
>> > > Given this, the warning "1 pools have many more objects per pg than
>> > > average" will ALWAYS happen. Is there any plan to special-case this
>> > > situation, or will a "mon_pg_warn_max_object_skew" override be needed
>> > > forever?
>> >
>> > The "more objects per pg than average" warning is based on the idea
>> > that there is some approximate ratio of objects to PGs that is
>> > desirable, but Ceph doesn't know what that ratio is, so Ceph assumes
>> > that you've got roughly the right ratio overall and flags any pool
>> > 10x denser than that as a problem.
>> >
>> > To directly address that warning rather than silencing it, you'd
>> > increase the number of PGs in your primary data pool.
>> >
>> > There's a conflict here between pools with lots of data (where the MB
>> > per PG, rather than the object count, might be the main concern) and
>> > metadata-ish pools (where the object count per PG is the main
>> > concern). Maybe it doesn't really make sense to group them all
>> > together when calculating the average objects-per-PG count that's used
>> > in this health warning -- I'll bring that up over on ceph-devel in a
>> > moment.
>>
>> Migrating this topic from ceph-users for more input -- am I talking sense?
>>
>> It seems wrong that we would look at the average object count per PG
>> of pools containing big objects and use it to validate the object
>> count per PG of pools containing tiny objects.
>
> Yeah, it's a pretty lame set of criteria for this warning most ways you
> look at it, I think. This came up about a month ago on another ceph-users
> thread: https://www.spinics.net/lists/ceph-devel/msg41418.html
>
> I wonder if we should either (1) increase the default value here by a big
> factor (5? 10?), or (2) remove this warning entirely, since we're about to
> start auto-tuning PG counts anyway.
>
> sage
>
>>
>> John
>>
>> >
>> > John
>> >
>> > >
>> > > >> Should this data be stored in the metadata pool?
>> > > >
>> > > > Yes, probably. As you say, it's ugly how we end up with these extra
>> > > [...]
>> > > > One option would be to do both by default: write the backtrace to
>> > > > the metadata pool for its ordinary functional lookup purpose, but
>> > > > also write it back to the data pool as an intentionally redundant
>> > > > resilience measure.
>> > > > The extra write to the data pool could be disabled by anyone who
>> > > > wants to save the IOPS at the cost of some resilience.
>> > >
>> > > This would be nice. Another option would be simply to use fewer
>> > > objects, but I guess that could be a major change.
>> > >
>> > > Actually, my main issue here is the warning "1 pools have many more
>> > > objects per pg than average". My cluster is permanently in WARNING
>> > > state, with the known consequences, and my
>> > > "mon_pg_warn_max_object_skew" override is not working, for some
>> > > reason. I am using Ceph 12.2.5.
>> > >
>> > > Thanks for your time and expertise!
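
For anyone who wants to see the backtrace objects John describes first-hand,
below is a rough sketch using the rados Python binding. It is not taken from
this thread: the pool name, object name and conf path are placeholders, and
the detail that the backtrace lives in a "parent" xattr on the file's first
stripe object (named <inode-in-hex>.00000000) is my understanding of the
on-disk layout, so verify before relying on it.

    # Sketch only: peek at the backtrace xattr CephFS keeps on objects in
    # the root/primary data pool. Pool name, object name and conf path are
    # placeholders for illustration.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('cephfs_data')   # assumed root data pool
        # CephFS names objects <inode-in-hex>.<stripe-index>; the backtrace
        # is believed to live in the "parent" xattr of the first object.
        blob = ioctx.get_xattr('10000000000.00000000', 'parent')
        print('backtrace xattr: %d bytes (binary-encoded)' % len(blob))
        ioctx.close()
    finally:
        cluster.shutdown()

Note that for files whose data lives in a non-default data pool, that first
object in the root pool is essentially empty apart from this xattr, which is
what drives up the object count Jesus is seeing.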
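The check John and Sage are debating boils down to comparing each pool's
objects-per-PG ratio against the cluster-wide average. Here is a minimal
sketch of that comparison, assuming the documented semantics of
mon_pg_warn_max_object_skew (default 10, i.e. flag any pool more than 10x
denser than average) and ignoring the minimum-object-count guards; the pool
names and numbers are made up for illustration.

    # Minimal sketch of the "many more objects per pg than average" check,
    # assuming mon_pg_warn_max_object_skew's documented default of 10.
    def pools_with_object_skew(pools, max_object_skew=10.0):
        """pools: iterable of dicts with 'name', 'num_objects', 'pg_num'."""
        total_objects = sum(p['num_objects'] for p in pools)
        total_pgs = sum(p['pg_num'] for p in pools)
        if not total_objects or not total_pgs:
            return []
        average = total_objects / float(total_pgs)  # objects per PG, all pools lumped together
        return [p['name'] for p in pools
                if p['num_objects'] / float(p['pg_num']) > max_object_skew * average]

    # A CephFS root data pool dense with tiny backtrace objects next to a
    # pool of big objects: the dense pool trips the warning even though its
    # objects are nearly empty.
    print(pools_with_object_skew([
        {'name': 'cephfs_data', 'num_objects': 5000000, 'pg_num': 64},
        {'name': 'big_objects', 'num_objects': 200000,  'pg_num': 1024},
    ]))   # -> ['cephfs_data']

This also illustrates John's point: because the average lumps all pools
together, a pool of tiny metadata-ish objects is judged against a baseline
dominated by pools of large objects.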
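As for John's suggested remedy of raising the PG count on the primary data
pool, a sketch of issuing the standard "osd pool set" commands through
librados' mon_command interface follows. The pool name and target PG count
are placeholders; on pre-Nautilus clusters pg_num cannot be decreased again
and pgp_num may have to wait until the new PGs finish creating, so treat this
as an illustration rather than a recommendation.

    # Sketch: raise pg_num (then pgp_num) on a pool via the mon command
    # interface. Pool name and target PG count are placeholders.
    import json
    import rados

    def set_pool_pgs(cluster, pool, pg_num):
        for var in ('pg_num', 'pgp_num'):
            cmd = json.dumps({'prefix': 'osd pool set',
                              'pool': pool,
                              'var': var,
                              'val': str(pg_num)})
            ret, _, errmsg = cluster.mon_command(cmd, b'')
            if ret != 0:
                raise RuntimeError('osd pool set %s failed: %s' % (var, errmsg))

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        set_pool_pgs(cluster, 'cephfs_data', 256)   # placeholder pool/value
    finally:
        cluster.shutdown()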