On Tue, 3 Jul 2018, John Spray wrote:
> On Tue, Jul 3, 2018 at 12:46 PM John Spray <jspray@xxxxxxxxxx> wrote:
> >
> > On Tue, Jul 3, 2018 at 12:24 PM Jesus Cea <jcea@xxxxxxx> wrote:
> > >
> > > On 03/07/18 13:08, John Spray wrote:
> > > > Right: as you've noticed, they're not spurious, they're where we keep
> > > > a "backtrace" xattr for a file.
> > > >
> > > > Backtraces are lazily updated paths that enable CephFS to map an
> > > > inode number to a file's metadata, which is needed when resolving hard
> > > > links or NFS file handles.  The trouble with the backtrace in the
> > > > individual data pools is that the MDS would have to scan through the
> > > > pools to find it, so instead all the files get a backtrace in the root
> > > > pool.
> > >
> > > Given this, the warning "1 pools have many more objects per pg than
> > > average" will happen ALWAYS. Is there any plan to do some kind of
> > > special case for this situation, or will a "mon_pg_warn_max_object_skew"
> > > override be needed forever?
> >
> > The "more objects per pg than average" warning is based on the idea
> > that there is some approximate ratio of objects to PGs that is
> > desirable, but Ceph doesn't know what that ratio is, so Ceph is
> > assuming that you've got roughly the right ratio overall, and any pool
> > 10x denser than that is a problem.
> >
> > To directly address that warning rather than silencing it, you'd
> > increase the number of PGs in your primary data pool.
> >
> > There's a conflict here between pools with lots of data (where the MB
> > per PG might be the main concern, not the object size), vs.
> > metadata-ish pools (where the object count per PG is the main
> > concern).  Maybe it doesn't really make sense to group them all
> > together when calculating the average object-per-pg count that's used
> > in this health warning -- I'll bring that up over on ceph-devel in a
> > moment.
>
> Migrating this topic from ceph-users for more input -- am I talking sense?
>
> It seems wrong that we would look at the average object count per PG
> of pools containing big objects, and use it to validate the object
> count per PG of pools containing tiny objects.

Yeah, it's a pretty lame set of criteria for this warning most ways you
look at it, I think.  This came up about a month ago on another
ceph-users thread:

  https://www.spinics.net/lists/ceph-devel/msg41418.html

I wonder if we should either (1) increase the default value here by a
big factor (5? 10?), or (2) remove this warning entirely since we're
about to start auto-tuning PG counts anyway.

sage

> > John
> >
> > > > John
> > > >
> > > > >> Should this data be stored in the metadata pool?
> > > >
> > > > Yes, probably.  As you say, it's ugly how we end up with these extra
> > > [...]
> > > > One option would be to do both by default: write the backtrace to the
> > > > metadata pool for its ordinary functional lookup purpose, but also
> > > > write it back to the data pool as an intentionally redundant
> > > > resilience measure.  The extra write to the data pool could be
> > > > disabled by anyone who wants to save the IOPS at the cost of some
> > > > resilience.
> > >
> > > This would be nice. Another option would be to simply use fewer objects,
> > > but I guess that could be a major change.
> > >
> > > Actually, my main issue here is the warning "1 pools have many more
> > > objects per pg than average".
> > > My cluster is permanently in WARNING
> > > state, with the known consequences, and my "mon_pg_warn_max_object_skew"
> > > override is not working, for some reason. I am using Ceph 12.2.5.
> > >
> > > Thanks for your time and expertise!
> > >
> > > --
> > > Jesús Cea Avión                        _/_/      _/_/_/       _/_/_/
> > > jcea@xxxxxxx - http://www.jcea.es/    _/_/    _/_/  _/_/   _/_/  _/_/
> > > Twitter: @jcea                        _/_/    _/_/         _/_/_/_/_/
> > > jabber / xmpp:jcea@xxxxxxxxxx   _/_/  _/_/    _/_/         _/_/  _/_/
> > > "Things are not so easy"       _/_/  _/_/    _/_/  _/_/   _/_/  _/_/
> > > "My name is Dump, Core Dump"    _/_/_/       _/_/_/       _/_/  _/_/
> > > "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
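
For anyone who wants to reproduce the arithmetic behind the warning being
discussed: per John's description above, each pool's objects-per-PG ratio is
compared against the cluster-wide average, and a pool more than
mon_pg_warn_max_object_skew (default 10) times denser than that average gets
flagged. Below is a rough Python sketch of that logic, with invented pool
figures standing in for a CephFS cluster whose default data pool is full of
tiny backtrace objects; it illustrates the check as described in the thread,
not the actual monitor/mgr code.

#!/usr/bin/env python3
# Rough sketch of the "more objects per pg than average" check discussed
# above.  NOT the real mon/mgr implementation: the pool figures are
# invented, and the 10x threshold mirrors the mon_pg_warn_max_object_skew
# default mentioned in the thread.

# (pool name, object count, pg_num) for a hypothetical CephFS cluster where
# the default data pool mostly holds zero-size backtrace objects.
pools = [
    ("cephfs_metadata",    500_000,   64),
    ("cephfs_data",     20_000_000,   64),   # default pool: backtraces
    ("cephfs_data_hdd", 20_000_000, 2048),   # where the file data lives
]

max_object_skew = 10.0  # mon_pg_warn_max_object_skew default

total_objects = sum(objs for _, objs, _ in pools)
total_pgs = sum(pgs for _, _, pgs in pools)
avg_objects_per_pg = total_objects / total_pgs

for name, objs, pgs in pools:
    objects_per_pg = objs / pgs
    skew = objects_per_pg / avg_objects_per_pg
    status = "WARN" if skew > max_object_skew else "ok"
    print(f"{name:16} {objects_per_pg:10.0f} obj/pg  {skew:5.2f}x avg  {status}")

As John and Jesus note above, the two levers are raising pg_num on the dense
pool or raising the skew threshold itself.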