On Tue, Jul 3, 2018 at 12:24 PM Jesus Cea <jcea@xxxxxxx> wrote:
>
> On 03/07/18 13:08, John Spray wrote:
> > Right: as you've noticed, they're not spurious, they're where we keep
> > a "backtrace" xattr for a file.
> >
> > Backtraces are lazily updated paths that enable CephFS to map an
> > inode number to a file's metadata, which is needed when resolving hard
> > links or NFS file handles.  The trouble with the backtrace in the
> > individual data pools is that the MDS would have to scan through the
> > pools to find it, so instead all the files get a backtrace in the root
> > pool.
>
> Given this, the warning "1 pools have many more objects per pg than
> average" will happen ALWAYS. Is there any plan to do some kind of
> special case for this situation, or will a "mon_pg_warn_max_object_skew"
> override be needed forever?

The "more objects per pg than average" warning is based on the idea
that there is some approximate ratio of objects to PGs that is
desirable, but Ceph doesn't know what that ratio is, so Ceph is
assuming that you've got roughly the right ratio overall, and any pool
10x denser than that is a problem.  To directly address that warning
rather than silencing it, you'd increase the number of PGs in your
primary data pool.

There's a conflict here between pools with lots of data (where the MB
per PG might be the main concern, not the object size), vs.
metadata-ish pools (where the object count per PG is the main
concern).  Maybe it doesn't really make sense to group them all
together when calculating the average objects-per-PG count that's used
in this health warning -- I'll bring that up over on ceph-devel in a
moment.

John

>
> >> Should this data be stored in the metadata pool?
> >
> > Yes, probably.  As you say, it's ugly how we end up with these extra
> [...]
> > One option would be to do both by default: write the backtrace to the
> > metadata pool for its ordinary functional lookup purpose, but also
> > write it back to the data pool as an intentionally redundant
> > resilience measure.  The extra write to the data pool could be
> > disabled by anyone who wants to save the IOPS at the cost of some
> > resilience.
>
> This would be nice. Another option would be simply to use fewer
> objects, but I guess that could be a major change.
>
> Actually, my main issue here is the warning "1 pools have many more
> objects per pg than average". My cluster is permanently in WARNING
> state, with the known consequences, and my "mon_pg_warn_max_object_skew"
> override is not working, for some reason. I am using Ceph 12.2.5.
>
> Thanks for your time and expertise!
>
> --
> Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
> jcea@xxxxxxx - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
> Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
> jabber / xmpp:jcea@xxxxxxxxxx   _/_/  _/_/    _/_/          _/_/  _/_/
> "Things are not so easy"       _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
> "My name is Dump, Core Dump"    _/_/_/        _/_/_/      _/_/  _/_/
> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
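
A practical aside on the backtraces John describes: they are kept in an
xattr (named "parent", if I remember right) on a file's first object in
the first data pool, and they can be inspected with rados and
ceph-dencoder. The pool and object names below are only examples;
substitute your own first data pool and one of the otherwise-empty
objects Jesus noticed:

    # list the xattrs on one of the objects in the first data pool
    rados -p cephfs_data listxattr 10000000001.00000000

    # dump one backtrace: fetch the xattr, then decode it
    rados -p cephfs_data getxattr 10000000001.00000000 parent > parent.bin
    ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json

The decoded JSON lists the ancestor dentries, which is what lets the MDS
turn a bare inode number back into a path when resolving hard links or
NFS file handles.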
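
For the warning itself: the check compares each pool's objects-per-PG
ratio against the cluster-wide average and complains when a pool exceeds
it by the mon_pg_warn_max_object_skew factor (10 by default, which is the
"10x denser" John mentions). Something along these lines shows where a
cluster stands and, if the first data pool really is the dense one, gives
it more PGs. The pool name and PG count are just example values, and
remember that in Luminous pg_num can only ever be increased:

    # per-pool object counts, and per-pool pg_num
    ceph df detail
    ceph osd pool ls detail

    # give the dense pool more PGs (pgp_num has to follow pg_num)
    ceph osd pool set cephfs_data pg_num 256
    ceph osd pool set cephfs_data pgp_num 256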
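
On the override not taking effect on 12.2.5: I haven't verified this
myself, so treat it as a guess, but I believe that in Luminous this
threshold is consumed by ceph-mgr rather than the monitors, so an
override applied only to the mons can be ignored. It may be worth setting
it on the mgr hosts and restarting the active mgr, e.g.:

    # ceph.conf on the mgr host(s); 0 should disable the check entirely
    [mgr]
    mon_pg_warn_max_object_skew = 0

    # then restart the manager daemon
    systemctl restart ceph-mgr@$(hostname -s)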