Re: mon_pg_warn_max_object_skew logic (was: [ceph-users] Spurious empty files in CephFS root pool when multiple pools associated)

Was this warning inserted for a particular reason we know of? It was
part of https://github.com/ceph/ceph/pull/625, next to one about
PG/OSD ratio skews, but there's no ticket or justification beyond
"suggests this particular pool may have too few PGs."
-Greg

On Tue, Jul 3, 2018 at 5:12 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 3 Jul 2018, John Spray wrote:
>> On Tue, Jul 3, 2018 at 12:46 PM John Spray <jspray@xxxxxxxxxx> wrote:
>> >
>> > On Tue, Jul 3, 2018 at 12:24 PM Jesus Cea <jcea@xxxxxxx> wrote:
>> > >
>> > > On 03/07/18 13:08, John Spray wrote:
>> > > > Right: as you've noticed, they're not spurious, they're where we keep
>> > > > a "backtrace" xattr for a file.
>> > > >
>> > > > Backtraces are lazily updated paths, that enable CephFS to map an
>> > > > inode number to a file's metadata, which is needed when resolving hard
>> > > > links or NFS file handles.  The trouble with the backtrace in the
>> > > > individual data pools is that the MDS would have to scan through the
>> > > > pools to find it, so instead all the files get a backtrace in the root
>> > > > pool.
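>> > > >
>> > > > For illustration only, a minimal sketch of how you could look at one
>> > > > of those backtrace xattrs with the librados Python bindings -- the
>> > > > pool name and object name below are made-up placeholders; the real
>> > > > object name is the file's inode number in hex plus ".00000000":
>> > > >
>> > > >     import rados
>> > > >
>> > > >     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>> > > >     cluster.connect()
>> > > >     try:
>> > > >         # first/root data pool of the filesystem (placeholder name)
>> > > >         ioctx = cluster.open_ioctx('cephfs_data')
>> > > >         # the backtrace lives in the "parent" xattr of the first object
>> > > >         backtrace = ioctx.get_xattr('10000000000.00000000', 'parent')
>> > > >         print(len(backtrace), "bytes of encoded backtrace")
>> > > >         ioctx.close()
>> > > >     finally:
>> > > >         cluster.shutdown()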
>> > >
>> > > Given this, the warning "1 pools have many more objects per pg than
>> > > average" will ALWAYS happen. Is there any plan to special-case this
>> > > situation, or will a "mon_pg_warn_max_object_skew" override be needed
>> > > forever?
>> >
>> > The "more objects per pg than average" warning is based on the idea
>> > that there is some approximate ratio of objects to PGs that is
>> > desirable, but Ceph doesn't know what that ratio is, so Ceph is
>> > assuming that you've got roughly the right ratio overall, and any pool
>> > 10x denser than that is a problem.
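>> >
>> > Roughly, the check looks something like this (just a sketch of the
>> > logic in Python, not the actual monitor code; "pools" here is a
>> > placeholder mapping of pool name to an (object_count, pg_num) pair):
>> >
>> >     def pools_with_object_skew(pools, max_object_skew=10.0):
>> >         """Pools whose objects-per-PG exceed the cluster-wide average
>> >         by more than mon_pg_warn_max_object_skew (default 10)."""
>> >         total_objects = sum(objs for objs, _ in pools.values())
>> >         total_pgs = sum(pgs for _, pgs in pools.values())
>> >         if total_objects == 0 or total_pgs == 0:
>> >             return []
>> >         average = total_objects / total_pgs
>> >         return [name for name, (objs, pgs) in pools.items()
>> >                 if pgs and objs / pgs > max_object_skew * average]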
>> >
>> > To directly address that warning rather than silencing it, you'd
>> > increase the number of PGs in your primary data pool.
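>> >
>> > (Hypothetical sketch of doing that through the librados Python
>> > bindings' mon_command interface -- the pool name and target pg_num are
>> > placeholders:)
>> >
>> >     import json
>> >     import rados
>> >
>> >     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>> >     cluster.connect()
>> >     ret, out, errs = cluster.mon_command(json.dumps({
>> >         'prefix': 'osd pool set',
>> >         'pool': 'cephfs_data',   # placeholder pool name
>> >         'var': 'pg_num',
>> >         'val': '256',            # placeholder target
>> >     }), b'')
>> >     print(ret, errs)
>> >     # on pre-Nautilus releases, repeat with 'var': 'pgp_num' once the
>> >     # new PGs have been created
>> >     cluster.shutdown()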
>> >
>> > There's a conflict here between pools with lots of data (where the MB
>> > per PG might be the main concern, not the object count) and
>> > metadata-ish pools (where the object count per PG is the main
>> > concern).  Maybe it doesn't really make sense to group them all
>> > together when calculating the average objects-per-PG count that's used
>> > in this health warning -- I'll bring that up over on ceph-devel in a
>> > moment.
>>
>> Migrating this topic from ceph-users for more input -- am I talking sense?
>>
>> It seems wrong that we would look at the average object count per PG
>> of pools containing big objects, and use it to validate the object
>> count per PG of pools containing tiny objects.
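>>
>> A concrete example with made-up numbers: an RBD-style pool with 250,000
>> large objects in 1024 PGs sits at ~244 objects/PG, while a CephFS root
>> data pool holding 10,000,000 tiny backtrace objects in 64 PGs sits at
>> ~156,000 objects/PG.
>>
>>     avg  = (250_000 + 10_000_000) / (1024 + 64)  # ~9,422 objects/PG overall
>>     root = 10_000_000 / 64                       # ~156,250 objects/PG
>>     print(root / avg)                            # ~16.6 -> past the 10x default
>>
>> The root data pool gets flagged even though those objects hold no data.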
>
> Yeah, it's a pretty lame set of criteria for this warning most ways you
> look at it, I think.  This came up about a month ago on another ceph-users
> thread: https://www.spinics.net/lists/ceph-devel/msg41418.html
>
> I wonder if we should either (1) increase the default value here by a big
> factor (5? 10?), or (2) remove this warning entirely since we're about to
> start auto-tuning PG counts anyway.
>
> sage
>
>
>
>>
>> John
>>
>> >
>> > John
>> >
>> > >
>> > > >> Should this data be stored in the metadata pool?
>> > > >
>> > > > Yes, probably.  As you say, it's ugly how we end up with these extra
>> > > [...]
>> > > > One option would be to do both by default: write the backtrace to the
>> > > > metadata pool for its ordinary functional lookup purpose, but also
>> > > > write it back to the data pool as an intentionally redundant
>> > > > resilience measure.  The extra write to the data pool could be
>> > > > disabled by anyone who wants to save the IOPS at the cost of some
>> > > > resilience.
>> > > This would be nice. Another option would be to simply use fewer objects,
>> > > but I guess that could be a major change.
>> > >
>> > > Actually, my main issue here is the warning "1 pools have many more
>> > > objects per pg than average". My cluster is permanently in WARNING
>> > > state, with the known consequences, and my "mon_pg_warn_max_object_skew"
>> > > override is not working for some reason. I am using Ceph 12.2.5.
>> > >
>> > >
>> > > Thanks for your time and expertise!
>> > >
>> > > --
>> > > Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
>> > > jcea@xxxxxxx - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
>> > > Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
>> > > jabber / xmpp:jcea@xxxxxxxxxx  _/_/  _/_/    _/_/          _/_/  _/_/
>> > > "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
>> > > "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
>> > > "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
>> > >


