Re: Re: osd: fine-grained statistics for object space usage

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 30 Nov 2017 13:46:17 -0800

On Thu, Nov 30, 2017 at 1:27 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Thu, 30 Nov 2017, Gregory Farnum wrote:
>> On Wed, Nov 29, 2017 at 7:06 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>> >
>> > On Thu, 30 Nov 2017, xie.xingguo@xxxxxxxxxx wrote:
>> > > (My network connection seems to be problematic, resending :( )
>> > >
>> > >   Anyway, I am +1 for doing this in a more effective way (e.g., as Igor
>> > > suggested).
>> > >
>> > >   The big potential challenge might be keeping the scrub process happy,
>> > > though!
>> >
>> > Would this be something like:
>> >
>> > 1- an object_info_t field like uint32_t allocated_size, which is also
>> > incorporated into the pg summation, and
>> >
>> > 2- an ObjectStore method that returns the allocated size for an object?
>> >
>> > The challenge I see is that the new value (or delta) needs to be sorted
>> > out at the transaction prepare time because the stat update is part of the
>> > transaction, but we won't really know what the result is until bluestore
>> > (or any other impl) does its write preparation work.  :/
>>
>>
>> It would take some doing, but this might be a good time to start adding
>> delayed work. We could get the stat updates as part of the objectstore
>> callback and incorporate them into future disk ops, and part of the
>> startup/replay process could be querying for stat updates to objects
>> we haven’t committed yet.
>>
>> ...except we don’t really have an OSD-level or pg replay phase any
>> more, do we? Hrmm. And doing it in the transaction would require some
>> sort of set-up/query phase to the transaction, then finalization and
>> submission, which isn’t great since it impacts checksumming and other
>> stuff (although *hopefully* not actual allocation).
>
> Hmm, and there is a larger problem here: we can't really make this
> ObjectStore implementation specific because it may vary across OSDs (some
> may be BlueStore, some may be FileStore).
>
> Even if we didn't have that issue, it constrains the ordering somewhat:
> you would need to prepare and submit the local transaction (to get
> the stat delta) before sending the replica writes.

Yeah. This isn't a problem if we do the stat maintenance separately,
but it's a much larger-scoped patch than just poking the interfaces.
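
A minimal sketch of doing the stat maintenance separately, assuming the ObjectStore reports an allocated-size delta from its commit callback (the "delayed work" idea earlier in the thread). AllocDelta, PGAllocStats, DeferredAllocStats, on_commit and drain_into are hypothetical names, not Ceph APIs; the point is only that the delta is applied in a later PG transaction, so nothing on the write path waits for the backend's allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical payload an ObjectStore commit callback could hand back once
// the backend knows how much space a write actually allocated.
struct AllocDelta {
  std::string oid;
  uint64_t old_allocated;  // bytes allocated to oid before the write
  uint64_t new_allocated;  // bytes allocated to oid after the write
};

// Hypothetical stand-in for the allocated-bytes part of the PG stat sum.
struct PGAllocStats {
  uint64_t allocated_bytes = 0;
};

// Accumulates deltas reported after commit and folds them into the PG
// stats inside a later transaction, so neither the local write nor the
// replica fan-out has to wait on the backend's allocation decision.
class DeferredAllocStats {
 public:
  // Invoked from the (hypothetical) commit callback.
  void on_commit(const AllocDelta& d) {
    pending_ += static_cast<int64_t>(d.new_allocated) -
                static_cast<int64_t>(d.old_allocated);
    unapplied_oids_.push_back(d.oid);
  }

  // Invoked while building the next PG transaction: applies everything
  // accumulated so far and returns the objects whose deltas were applied.
  std::vector<std::string> drain_into(PGAllocStats& stats) {
    const int64_t updated =
        static_cast<int64_t>(stats.allocated_bytes) + pending_;
    assert(updated >= 0 && "PG allocation total went negative");
    stats.allocated_bytes = static_cast<uint64_t>(updated);
    pending_ = 0;
    std::vector<std::string> applied;
    applied.swap(unapplied_oids_);
    return applied;
  }

  int64_t pending() const { return pending_; }

 private:
  int64_t pending_ = 0;                     // net bytes not yet in PG stats
  std::vector<std::string> unapplied_oids_; // objects with unapplied deltas
};
```

A crash between on_commit and the next drain_into loses the in-memory deltas, which is the gap an OSD-level replay step would have to close.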

Would you be opposed to a simple OSD-level replay step that could do
stuff like update pg stats for recent object writes?
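
One way such a replay step could look, sketched under two assumptions from earlier in the thread: a per-object recorded allocation (like the proposed object_info_t allocated_size) and an ObjectStore query that returns an object's allocated size. LogEntry, RecordedSizes, AllocatedSizeFn, replay_alloc_stats and stats_persisted_through are hypothetical; the real PG log and object_info_t look different. Re-querying the store instead of re-applying deltas keeps the step idempotent.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical: per-object allocation last folded into PG stats, i.e. the
// object_info_t allocated_size field proposed earlier in the thread.
using RecordedSizes = std::map<std::string, uint64_t>;

// Hypothetical ObjectStore query: bytes currently allocated to oid
// (expected to return 0 if the object no longer exists).
using AllocatedSizeFn = std::function<uint64_t(const std::string&)>;

// Hypothetical minimal PG log entry: which object a given version touched.
struct LogEntry {
  uint64_t version;
  std::string oid;
};

// Startup replay: for every object touched after the last version whose
// stats were persisted, re-query the store and correct both the per-object
// record and the PG total. Running it twice gives the same result.
uint64_t replay_alloc_stats(const std::vector<LogEntry>& log,
                            uint64_t stats_persisted_through,
                            RecordedSizes& recorded,
                            uint64_t pg_allocated,
                            const AllocatedSizeFn& allocated_size) {
  for (const LogEntry& e : log) {
    if (e.version <= stats_persisted_through) {
      continue;  // already reflected in the persisted PG stats
    }
    const uint64_t actual = allocated_size(e.oid);
    uint64_t& seen = recorded[e.oid];  // inserts 0 for a new object
    // pg_allocated - seen cannot underflow while pg_allocated equals the
    // sum of the recorded per-object sizes.
    pg_allocated = pg_allocated - seen + actual;
    seen = actual;
  }
  return pg_allocated;
}
```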

> I think this sort of layer boundary crossing would make our lives very
> difficult down the line.  :/

I mean, that's true, but it's also something eminently reasonable for
admins to want. I've always found it a bit embarrassing we can't
expose which snapshots are actually taking up space. Saying which
things are using up storage is sort of a critical feature. :/
-Greg