On Thu, Nov 30, 2017 at 1:27 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Thu, 30 Nov 2017, Gregory Farnum wrote:
>> On Wed, Nov 29, 2017 at 7:06 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>> >
>> > On Thu, 30 Nov 2017, xie.xingguo@xxxxxxxxxx wrote:
>> > > (My network connection seems to be problematic, resending :( )
>> > >
>> > > Anyway, I am +1 for doing this in a more effective way (e.g., as Igor
>> > > suggested).
>> > >
>> > > The potential big challenge might be making the scrub-process happy,
>> > > though!
>> >
>> > Would this be something like:
>> >
>> > 1- an object_info_t field like uint32_t allocated_size, which has been
>> > incorporated into the pg summation, and
>> >
>> > 2- an ObjectStore method that returns the allocated size for an object?
>> >
>> > The challenge I see is that the new value (or delta) needs to be sorted
>> > out at the transaction prepare time because the stat update is part of the
>> > transaction, but we won't really know what the result is until bluestore
>> > (or any other impl) does its write preparation work. :/
>>
>>
>> It would take some doing but this might be a good time to start adding
>> delayed work. We could get the stat updates as part of the objectstore
>> callback and incorporate them into future disk ops, and part of the
>> startup/replay process could be querying for stat updates to objects
>> we haven’t committed yet.
>>
>> ...except we don’t really have an OSD-level or pg replay phase any
>> more, do we. Hrmm. And doing it in the transaction would require some
>> sort of set-up/query phase to the transaction, then finalization and
>> submission, which isn’t great since it impacts checksumming and other
>> stuff (although *hopefully* not actual allocation).
>
> Hmm, and there is a larger problem here: we can't really make this
> ObjectStore implementation-specific because it may vary across OSDs (some
> may be BlueStore, some may be FileStore).
>
> Even if we didn't have that issue, it constrains the ordering somewhat:
> you would need to prepare and submit the local transaction (to get
> the stat delta) before sending the replica writes.

Yeah. This isn't a problem if we do the stat maintenance separately,
but it's a much larger-scoped patch than just poking the interfaces.
Would you be opposed to a simple OSD-level replay step that could do
stuff like update pg stats for recent object writes?

> I think this sort of layer boundary crossing would make our lives very
> difficult down the line. :/

I mean, that's true, but it's also something eminently reasonable for
admins to want. I've always found it a bit embarrassing we can't
expose which snapshots are actually taking up space. Saying which
things are using up storage is sort of a critical feature. :/
-Greg
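
For illustration only, here is a rough sketch of the "do the stat maintenance separately" idea from this thread. The store reports an allocation delta from its commit callback, and the PG queues those deltas and folds them into its summary later, rather than computing them at transaction prepare time. Everything here is hypothetical: ToyStore, ToyPg, AllocDelta, write_full and the 4096-byte allocation unit are made-up names and values, not Ceph's ObjectStore or object_info_t interfaces, and the sketch ignores replication, persistence and replay entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical: change in allocated bytes for one object, reported by the
// store once it knows how much space the write really consumed.
struct AllocDelta {
  std::string oid;
  int64_t bytes;  // may be negative when an object shrinks
};

using CommitCallback = std::function<void(const AllocDelta&)>;

// Hypothetical toy backend. Allocation is rounded up to alloc_unit, so the
// allocated size differs from the logical size and depends on the backend.
class ToyStore {
 public:
  explicit ToyStore(uint64_t alloc_unit) : alloc_unit_(alloc_unit) {
    assert(alloc_unit > 0);
  }

  // Replace the whole object with `len` bytes, then report the delta.
  void write_full(const std::string& oid, uint64_t len,
                  const CommitCallback& on_commit) {
    const uint64_t needed = (len + alloc_unit_ - 1) / alloc_unit_ * alloc_unit_;
    uint64_t& current = allocated_[oid];
    const int64_t delta =
        static_cast<int64_t>(needed) - static_cast<int64_t>(current);
    current = needed;
    on_commit(AllocDelta{oid, delta});
  }

  // Analogue of "a method that returns the allocated size for an object".
  uint64_t allocated_size(const std::string& oid) const {
    auto it = allocated_.find(oid);
    return it == allocated_.end() ? 0 : it->second;
  }

 private:
  uint64_t alloc_unit_;
  std::map<std::string, uint64_t> allocated_;
};

// Hypothetical PG-side bookkeeping: deltas are queued by the callback and
// only applied to the summary in a later step.
class ToyPg {
 public:
  void queue_delta(const AllocDelta& d) { pending_.push_back(d); }

  void apply_pending() {
    for (const AllocDelta& d : pending_) allocated_bytes_ += d.bytes;
    pending_.clear();
  }

  int64_t allocated_bytes() const { return allocated_bytes_; }
  std::size_t pending_count() const { return pending_.size(); }

 private:
  std::vector<AllocDelta> pending_;
  int64_t allocated_bytes_ = 0;
};
```

The PG total lags the store until apply_pending() runs. That lag is exactly the window a replay step would have to cover after a restart, and the sketch makes no attempt to cover it.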