Re: pg_stat_t is 500+ bytes

On Wed, Aug 24, 2016 at 11:12:24AM -0500, Mark Nelson wrote:
> 
> 
> On 08/24/2016 11:09 AM, Sage Weil wrote:
> >On Wed, 24 Aug 2016, Haomai Wang wrote:
> >>On Wed, Aug 24, 2016 at 11:01 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
> >>>On Wed, Aug 24, 2016 at 2:13 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >>>>This is huge.  It takes the pg_info_t str from 306 bytes to 847 bytes, and
> >>>>this _info omap key is rewritten on *every* IO.
> >>>>
> >>>>We could shrink this down significantly with varint and/or delta encoding
> >>>>since a huge portion of it is just a bunch of uint64_t counters.  This
> >>>>will probably cost some CPU time, but OTOH it'll also shrink our metadata
> >>>>down a fair bit too which will pay off later.
> >>>>
> >>>>Anybody want to tackle this?
> >>>
> >>>What about separating "object_stat_collection_t stats" from pg_stat_t?
> >>>pg info should be unchanged most of the time, so we could update only the
> >>>object-related stats. This may cut the size roughly in half.
> >
> >I don't think this will work, since every op changes last_update in
> >pg_info_t *and* the stats (write op count, total bytes, objects, etc.).
> >
> >>Or we store only the increments and keep the full structure in memory (this
> >>may reduce it to 20-30 bytes). Periodically we store the full structure
> >>(only hundreds of bytes)....
> >
> >A delta is probably very compressible (only a few fields in the stats
> >struct change).  The question is how fast can we make it in CPU time.
> >Probably a simple delta (which will be almost all 0's) and a trivial
> >run-length-encoding scheme that just gets rid of the 0's would do well
> >enough...
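
For illustration, a rough sketch of the delta plus zero-suppressing RLE idea
described above. This is not Ceph code: the stats are modelled as a flat list
of uint64 counters, the names encode_delta/decode_delta are hypothetical, and
the (0x00, run length) escape is just one simple choice of format.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def encode_delta(prev, curr):
    """Encode curr relative to prev, collapsing runs of zero bytes."""
    # Per-counter difference, wrapped to 64 bits so decreases also fit.
    deltas = [(c - p) & MASK64 for p, c in zip(prev, curr)]
    raw = struct.pack("<%dQ" % len(deltas), *deltas)
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == 0:
            # A run of zero bytes becomes the pair (0x00, run_length <= 255).
            run = 1
            while i + run < len(raw) and raw[i + run] == 0 and run < 255:
                run += 1
            out += bytes((0, run))
            i += run
        else:
            # Non-zero bytes are copied through unchanged.
            out.append(raw[i])
            i += 1
    return bytes(out)

def decode_delta(prev, blob):
    """Inverse of encode_delta: expand zero runs, then add deltas to prev."""
    raw = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == 0:
            raw += bytes(blob[i + 1])  # bytes(n) yields n zero bytes
            i += 2
        else:
            raw.append(blob[i])
            i += 1
    deltas = struct.unpack("<%dQ" % (len(raw) // 8), bytes(raw))
    return [(p + d) & MASK64 for p, d in zip(prev, deltas)]
```

With 40 counters (320 bytes raw) of which three change between updates, the
encoded delta comes out well under 20 bytes in this sketch. The CPU cost is
one pass over the raw delta bytes on each side.
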
> 
> Do we have any rough idea of how many/often consecutive 0s we end up
> with in the current encoding?

Or how high these counters get? We could try transposing the byte matrix made
of those counters, so that the bytes at the same position in each counter end
up next to each other. At least the two most significant bytes of most of
those counters are usually zero, so after transposing, simple RLE would be
feasible. In any case, I'm not sure that *all* of these fields need to be
uint64_t.
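
To make the transposition concrete, here is a minimal sketch. It assumes the
counters are little-endian uint64 values in a flat list; the function names
are made up for the example and this is not how Ceph encodes anything today.
After the transpose, the high-order bytes of all counters sit together at the
end of the buffer, which is where the long zero runs show up.

```python
import struct

WIDTH = 8  # bytes per uint64_t counter

def transpose(counters):
    """Group byte k of every counter into plane k (least significant first)."""
    raw = struct.pack("<%dQ" % len(counters), *counters)
    return b"".join(raw[k::WIDTH] for k in range(WIDTH))

def untranspose(planes):
    """Inverse of transpose: rebuild the list of counters."""
    n = len(planes) // WIDTH
    raw = bytes(planes[k * n + i] for i in range(n) for k in range(WIDTH))
    return list(struct.unpack("<%dQ" % n, raw))

def longest_zero_run(data):
    """Length of the longest run of consecutive zero bytes in data."""
    best = run = 0
    for b in data:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best
```

If every counter stays below 2^48, the last two planes are entirely zero, so
the transposed buffer ends in a zero run of at least 2 * len(counters) bytes,
which even a trivial RLE handles well. Comparing
longest_zero_run(transpose(counters)) against the same measurement on the
plain packed bytes is a cheap way to check this on real data.
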

-- 
Piotr Dałek
branch@xxxxxxxxxxxxxxxx
http://blog.predictor.org.pl