Hi, Sage,

Thanks for your suggestion. It works for us.

Best Regards

On Tue, Jan 10, 2017 at 12:44:50PM +0000, Sage Weil wrote:
> On Tue, 10 Jan 2017, caifeng.zhu@xxxxxxxxxxx wrote:
> > Hi, all
> >
> > We find that after the number of PGs is increased, the object stat sum
> > in the PG info is incorrect.
> >
> > The following steps can reproduce the problem:
> > 0. Assume the object store is a FileStore.
> > 1. Create a pool 'foo' with a PG count such as 64.
> > 2. Write data through clients (rbd, cephfs or rgw) into the pool 'foo'.
> > 3. Increase the number of PGs in the pool 'foo' to, e.g., 128.
> > 4. After the PGs have settled, use 'ceph pg x.y query' to look at the
> >    field 'num_objects'.
> > 5. Find the OSD shard where PG x.y resides with 'ceph pg map x.y' and
> >    count the number of objects in that OSD shard with a command like
> >    'find /var/lib/ceph/osd/ceph-0/current/x.y_head/ -type f | wc -l'.
> >
> > The code flow to increase the PG number is as follows:
> > OSD::advance_pg
> >   -> OSD::split_pgs
> >      -> object_stat_sum::split
> >      -> ReplicatedPG::split_colls
> >         -> PG::_create
> >         -> ObjectStore::Transaction::split_collection
> >            /* indirectly calls FileStore::_split_collection
> >             * when the transaction is applied to the file system.
> >             */
> >   -> PG::split_into
> >
> > Comparing object_stat_sum::split with FileStore::_split_collection, the
> > splitting logic is different, which makes stat.sum differ from the
> > actual number of objects in the collection.
> >
> > The question is: should we fix this difference? If so, how? With the
> > current design, it seems very difficult to fix.
>
> Right, it's expected to be out of sync. The pg_stats structure has a bool
> flag indicating the stats are not strictly accurate (only an
> approximation), and they will be corrected during the next scrub. You can
> force this to happen explicitly on a test PG with 'ceph pg scrub <pgid>'
> and then verify that the stats are accurate afterwards.
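[As a rough mental model of the approximation discussed above: the stat split hands out the parent's counters arithmetically rather than by inspecting objects. The function below is an illustrative sketch only, not Ceph's actual object_stat_sum::split, and the even-division policy is an assumption for the example:]

```python
def split_stat_evenly(num_objects, child_count):
    """Divide a parent PG's object count across child_count children.

    A toy model of an O(1) stat split: counters are divided evenly,
    with any remainder spread over the first few children. No object
    hashes are consulted, which is why the result can disagree with
    the on-disk collection split.
    """
    base = num_objects // child_count
    remainder = num_objects % child_count
    return [base + (1 if i < remainder else 0) for i in range(child_count)]

# Example: 10 objects split across 3 children -> [4, 3, 3].
# The totals are preserved, but each child's count is a guess.
```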
> You can also see the full stats structure (including the flag) with
> 'ceph pg dump -f json-pretty'.
>
> It would be very hard to make the ObjectStore backend (FileStore or
> BlueStore) able to split a collection in O(1) time *and* provide an
> accurate split of the stats (and their many fields) as well. And it's not
> that important; the approximation is sufficient for most purposes. The
> only consumer it's not good enough for is the cache tiering agent; that
> is disabled until the next scrub happens on the PG.
>
> sage
>
> > A similar bug is reported as tracker.ceph.com/issues/16671, which will
> > occur if all the existing data in the pool 'foo' is deleted.
> >
> > Best Regards
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
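[For reference, why the two numbers diverge can be illustrated with a toy model of hash-based collection splitting. When pg_num doubles, each object stays in its parent PG or moves to the new child depending on its hash bits, so the child's true object count depends on the hash distribution, not on an even division. The names, the crc32 hash, and the modulo placement below are all illustrative assumptions, not Ceph's real placement code:]

```python
import zlib

def child_pg_of(object_name, old_pg_num):
    """PG id an object maps to after pg_num doubles (toy hash placement)."""
    h = zlib.crc32(object_name.encode())
    return h % (old_pg_num * 2)

def objects_moving(names, parent_pg, old_pg_num):
    """Objects currently in parent_pg that relocate to the new child PG.

    An object moves when its hash selects a different PG id under the
    doubled pg_num. How many move is entirely hash-dependent, which is
    why an even-split stat approximation cannot match it exactly.
    """
    return [n for n in names
            if zlib.crc32(n.encode()) % old_pg_num == parent_pg
            and child_pg_of(n, old_pg_num) != parent_pg]
```

In this model, every object that leaves parent PG 0 (with old_pg_num = 4) necessarily lands in child PG 4, but the fraction that moves is whatever the hash says, not exactly half.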