On Sat, Mar 7, 2015 at 12:03 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> Hi!
>
> [copying ceph-devel]
>
> On Fri, 6 Mar 2015, Nicheal wrote:
>> Hi Sage,
>>
>> Cool for issue #3878, Duplicated pg_log write, which is post early in
>> my issue #3244 and Single omap_setkeys transaction improve the
>> performance in FileStore as in my previous testing (most of time cost
>> in FileStore is in the transaction omap_setkeys).
>
> I can't find #3244?

I think it's https://github.com/ceph/ceph/pull/3244

>
>> Well, I think another performance issue is to the strategy of setattrs.
>> Here is some kernel log achieve from xfs behavious.
>> Mar 6 17:19:37 ceph2 kernel: start_xfs_attr_set_int: name =
>> ceph._(6), value =.259)
>> Mar 6 17:19:37 ceph2 kernel: format of di_c data: 2, format of attr
>> forks data: 1
>> Mar 6 17:19:37 ceph2 kernel: di_extsize=0, di_nextents=0,
>> di_anextents=0, di_forkoff=239
>>
>> Mar 6 17:19:37 ceph2 kernel: start_xfs_attr_set_int: name =
>> ceph._(6), value =.259)
>> Mar 6 17:19:37 ceph2 kernel: format of di_c data: 2, format of attr
>> forks data: 2
>> Mar 6 17:19:37 ceph2 kernel: di_extsize=0, di_nextents=1,
>> di_anextents=1, di_forkoff=239
>>
>> Mar 6 17:19:37 ceph2 kernel: start_xfs_attr_set_int: name =
>> ceph._(6), value =.259)
>> Mar 6 17:19:37 ceph2 kernel: format of di_c data: 2, format of attr
>> forks data: 2
>> Mar 6 17:19:37 ceph2 kernel: di_extsize=0, di_nextents=0,
>> di_anextents=1, di_forkoff=239
>>
>> typedef enum xfs_dinode_fmt {
>>     XFS_DINODE_FMT_DEV,     /* xfs_dev_t */
>>     XFS_DINODE_FMT_LOCAL,   /* bulk data */
>>     XFS_DINODE_FMT_EXTENTS, /* struct xfs_bmbt_rec */
>>     XFS_DINODE_FMT_BTREE,   /* struct xfs_bmdr_block */
>>     XFS_DINODE_FMT_UUID     /* uuid_t */
>> } xfs_dinode_fmt_t;
>>
>> while attr forks data = 2 means XFS_DINODE_FMT_EXTENTS (xattr is
>> stored in extent format), while attr forks data =1 means
>> XFS_DINODE_FMT_LOCAL(xattr is stored as inline attribute).
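
For reading the log lines above: the "format of attr forks data" number is an index into the quoted enum, counting from 0. A small C sketch of that mapping follows. The enum and helper names here are hypothetical stand-ins for illustration, not XFS kernel code.

```c
#include <stddef.h>

/* Stand-in for the xfs_dinode_fmt enum quoted above. C numbers enum
 * members from 0, so LOCAL is 1 and EXTENTS is 2. */
enum dinode_fmt {
    FMT_DEV,
    FMT_LOCAL,
    FMT_EXTENTS,
    FMT_BTREE,
    FMT_UUID
};

/* Hypothetical helper: describe where the attr fork lives for a format
 * code as printed in the log. Returns NULL for an unknown code. */
const char *attr_fork_fmt_name(int fmt)
{
    switch (fmt) {
    case FMT_DEV:     return "dev";
    case FMT_LOCAL:   return "local (inline in the inode)";
    case FMT_EXTENTS: return "extents (out of inode)";
    case FMT_BTREE:   return "btree (out of inode)";
    case FMT_UUID:    return "uuid";
    default:          return NULL;
    }
}
```

Applied to the three log entries above, the attr fork codes 1, 2, 2 decode to local, extents, extents: only the first entry shows the xattr held inline.
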
>>
>> However, in most cases, xattr attribute is stored in extent, not
>> inline. Please note that, I have already formatted the partition with
>> -i size=2048. when the number of xattrs is larger than 10, it uses
>> XFS_DINODE_FMT_BTREE to accelerate key searching.
>
> Did you by chance look at what size the typical xattrs are? I expected
> that the usual _ and snapset attrs would be small enough to fit inline..
> but if they're not then we should at a minimum adjust our recommendation
> on xfs inode size.
>
>> So, in _setattr(), we may just get xattr_key by using chain_flistxattr
>> instead of _fgetattrs, which retrieve (key, value) pair, as value is
>> exactly no use here. and furthermore, we may consider the strategies
>> that we need move spill_out xattr to omap, while xfs only restricts
>> that each xattr value < 64K and each xattr key < 255byte. And
>> duplicated read for XATTR_SPILL_OUT_NAME also occurs in:
>> r = chain_fgetxattr(**fd, XATTR_SPILL_OUT_NAME, buf, sizeof(buf));
>> r = _fgetattrs(**fd, inline_set);
>> And I try to ignore the _fgetattrs() logic and just update xattr
>> update in _setattr(), my ssd cluster will be improved about 2% - 3%
>> performance.
>
> I'm not quite following... do you have a patch we can look at?

I think he means that we can keep the xattrs minimal and avoid
xattr chains by using omap.

>
>> Another issue about an idea of recovery is showed in
>> https://github.com/ceph/ceph/pull/3837
>> Can you give some suggestion about that?
>
> I think this direction has a lot of potential, although it will add a fair
> bit of complexity.
>
> I think you can avoid the truncate field and infer that from the dirtied
> interval and the new object size. Need to look at the patch more closely
> still, though...
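
On the _setattr() point earlier in this thread: as I read it, the suggestion is to list only the xattr names instead of fetching every (key, value) pair. Below is a minimal C sketch of that idea using the Linux flistxattr(2) call directly. The helper names name_list_contains and fd_has_xattr_key are made up for illustration, and this is not FileStore's actual code, which goes through chain_flistxattr.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Scan a NUL-separated name list (the layout flistxattr(2) fills in)
 * for an exact match of key. Returns 1 if found, 0 otherwise. */
int name_list_contains(const char *list, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = 0;

    while (off < len) {
        const char *name = list + off;
        const char *end = memchr(name, '\0', len - off);
        size_t n = end ? (size_t)(end - name) : len - off;

        if (n == klen && memcmp(name, key, n) == 0)
            return 1;
        off += n + 1; /* skip the name and its terminating NUL */
    }
    return 0;
}

/* Returns 1 if the xattr name exists on fd, 0 if not, -1 on error.
 * Only names are read; no xattr value is ever fetched. */
int fd_has_xattr_key(int fd, const char *key)
{
    ssize_t need = flistxattr(fd, NULL, 0); /* size probe */
    if (need < 0)
        return -1;
    if (need == 0)
        return 0;

    char *buf = malloc((size_t)need);
    if (buf == NULL)
        return -1;

    /* If the list grew between the two calls this fails with ERANGE,
     * which this sketch simply reports as an error. */
    ssize_t got = flistxattr(fd, buf, (size_t)need);
    int found = (got < 0) ? -1 : name_list_contains(buf, (size_t)got, key);

    free(buf);
    return found;
}
```

A caller would pass an open file descriptor and a full attribute name, for example fd_has_xattr_key(fd, "user.ceph._"), where the "user.ceph." prefix is my assumption about how the key appears from userspace. Whether this saves anything in practice depends on how much of the cost is the value read rather than the attr fork lookup itself.
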

For xattr and omap optimization, I'm mostly counting on this PR:
https://github.com/ceph/ceph/pull/2972

>
> sage
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best Regards,

Wheat