On Tue, Nov 28, 2017 at 8:46 PM, yuxiang fang <abcdeffyx@xxxxxxxxx> wrote:
> Hi Xuehan
>
> Have a look at DBObjectMap::_lookup_map_header; it does the following:
>
> step 1. look up the object header in the memory cache; go to step 2 on a miss, or else return
> step 2. look up the object header in leveldb (that is the code
> "db->get(HOBJECT_TO_SEQ, map_header_key(oid), &out);"); go to step 3 on a hit, or else return
> step 3. add the header to the memory cache
>
> steps 1 and 3 are cpu intensive, and step 2 is I/O intensive.
>
> I think the bottleneck is in step 2; you can watch the cpu usage of the
> filestore transaction apply threads to confirm it.
>
> So there are two solutions to improve it:
> 1. improve the cache hit ratio: avoid injecting too many objects that
> have omap k/v; maybe you can use xattrs.
> 2. speed up step 2: separate the omap directory from the osd directory
> and move leveldb to ssd by using filestore_omap_backend_path.

I think your diagnosis is correct, but I'd like to see some less
invasive methods for improving it. I haven't spent much time looking at
the code, but I think it ought to be possible to set up a different
consistency mechanism which drops locks around IO while preventing
concurrent access to the specific header in question, right?

Oh, and I presume the runtime efficiency of finding the cache size is
not really an issue, but if it is, that's easy to optimize away with a
counter...
-Greg
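As a concrete illustration of the counter idea above, here is a minimal sketch (with made-up names, not the actual DBObjectMap code) of an LRU-style header cache that tracks its element count explicitly, so that trimming never has to walk the list:

    // Minimal sketch, not the actual DBObjectMap code: a simplified header
    // cache that keeps an explicit element count, so trimming never needs
    // to call list::size() (O(N) in the old GNU STL implementation).
    // "Header" and the class layout here are illustrative placeholders.
    #include <cstddef>
    #include <list>
    #include <string>
    #include <unordered_map>

    struct Header {
      std::string oid;   // stand-in for the real omap header fields
    };

    class HeaderCache {
    public:
      explicit HeaderCache(size_t max) : max_size(max) {}

      void add(const Header &h) {
        auto it = index.find(h.oid);
        if (it != index.end()) {           // refresh an existing entry
          lru.erase(it->second);
          --count;
        }
        lru.push_front(h);                 // most recently used at the front
        index[h.oid] = lru.begin();
        ++count;                           // counter replaces list::size()
        trim();
      }

      bool lookup(const std::string &oid, Header *out) {
        auto it = index.find(oid);
        if (it == index.end())
          return false;
        lru.splice(lru.begin(), lru, it->second);  // bump to MRU position
        *out = *it->second;
        return true;
      }

    private:
      void trim() {
        while (count > max_size) {         // O(1) size check per eviction
          index.erase(lru.back().oid);
          lru.pop_back();
          --count;
        }
      }

      std::list<Header> lru;
      std::unordered_map<std::string, std::list<Header>::iterator> index;
      size_t count = 0;
      size_t max_size;
    };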
>
> thanks
> ivan from eisoo
>
> On Wed, Nov 29, 2017 at 11:11 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> Thanks, Greg and yuxiang:-)
>>
>> We used mdtest on 70 nodes issuing "file creations" to cephfs to test
>> the maximum number of file creations that a cephfs instance with only
>> one active mds can do. During the test, we found that, even after the
>> test finished, there were still a lot of I/Os on the data pool, and the
>> osd was under heavy pressure, as shown in the attachments
>> "apply_latency" and "op_queue_ops". This should be caused by storing
>> the files' backtraces.
>>
>> We also used gdbprof to probe the OSD; the result is in the attachment
>> "gdb.create.rocksdb.xfs.log". It shows that most of the execution time
>> of the OSD's filestore threads is spent waiting on
>> DBObjectMap::header_lock, and the only actual work the filestore
>> threads get to do is adding object headers to DBObjectMap::caches.
>> Adding an object header causes DBObjectMap::caches to trim, during
>> which the cache's size has to be computed, an O(N) operation in the
>> GNU STL list::size implementation. On the other hand, adding object
>> headers is protected by locking "DBObjectMap::header_lock", and our
>> configuration "filestore_omap_header_cache_size" is 204800, which is
>> very large and makes the cache size computation take considerable
>> time. So we think it may be appropriate to move the add-object-header
>> operation out of the critical section of "DBObjectMap::header_lock",
>> or maybe some other mechanism for DBObjectMap::caches should be
>> considered.
>>
>> Thanks:-)
>>
>> On 29 November 2017 at 06:59, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> On Tue, Nov 28, 2017 at 1:51 AM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>>> Hi, everyone.
>>>>
>>>> Recently, we did some stress tests on the mds. We found that, when
>>>> doing the file creation test, the mdlog trim operations are very
>>>> slow. After some debugging, we found that this could be due to the
>>>> execution of the OSD's filestore threads being forced to be nearly
>>>> sequential. This can be seen in the result of our gdbprof probing,
>>>> which is attached to this email. In the gdbprof result, we can see
>>>> that the most time-consuming work of the filestore threads is the
>>>> sizing of DBObjectMap::caches, and the reason for the sequential
>>>> execution of the filestore threads is the locking of
>>>> DBObjectMap::header_lock.
>>>>
>>>> After reading the corresponding source code, we found that
>>>> MapHeaderLock already provides mutual exclusion for access to the
>>>> omap object header. It seems that the locking of
>>>> DBObjectMap::header_lock is not really necessary, or at least that
>>>> it's not needed when adding the header to DBObjectMap::caches, which
>>>> is what leads to the sizing of the cache.
>>>>
>>>> Is this right?
>>>
>>> I'm a bit confused; can you explain exactly what you're testing and
>>> exactly what you're measuring that leads you to think the mutexes are
>>> overly expensive?
>>>
>>> Note that there's both a header_lock and a cache_lock; in a quick skim
>>> they don't seem to be in gratuitous use (unless there are some disk
>>> accesses hiding underneath the header_lock?) and the idea of them
>>> being a performance bottleneck has not come up before.
>>> -Greg
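To make the discussion concrete, here is a minimal sketch of the lookup flow yuxiang describes (cache check, leveldb read, cache insert) combined with the "drop locks around IO" idea Greg raises. The names (lookup_header, db_get, header_cache) are hypothetical placeholders, not the real DBObjectMap interfaces:

    // Sketch only. The point is the structure: the shared cache mutex
    // covers just the in-memory lookup and insert (steps 1 and 3 above),
    // never the leveldb read (step 2); per-object exclusion is assumed to
    // be handled separately, as MapHeaderLock does for a single oid.
    #include <mutex>
    #include <string>
    #include <unordered_map>

    struct Header {
      std::string data;  // stand-in for the real omap header fields
    };

    std::mutex cache_lock;  // protects header_cache only
    std::unordered_map<std::string, Header> header_cache;

    bool db_get(const std::string &oid, Header *out) {
      // Stand-in for the leveldb read ("db->get(HOBJECT_TO_SEQ, ...)"
      // above); a real implementation would fill *out from the backend.
      (void)oid; (void)out;
      return false;
    }

    // Caller is assumed to already hold the per-object header lock for
    // 'oid', so two threads never populate the same header concurrently.
    bool lookup_header(const std::string &oid, Header *out) {
      {
        std::lock_guard<std::mutex> g(cache_lock);  // step 1: memory cache
        auto it = header_cache.find(oid);
        if (it != header_cache.end()) {
          *out = it->second;
          return true;
        }
      }                                             // cache_lock released

      if (!db_get(oid, out))                        // step 2: I/O, done
        return false;                               //   with no shared lock

      std::lock_guard<std::mutex> g(cache_lock);    // step 3: brief lock to
      header_cache.emplace(oid, *out);              //   populate the cache
      return true;
    }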