That's great Haomai, looking forward to this pull request.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
Sent: Thursday, October 02, 2014 10:28 PM
To: Somnath Roy
Cc: ceph-devel
Subject: Re: FW: Weekly Ceph Performance Meeting Invitation

On Fri, Oct 3, 2014 at 1:39 AM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> Please share your opinion on this.
>
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Wednesday, October 01, 2014 3:57 PM
> To: Somnath Roy
> Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
> Subject: RE: Weekly Ceph Performance Meeting Invitation
>
> On Wed, 1 Oct 2014, Somnath Roy wrote:
>> Yes Sage, it's all read. Each call to lfn_open() will incur this
>> lookup in case of an FDCache miss (which will be in 99% of cases).
>> The following patch will certainly help the write path (which is
>> exciting!) but not read, as read does not go through the transaction
>> path. My understanding is that in the read path, per IO, only two
>> calls go to the filestore: one xattr ("_") followed by a read of the
>> same object. If we can somehow combine these two requests, reads
>> will benefit. I did a prototype earlier that passed the fd (and
>> path) to the replicated PG during the getattr call and passed the
>> same fd/path during the next read. This improves performance as well
>> as CPU usage. But this is against the objectstore interface logic :-(
>> Basically, the sole purpose of FDCache is to serve this kind of
>> scenario, but since it is now sharded based on object hash (and
>> FDCache itself is CPU intensive) it is not helping much. Maybe
>> sharding based on PG (Col_id) could help here?
>
> I suspect a more fruitful approach would be to make a read-side
> handle-based API for objectstore... so you can 'open' an object, keep
> that handle to the ObjectContext, and then do subsequent read
> operations against that.
>
> Sharding the FDCache per PG would help with lock contention, yes, but
> is that the limiter or are we burning CPU?
>
>> Also, I don't think the ceph IO path is very memory intensive, and we
>> can leverage some memory for caching. For example, if we had an
>> object_context cache at the replicated PG level (the cache is there
>> now, but the contexts are not persisted), the performance (and CPU
>> usage) would improve dramatically. I know that there can be a lot of
>> PGs and thus memory usage can be a challenge. But we can certainly
>> control that by limiting the per-cache size and so on. What would the
>> size of an object_context instance be? It shouldn't be much, I guess.
>> I did some prototyping on that too and got a significant improvement.
>> This will eliminate the getattr path in case of a cache hit.
>
> Can you propose this on ceph-devel? I think this is promising. And
> probably quite easy to implement.

Yes, we have done this implementation and it can save nearly 100us per IO on a cache hit. We will make a pull request next week. :-)

>
>> Another challenge for read (and probably for write too) is
>> sequential IO in the case of rbd. With the Linux default read_ahead,
>> sequential read performance is significantly lower than random read
>> with the latest code for an io_size of, say, 64K. The obvious reason
>> is that with rbd, the default object size being 4MB, a lot of
>> sequential 64K reads go to the same PG and get bottlenecked there.
>> Increasing the read_ahead size improves performance, but that will
>> affect random workloads. I think a PG-level cache should help here.
>> Striped images from librbd will not face this problem, I guess, but
>> krbd does not support striping and it is definitely a problem there.
>
> I still think the key here is a comprehensive set of IO hints. Then
> it's a problem of making sure we are using them effectively...
>
>> We can discuss these in the next meeting if this sounds interesting.
>
> Yeah, but let's discuss on list first, no reason to wait!
>
> s
>
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
>> Sent: Wednesday, October 01, 2014 1:14 PM
>> To: Somnath Roy
>> Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
>> Subject: RE: Weekly Ceph Performance Meeting Invitation
>>
>> On Wed, 1 Oct 2014, Somnath Roy wrote:
>> > CPU wise the following are still hurting us in Giant. A lot of
>> > fixes, like the IndexManager stuff, went into Giant and helped
>> > CPU consumption as well.
>> >
>> > 1. LFNIndex lookup logic. I have a fix that will save around one
>> > CPU core on that path. I have yet to address the comments made by
>> > Greg/Sam on that. But a lot of improvement can happen here.
>>
>> Have you looked at
>>
>> https://github.com/ceph/ceph/commit/74b1cf8bf1a7a160e6ce14603df63a46b22d8b98
>>
>> The patch is incomplete, but with that change we should be able to
>> drop to a single path lookup per ObjectStore::Transaction (as opposed
>> to one for each op in the transaction that touches the given object).
>> I'm not sure if you were looking at ops that had a lot of those or
>> they were simple single-io type operations? That would only help on
>> the write path; I think you said you've been focusing on reads.
>>
>> > 2. Buffer class is very CPU intensive. Fixing that part will
>> > help every ceph component.
>>
>> +1
>>
>> sage
>>
>> ________________________________
>>
>> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited.
>> If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> info at http://vger.kernel.org/majordomo-info.html

--
Best Regards,
Wheat
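
The object_context cache discussed in the thread (a per-PG cache whose size is capped to bound memory use, so that a hit skips the getattr of "_") could be sketched roughly as below. This is an illustration only, not the implementation Haomai mentions and not Ceph code: ObjectContextSketch, PGObjectContextCache and every member name are hypothetical, LRU eviction is an assumed policy, and locking is left out entirely.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the per-object state a PG would cache.
// The real Ceph object context holds far more than this.
struct ObjectContextSketch {
    std::string oid;        // object name
    std::string attr_blob;  // cached value of the "_" xattr
};

// Size-bounded LRU cache; in this sketch there would be one per PG.
// Not thread-safe: a real version would need a lock or sharding.
class PGObjectContextCache {
public:
    explicit PGObjectContextCache(std::size_t max_entries)
        : max_entries_(max_entries) {}

    // Returns the cached context and marks it most recently used,
    // or nullptr on a miss (the caller would then fall back to getattr).
    std::shared_ptr<ObjectContextSketch> lookup(const std::string& oid) {
        auto it = index_.find(oid);
        if (it == index_.end()) {
            ++misses_;
            return nullptr;
        }
        ++hits_;
        lru_.splice(lru_.begin(), lru_, it->second);  // move to front
        return it->second->second;
    }

    // Inserts or refreshes an entry, evicting the least recently used
    // entry when the cap is exceeded.
    void insert(const std::string& oid,
                std::shared_ptr<ObjectContextSketch> ctx) {
        if (max_entries_ == 0) return;  // caching disabled
        auto it = index_.find(oid);
        if (it != index_.end()) {
            it->second->second = std::move(ctx);
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        lru_.emplace_front(oid, std::move(ctx));
        index_[oid] = lru_.begin();
        if (index_.size() > max_entries_) {
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
    }

    // Drops an entry, e.g. after a write changes the object's attributes.
    void invalidate(const std::string& oid) {
        auto it = index_.find(oid);
        if (it == index_.end()) return;
        lru_.erase(it->second);
        index_.erase(it);
    }

    std::size_t size() const { return index_.size(); }
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    using Entry = std::pair<std::string, std::shared_ptr<ObjectContextSketch>>;
    std::size_t max_entries_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

In this sketch a read would call lookup() first and only issue the getattr on a miss, then insert() the result. Any write that changes the object's attributes would have to call invalidate() (or refresh the entry), and max_entries is the knob that keeps memory bounded when there are many PGs.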