Also, why bother doing this expensive recheck on the OSD side? The OSDMap can change between this check and the OSD actually carrying out the transaction, am I right? If so, we cannot protect against every scenario anyway.

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Somnath Roy
Sent: Friday, October 23, 2015 7:02 PM
To: Sage Weil
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: Lock contention in do_rule

Thanks for the clarification, Sage.

I don't have much knowledge of this part, but what I understood from going through the code is the following:

1. The client calculates the pg->osd mapping by executing the same functions.
2. The request comes to the OSDs, and the OSDs execute the same functions again to check whether the mapping is still the same. If something changed and this is not the right OSD to execute the request, it errors out.
3. So the lock is there to protect the bucket perm attribute against writes from multiple threads.

Some ideas:

1. The lock itself may not be expensive, but it is taken at the beginning of do_rule. If we take the lock at a much more granular level, such as in bucket_perm_choose(), it could be a gain. If that is a possibility, we can test it out.
2. Maybe checking it from a messenger thread is having an effect. What if we move the check to the OSD worker thread?

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Friday, October 23, 2015 6:10 PM
To: Somnath Roy
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Lock contention in do_rule

On Sat, 24 Oct 2015, Somnath Roy wrote:
> Hi Sage,
> We are seeing that the following mapper_lock is heavily contended, and commenting out this lock improves performance by ~10% (in the short circuit path).
> This is called for every io from osd_is_valid_op_target().
> I looked into the code but couldn't understand the purpose of the lock. It seems redundant to me; could you please confirm?
>
>
> void do_rule(int rule, int x, vector<int>& out, int maxout,
>              const vector<__u32>& weight) const {
>   Mutex::Locker l(mapper_lock);
>   int rawout[maxout];
>   int scratch[maxout * 3];
>   int numrep = crush_do_rule(crush, rule, x, rawout, maxout, &weight[0], weight.size(), scratch);
>   if (numrep < 0)
>     numrep = 0;
>   out.resize(numrep);
>   for (int i=0; i<numrep; i++)
>     out[i] = rawout[i];
> }

It's needed because of this:

https://github.com/ceph/ceph/blob/master/src/crush/crush.h#L137
https://github.com/ceph/ceph/blob/master/src/crush/mapper.c#L88

This is clearly not the greatest approach. I think what we need is either a cache that is provided by the caller (which would be annoying and awkward because it's not linked directly to the bucket in question, and would not be shared between threads) or crush upcalls that take the lock only when in the perm path (which is relatively rare). I'd lean toward the latter, but we need to be careful about it since this code is shared with the kernel and it needs to work there as well. Probably we just need to define two callbacks for lock and unlock on the struct crush_map?

sage

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
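
For illustration only, here is a rough sketch of the "lock and unlock callbacks, taken only in the perm path" idea from the thread. Everything in it is an assumption made for the example: ToyBucket, ToyMap, toy_hash, toy_perm_choose, toy_stateless_choose, CountingLock, lock_cb and unlock_cb are hypothetical names, not the Ceph or kernel CRUSH API, and the hash and shuffle are toy stand-ins for the real ones. It only shows the shape: lookups that mutate the per-bucket permutation cache go through the callbacks, while lookups that touch no shared state take no lock at all.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bucket: its items plus a cached permutation
// that lookups mutate. This cache is the shared state needing protection.
struct ToyBucket {
  std::vector<int> items;
  uint32_t perm_x = 0;         // input the cached permutation was built for
  uint32_t perm_n = 0;         // number of leading perm entries already fixed
  std::vector<uint32_t> perm;  // the cached permutation
};

// Hypothetical lock hooks on the map. Plain function pointers, so a
// userspace or a kernel caller could each plug in its own primitive.
struct ToyMap {
  void (*lock)(void *ctx) = nullptr;
  void (*unlock)(void *ctx) = nullptr;
  void *lock_ctx = nullptr;
};

// Toy mixing function; NOT the real CRUSH hash.
uint32_t toy_hash(uint32_t a, uint32_t b, uint32_t c) {
  uint32_t h = (a * 2654435761u) ^ (b * 40503u) ^ (c * 2246822519u);
  return h ^ (h >> 15);
}

// Perm path: reads and writes the bucket's cache, so it takes the lock.
int toy_perm_choose(const ToyMap &map, ToyBucket &b, uint32_t x, uint32_t r) {
  assert(!b.items.empty());
  if (map.lock) map.lock(map.lock_ctx);
  const uint32_t size = static_cast<uint32_t>(b.items.size());
  const uint32_t pr = r % size;
  if (b.perm_x != x || b.perm.size() != size) {
    // Cache belongs to another input: restart from the identity permutation.
    b.perm.resize(size);
    std::iota(b.perm.begin(), b.perm.end(), 0u);
    b.perm_x = x;
    b.perm_n = 0;
  }
  // Extend the incremental shuffle until position pr is fixed.
  while (b.perm_n <= pr) {
    const uint32_t p = b.perm_n;
    const uint32_t i = toy_hash(x, p, size) % (size - p);
    std::swap(b.perm[p], b.perm[p + i]);
    b.perm_n++;
  }
  const int item = b.items[b.perm[pr]];
  if (map.unlock) map.unlock(map.lock_ctx);
  return item;
}

// Non-perm path: touches no shared mutable state, so it takes no lock.
int toy_stateless_choose(const ToyBucket &b, uint32_t x, uint32_t r) {
  const uint32_t size = static_cast<uint32_t>(b.items.size());
  return b.items[toy_hash(x, r, size) % size];
}

// One possible userspace implementation of the hooks, counting acquisitions.
struct CountingLock {
  std::mutex m;
  int acquired = 0;
};

void lock_cb(void *ctx) {
  auto *l = static_cast<CountingLock *>(ctx);
  l->m.lock();
  l->acquired++;
}

void unlock_cb(void *ctx) {
  static_cast<CountingLock *>(ctx)->m.unlock();
}
```

A caller would set ToyMap::lock, ToyMap::unlock and ToyMap::lock_ctx once (for example to lock_cb, unlock_cb and a CountingLock) and leave them null when no sharing is possible. Whether this actually reduces contention depends on how often the perm path is hit, which would need to be measured.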
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html