> -----Original Message-----
> From: Leon Romanovsky [mailto:leon@xxxxxxxxxx]
> Sent: 20 January 2022 04:41 PM
> To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>; Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>; David S . Miller <davem@xxxxxxxxxxxxx>; kuba@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; rds-devel@xxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rama Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>
> Subject: Re: [PATCH RFC] rds: ib: Reduce the contention caused by the asynchronous workers to flush the mr pool
>
> On Thu, Jan 20, 2022 at 08:00:54AM +0000, Praveen Kannoju wrote:
> > -----Original Message-----
> > From: Leon Romanovsky [mailto:leon@xxxxxxxxxx]
> > Sent: 19 January 2022 08:26 PM
> > To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> > Cc: Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>; Jason Gunthorpe <jgg@xxxxxxxx>; David S . Miller <davem@xxxxxxxxxxxxx>; kuba@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; rds-devel@xxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rama Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>
> > Subject: Re: [PATCH RFC] rds: ib: Reduce the contention caused by the asynchronous workers to flush the mr pool
> >
> > On Wed, Jan 19, 2022 at 11:46:16AM +0000, Praveen Kannoju wrote:
> > > -----Original Message-----
> > > From: Leon Romanovsky [mailto:leon@xxxxxxxxxx]
> > > Sent: 19 January 2022 12:29 PM
> > > To: Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>
> > > Cc: Jason Gunthorpe <jgg@xxxxxxxx>; Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>; David S . Miller <davem@xxxxxxxxxxxxx>; kuba@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; rds-devel@xxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rama Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>
> > > Subject: Re: [PATCH RFC] rds: ib: Reduce the contention caused by the asynchronous workers to flush the mr pool
> > >
> > > On Tue, Jan 18, 2022 at 07:42:54PM +0000, Santosh Shilimkar wrote:
> > > > On Jan 18, 2022, at 11:17 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jan 18, 2022 at 04:48:43PM +0000, Santosh Shilimkar wrote:
> > > > >>
> > > > >>> On Jan 18, 2022, at 6:47 AM, Praveen Kannoju <praveen.kannoju@xxxxxxxxxx> wrote:
> > > > >>>
> > > > >>> This patch aims to reduce the number of asynchronous workers
> > > > >>> being spawned to execute the function "rds_ib_flush_mr_pool"
> > > > >>> during high I/O situations. Synchronous call paths to the
> > > > >>> function "rds_ib_flush_mr_pool" will be executed without being
> > > > >>> disturbed. By reducing the number of processes contending to
> > > > >>> flush the mr pool, the total number of D-state processes waiting
> > > > >>> to acquire the mutex lock will be greatly reduced, which otherwise
> > > > >>> were causing DB instance crashes as the corresponding processes
> > > > >>> were not progressing while waiting to acquire the mutex lock.
> > > > >>>
> > > > >>> Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@xxxxxxxxxx>
> > > > >>> ---
> > > > >>>
> > > > >> [...]
> > > > >>
> > > > >>> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
> > > > >>> index 8f070ee..6b640b5 100644
> > > > >>> --- a/net/rds/ib_rdma.c
> > > > >>> +++ b/net/rds/ib_rdma.c
> > > > >>> @@ -393,6 +393,8 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
> > > > >>>  	 */
> > > > >>>  	dirty_to_clean = llist_append_to_list(&pool->drop_list, &unmap_list);
> > > > >>>  	dirty_to_clean += llist_append_to_list(&pool->free_list, &unmap_list);
> > > > >>> +	WRITE_ONCE(pool->flush_ongoing, true);
> > > > >>> +	smp_wmb();
> > > > >>>  	if (free_all) {
> > > > >>>  		unsigned long flags;
> > > > >>>
> > > > >>> @@ -430,6 +432,8 @@ int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
> > > > >>>  	atomic_sub(nfreed, &pool->item_count);
> > > > >>>
> > > > >>>  out:
> > > > >>> +	WRITE_ONCE(pool->flush_ongoing, false);
> > > > >>> +	smp_wmb();
> > > > >>>  	mutex_unlock(&pool->flush_lock);
> > > > >>>  	if (waitqueue_active(&pool->flush_wait))
> > > > >>>  		wake_up(&pool->flush_wait);
> > > > >>> @@ -507,8 +511,17 @@ void rds_ib_free_mr(void *trans_private, int invalidate)
> > > > >>>
> > > > >>>  	/* If we've pinned too many pages, request a flush */
> > > > >>>  	if (atomic_read(&pool->free_pinned) >= pool->max_free_pinned ||
> > > > >>> -	    atomic_read(&pool->dirty_count) >= pool->max_items / 5)
> > > > >>> -		queue_delayed_work(rds_ib_mr_wq, &pool->flush_worker, 10);
> > > > >>> +	    atomic_read(&pool->dirty_count) >= pool->max_items / 5) {
> > > > >>> +		smp_rmb();
> > > > >> You won't need these explicit barriers since the above atomic and
> > > > >> write-once already issue them.
> > > > >
> > > > > No, they don't. Use smp_store_release() and smp_load_acquire() if
> > > > > you want to do something like this, but I still can't quite figure
> > > > > out if this usage of unlocked memory accesses makes any sense at all.
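[Editor's aside: the smp_store_release()/smp_load_acquire() pairing Jason suggests can be sketched in userspace with C11 atomics. This is an illustrative analogue only, not the kernel code; the function and variable names here are hypothetical.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of the patch's flag, using C11 atomics:
 * atomic_store_explicit(..., memory_order_release) corresponds to
 * smp_store_release(), and atomic_load_explicit(..., memory_order_acquire)
 * corresponds to smp_load_acquire(). */
static atomic_bool flush_ongoing;

/* Would be called from the flush path (under flush_lock in the patch).
 * The release store guarantees that all writes made before it are
 * visible to any thread whose acquire load observes the new value. */
static void set_flush_ongoing(bool v)
{
	atomic_store_explicit(&flush_ongoing, v, memory_order_release);
}

/* Lockless read on the free path: the acquire load pairs with the
 * release store above, giving the ordering that smp_rmb()/smp_wmb()
 * alone do not clearly express. */
static bool flush_is_ongoing(void)
{
	return atomic_load_explicit(&flush_ongoing, memory_order_acquire);
}
```

The acquire/release pair documents intent at the two sites that actually communicate, which is why reviewers tend to prefer it over bare smp_rmb()/smp_wmb().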
> > > >
> > > > Indeed, I see that now, thanks. Yeah, these multi-variable checks
> > > > can indeed be racy, but they are under lock at least for this code path.
> > > > But there are a few hot-path places where single-variable states are
> > > > evaluated atomically instead of taking a heavy lock.
> > >
> > > At least pool->dirty_count is not locked in rds_ib_free_mr() at all.
> > >
> > > Thanks
> > >
> > > > Regards,
> > > > Santosh
> > >
> > > Thank you Santosh, Leon and Jason for reviewing the patch.
> > >
> > > 1. Leon, the bool variable "flush_ongoing" introduced through the patch has to
> > > be accessed only after acquiring the mutex lock. Hence it is well protected.
> >
> > I don't see any lock in the rds_ib_free_mr() function where you perform
> > "if (!READ_ONCE(pool->flush_ongoing)) { ...".
> >
> > > 2. As the commit message already conveys, the check made in the function
> > > "rds_ib_free_mr" is only there to avoid redundant asynchronous workers
> > > being spawned.
> > >
> > > 3. The synchronous flush paths through the function "rds_free_mr", with
> > > either cookie=0 or the "invalidate" flag set, have to be honoured as per
> > > the semantics, and hence these paths have to execute the function
> > > "rds_ib_flush_mr_pool" unconditionally.
> > >
> > > 4. It complicates the patch to identify where the function
> > > "rds_ib_flush_mr_pool" has been called from, and hence this patch uses the
> > > state of the bool variable to stop the asynchronous workers.
> > >
> > > 5. We know that "queue_delayed_work" ensures only one work is running, but
> > > avoiding these async workers during high-load situations makes way for the
> > > allocation and synchronous callers, which would otherwise wait long for the
> > > flush to happen. A great reduction in the number of calls to the function
> > > "rds_ib_flush_mr_pool" has been observed with this patch.
> >
> > So if you understand that there is only one work in progress, why do you say workerS?
> >
> > Thanks
> >
> > >
> > > 6. Jason, the only function, "rds_ib_free_mr", which accesses the introduced
> > > bool variable "flush_ongoing" to spawn a flush worker, does not crucially
> > > impact the availability of MRs, because the flush happens from the allocation
> > > path as well when necessary. Hence load-store ordering is not essential here,
> > > which is why we chose smp_rmb() and smp_wmb() over smp_load_acquire() and
> > > smp_store_release().
> > >
> > > Regards,
> > > Praveen.
> >
> > Jason,
> >
> > The ordering primitives smp_rmb() and smp_wmb() provide the ordering
> > guarantees for the atomic memory operations READ_ONCE and WRITE_ONCE used in
> > the functions "rds_ib_free_mr" and "rds_ib_flush_mr_pool" respectively.
> >
> > Yes, the memory-barrier primitives smp_load_acquire() and smp_store_release()
> > are even better. But, because of the simplicity of the memory-barrier usage
> > in the patch, smp_rmb() and smp_wmb() were chosen.
> >
> > Please let me know if you want me to switch to smp_load_acquire() and
> > smp_store_release().
> >
> > Leon,
> >
> > Preventing the asynchronous worker from being spawned during high-load
> > situations makes way for both the synchronous and allocation paths to flush
> > the mr pool and grab the MR without waiting.
> >
> > Please let me know if you still have any queries in this respect or if any
> > modifications are needed.
>
> Thank you for your reply, Leon.
>
> I didn't get any answer to my questions, so let me repeat them.
>
> 1. Where can I see locks in rds_ib_free_mr() that protect concurrent change
> of your new flush_ongoing field?

The flush_ongoing variable is only modified in the function "rds_ib_flush_mr_pool",
under the mutex lock. It is only read atomically in the function "rds_ib_free_mr()",
with a memory barrier in place, hence a lock is not essential in "rds_ib_free_mr()".
Depending on the value read, a decision is taken whether to spawn the asynchronous
worker or not.
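[Editor's aside: the lockless gating described above, reading the flag without a lock and only queueing the delayed work when no flush is in flight, can be sketched in plain C. This is an illustrative model; the struct fields and the max_items/5 threshold loosely mirror the patch, and none of these names are the real rds_ib_mr_pool definitions.]

```c
#include <stdbool.h>

/* Simplified model of the pool state consulted by rds_ib_free_mr().
 * In the kernel these are atomic_t counters read with atomic_read(),
 * and flush_ongoing is read with READ_ONCE(); here plain fields suffice
 * to show the decision logic. */
struct mr_pool {
	int free_pinned;
	int max_free_pinned;
	int dirty_count;
	int max_items;
	bool flush_ongoing;
};

/* Return true when an async flush worker should be queued: the pool is
 * over one of its thresholds AND no flush is already running. The second
 * condition is the patch's addition; it suppresses redundant workers
 * while rds_ib_flush_mr_pool() holds flush_lock. */
static bool should_queue_flush(const struct mr_pool *pool)
{
	bool over_limit = pool->free_pinned >= pool->max_free_pinned ||
			  pool->dirty_count >= pool->max_items / 5;

	return over_limit && !pool->flush_ongoing;
}
```

The check is only an optimization hint: a stale read of flush_ongoing merely queues one extra (deduplicated) work item or delays one queueing, which is why the author argues strict ordering is not critical here.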
> 2. Only one instance of the same work can be executed/scheduled at a time;
> where do you see multiple/parallel workers (plural), and how is that possible?

In my earlier comment I reiterated the same point, and I take back my word
"workerS". By keeping this asynchronous worker out of the flushing, the
synchronous flush jobs (cookie=0 and invalidate=1) as well as the allocation-path
worker acquire the mutex lock in the function "rds_ib_flush_mr_pool" quickly,
thereby fetching the MR. I hope I am clear.

> BTW, please fix your email client to reply inline.

Fixed it. Thank you.

> > Regards,
> > Praveen.

Regards,
Praveen.