Hi, Benny, > -----Original Message----- > From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] > On Behalf Of Benny Halevy > Sent: Wednesday, September 21, 2011 3:05 PM > To: Peng, Tao > Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; > honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx > Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue > > On 2011-09-21 08:16, tao.peng@xxxxxxx wrote: > >> -----Original Message----- > >> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx] > >> Sent: Wednesday, September 21, 2011 12:20 PM > >> To: Peng, Tao > >> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; honey@xxxxxxxxxxxxxx; > >> rees@xxxxxxxxx > >> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue > >> > >>> -----Original Message----- > >>> From: tao.peng@xxxxxxx [mailto:tao.peng@xxxxxxx] > >>> Sent: Tuesday, September 20, 2011 10:44 PM > >>> To: Myklebust, Trond > >>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; > >> honey@xxxxxxxxxxxxxx; > >>> rees@xxxxxxxxx > >>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue > >>> > >>> > >>>> -----Original Message----- > >>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx > >>>> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] > >>>> On Behalf Of Jim Rees > >>>> Sent: Wednesday, September 21, 2011 8:29 AM > >>>> To: Trond Myklebust > >>>> Cc: Benny Halevy; linux-nfs@xxxxxxxxxxxxxxx; peter honeyman > >>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue > >>>> > >>>> Trond Myklebust wrote: > >>>> > >>>> On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote: > >>>> > From: Peng Tao <bergwolf@xxxxxxxxx> > >>>> > > >>>> > For layoutdriver io done functions, default workqueue is not a > >> good > >>> place as > >>>> > the code is executed in IO path. So add a pnfs private workqueue > >> to > >>> handle > >>>> > them. > >>>> > > >>>> > Also change block and object layout code to make use of this > >> private > >>>> > workqueue. > >>>> > > >>>> > >>>> Wait, what???? > >>>> > >>>> Why is the I/O path (i.e. the nfsiod queue) inappropriate for > >>>> layoutdriver io_done functions? > >>> The current code uses the system_wq to handle io_done functions. If > >> any > >>> function allocates memory in system_wq and causes dirty writeback in > >> nfs > >>> path, the io_done function can never be called because it is after the > >> original > >>> function in the system_wq. And you said that the rpciod/nfsiod is not > >> the > >>> ideal place because of memory allocation. So I suggested pnfs private > >>> workqueue and Benny agreed on it. > >> > >> You appear to be optimizing for a corner case. Why? > > Current code uses system_wq for io_done and it can cause deadlock. So this is > more than just an optimization. > > > >> > >> IOW: why do we need a whole new workqueue for something which is > >> supposed to be extremely rare? Why can't we just use standard keventd or > >> allocate a perfectly normal thread (e.g. the state recovery thread)? > > Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with > WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can be > invoked during memory reclaim. And I agree that a normal kthread can solve it. > > > > However, the blocklayout write end_io function still needs a context where it can > allocate memory to convert extent state (for every IO that touches new extents). If > you look into the patch, we are not just using the pnfs private workqueue to handle > pnfs_ld_read/write_done, but also calling mark_extents_written() inside the > workqueue. > > Can this memory be preallocated before the I/O for the common case? It is possible to preallocate memory for the common case. But it doesn't remove the need for workqueue. Without workqueue, we will need to put a bunch of extent manipulation code into bio->end_io, which is executed in bottom half and we always need minimize its execution. Unless we do following: 1. preallocate memory for extent state convertion 2. use nfsiod/rpciod to handle bl_write_cleanup 3. for pnfs error case, create a kthread to recollapse and resend to MDS not sure if it worth the complexity though... Cheers, Tao > > Benny > > > If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by > just creating a normal kthread in error case), we would still have to create a > workqueue inside blocklayout driver to handle write end_io. And if blocklayout has a > private workqueue, it no longer needs the normal kthread for read/write_done error > handling, which leaves the kthread only specific to object layout (if it doesn't need a > workqueue like blocklayout does). > > > > So I think a generic pnfs workqueue is better because it simplifies things and solve > both problems. > > > > Best, > > Tao > > > >> > >> Trond > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html