On 2011-09-21 13:23, tao.peng@xxxxxxx wrote: > Hi, Benny, > >> -----Original Message----- >> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] >> On Behalf Of Benny Halevy >> Sent: Wednesday, September 21, 2011 3:05 PM >> To: Peng, Tao >> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; >> honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx >> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue >> >> On 2011-09-21 08:16, tao.peng@xxxxxxx wrote: >>>> -----Original Message----- >>>> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx] >>>> Sent: Wednesday, September 21, 2011 12:20 PM >>>> To: Peng, Tao >>>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; honey@xxxxxxxxxxxxxx; >>>> rees@xxxxxxxxx >>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue >>>> >>>>> -----Original Message----- >>>>> From: tao.peng@xxxxxxx [mailto:tao.peng@xxxxxxx] >>>>> Sent: Tuesday, September 20, 2011 10:44 PM >>>>> To: Myklebust, Trond >>>>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; >>>> honey@xxxxxxxxxxxxxx; >>>>> rees@xxxxxxxxx >>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx >>>>>> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx] >>>>>> On Behalf Of Jim Rees >>>>>> Sent: Wednesday, September 21, 2011 8:29 AM >>>>>> To: Trond Myklebust >>>>>> Cc: Benny Halevy; linux-nfs@xxxxxxxxxxxxxxx; peter honeyman >>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue >>>>>> >>>>>> Trond Myklebust wrote: >>>>>> >>>>>> On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote: >>>>>> > From: Peng Tao <bergwolf@xxxxxxxxx> >>>>>> > >>>>>> > For layoutdriver io done functions, default workqueue is not a >>>> good >>>>> place as >>>>>> > the code is executed in IO path. So add a pnfs private workqueue >>>> to >>>>> handle >>>>>> > them. >>>>>> > >>>>>> > Also change block and object layout code to make use of this >>>> private >>>>>> > workqueue. >>>>>> > >>>>>> >>>>>> Wait, what???? >>>>>> >>>>>> Why is the I/O path (i.e. the nfsiod queue) inappropriate for >>>>>> layoutdriver io_done functions? >>>>> The current code uses the system_wq to handle io_done functions. If >>>> any >>>>> function allocates memory in system_wq and causes dirty writeback in >>>> nfs >>>>> path, the io_done function can never be called because it is after the >>>> original >>>>> function in the system_wq. And you said that the rpciod/nfsiod is not >>>> the >>>>> ideal place because of memory allocation. So I suggested pnfs private >>>>> workqueue and Benny agreed on it. >>>> >>>> You appear to be optimizing for a corner case. Why? >>> Current code uses system_wq for io_done and it can cause deadlock. So this is >> more than just an optimization. >>> >>>> >>>> IOW: why do we need a whole new workqueue for something which is >>>> supposed to be extremely rare? Why can't we just use standard keventd or >>>> allocate a perfectly normal thread (e.g. the state recovery thread)? >>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with >> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can be >> invoked during memory reclaim. And I agree that a normal kthread can solve it. >>> >>> However, the blocklayout write end_io function still needs a context where it can >> allocate memory to convert extent state (for every IO that touches new extents). If >> you look into the patch, we are not just using the pnfs private workqueue to handle >> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the >> workqueue. >> >> Can this memory be preallocated before the I/O for the common case? > It is possible to preallocate memory for the common case. But it doesn't remove the need for workqueue. Without workqueue, we will need to put a bunch of extent manipulation code into bio->end_io, which is executed in bottom half and we always need minimize its execution. > > Unless we do following: > 1. preallocate memory for extent state convertion > 2. use nfsiod/rpciod to handle bl_write_cleanup > 3. for pnfs error case, create a kthread to recollapse and resend to MDS > > not sure if it worth the complexity though... We do want to avoid memory allocation on the completion path to ensure completion under memory pressure and graceful handling of low memory conditions so I think this approach is worth while. Having a lightweight handler on the bottom half done path is ok too. Doing the heavy lifting for the (assumingly rare) error case in a workqueue/ kthread makes a sense, just it mustn't block. Could you use the nfs state manager for that? Benny > > Cheers, > Tao > >> >> Benny >> >>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by >> just creating a normal kthread in error case), we would still have to create a >> workqueue inside blocklayout driver to handle write end_io. And if blocklayout has a >> private workqueue, it no longer needs the normal kthread for read/write_done error >> handling, which leaves the kthread only specific to object layout (if it doesn't need a >> workqueue like blocklayout does). >>> >>> So I think a generic pnfs workqueue is better because it simplifies things and solve >> both problems. >>> >>> Best, >>> Tao >>> >>>> >>>> Trond >>> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html