RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Benny,

> -----Original Message-----
> From: linux-nfs-owner@xxxxxxxxxxxxxxx [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx]
> On Behalf Of Benny Halevy
> Sent: Wednesday, September 21, 2011 3:05 PM
> To: Peng, Tao
> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
> honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> On 2011-09-21 08:16, tao.peng@xxxxxxx wrote:
> >> -----Original Message-----
> >> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx]
> >> Sent: Wednesday, September 21, 2011 12:20 PM
> >> To: Peng, Tao
> >> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; honey@xxxxxxxxxxxxxx;
> >> rees@xxxxxxxxx
> >> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>
> >>> -----Original Message-----
> >>> From: tao.peng@xxxxxxx [mailto:tao.peng@xxxxxxx]
> >>> Sent: Tuesday, September 20, 2011 10:44 PM
> >>> To: Myklebust, Trond
> >>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
> >> honey@xxxxxxxxxxxxxx;
> >>> rees@xxxxxxxxx
> >>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx
> >>>> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx]
> >>>> On Behalf Of Jim Rees
> >>>> Sent: Wednesday, September 21, 2011 8:29 AM
> >>>> To: Trond Myklebust
> >>>> Cc: Benny Halevy; linux-nfs@xxxxxxxxxxxxxxx; peter honeyman
> >>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>
> >>>> Trond Myklebust wrote:
> >>>>
> >>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
> >>>>   > From: Peng Tao <bergwolf@xxxxxxxxx>
> >>>>   >
> >>>>   > For layoutdriver io done functions, default workqueue is not a
> >> good
> >>> place as
> >>>>   > the code is executed in IO path. So add a pnfs private workqueue
> >> to
> >>> handle
> >>>>   > them.
> >>>>   >
> >>>>   > Also change block and object layout code to make use of this
> >> private
> >>>>   > workqueue.
> >>>>   >
> >>>>
> >>>>   Wait, what????
> >>>>
> >>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
> >>>>   layoutdriver io_done functions?
> >>> The current code uses the system_wq to handle io_done functions. If
> >> any
> >>> function allocates memory in system_wq and causes dirty writeback in
> >> nfs
> >>> path, the io_done function can never be called because it is after the
> >> original
> >>> function in the system_wq. And you said that the rpciod/nfsiod is not
> >> the
> >>> ideal place because of memory allocation. So I suggested pnfs private
> >>> workqueue and Benny agreed on it.
> >>
> >> You appear to be optimizing for a corner case. Why?
> > Current code uses system_wq for io_done and it can cause deadlock. So this is
> more than just an optimization.
> >
> >>
> >> IOW: why do we need a whole new workqueue for something which is
> >> supposed to be extremely rare? Why can't we just use standard keventd or
> >> allocate a perfectly normal thread (e.g. the state recovery thread)?
> > Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can be
> invoked during memory reclaim. And I agree that a normal kthread can solve it.
> >
> > However, the blocklayout write end_io function still needs a context where it can
> allocate memory to convert extent state (for every IO that touches new extents). If
> you look into the patch, we are not just using the pnfs private workqueue to handle
> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
> workqueue.
> 
> Can this memory be preallocated before the I/O for the common case?
It is possible to preallocate memory for the common case. But it doesn't remove the need for workqueue. Without workqueue, we will need to put a bunch of extent manipulation code into bio->end_io, which is executed in bottom half and we always need minimize its execution.

Unless we do following:
1. preallocate memory for extent state convertion
2. use nfsiod/rpciod to handle bl_write_cleanup
3. for pnfs error case, create a kthread to recollapse and resend to MDS

not sure if it worth the complexity though...

Cheers,
Tao

> 
> Benny
> 
> > If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by
> just creating a normal kthread in error case), we would still have to create a
> workqueue inside blocklayout driver to handle write end_io. And if blocklayout has a
> private workqueue, it no longer needs the normal kthread for read/write_done error
> handling, which leaves the kthread only specific to object layout (if it doesn't need a
> workqueue like blocklayout does).
> >
> > So I think a generic pnfs workqueue is better because it simplifies things and solve
> both problems.
> >
> > Best,
> > Tao
> >
> >>
> >> Trond
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux