RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy@xxxxxxxxxx]
> Sent: Wednesday, September 21, 2011 6:57 PM
> To: Peng, Tao
> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
> honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> On 2011-09-21 13:23, tao.peng@xxxxxxx wrote:
> > Hi, Benny,
> >
> >> -----Original Message-----
> >> From: linux-nfs-owner@xxxxxxxxxxxxxxx
> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx]
> >> On Behalf Of Benny Halevy
> >> Sent: Wednesday, September 21, 2011 3:05 PM
> >> To: Peng, Tao
> >> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
> >> honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx
> >> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>
> >> On 2011-09-21 08:16, tao.peng@xxxxxxx wrote:
> >>>> -----Original Message-----
> >>>> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx]
> >>>> Sent: Wednesday, September 21, 2011 12:20 PM
> >>>> To: Peng, Tao
> >>>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; honey@xxxxxxxxxxxxxx;
> >>>> rees@xxxxxxxxx
> >>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: tao.peng@xxxxxxx [mailto:tao.peng@xxxxxxx]
> >>>>> Sent: Tuesday, September 20, 2011 10:44 PM
> >>>>> To: Myklebust, Trond
> >>>>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
> >>>> honey@xxxxxxxxxxxxxx;
> >>>>> rees@xxxxxxxxx
> >>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx
> >>>>>> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx]
> >>>>>> On Behalf Of Jim Rees
> >>>>>> Sent: Wednesday, September 21, 2011 8:29 AM
> >>>>>> To: Trond Myklebust
> >>>>>> Cc: Benny Halevy; linux-nfs@xxxxxxxxxxxxxxx; peter honeyman
> >>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>>>
> >>>>>> Trond Myklebust wrote:
> >>>>>>
> >>>>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
> >>>>>>   > From: Peng Tao <bergwolf@xxxxxxxxx>
> >>>>>>   >
> >>>>>>   > For layoutdriver io done functions, default workqueue is not a
> >>>> good
> >>>>> place as
> >>>>>>   > the code is executed in IO path. So add a pnfs private workqueue
> >>>> to
> >>>>> handle
> >>>>>>   > them.
> >>>>>>   >
> >>>>>>   > Also change block and object layout code to make use of this
> >>>> private
> >>>>>>   > workqueue.
> >>>>>>   >
> >>>>>>
> >>>>>>   Wait, what????
> >>>>>>
> >>>>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
> >>>>>>   layoutdriver io_done functions?
> >>>>> The current code uses the system_wq to handle io_done functions. If
> >>>> any
> >>>>> function allocates memory in system_wq and causes dirty writeback in
> >>>> nfs
> >>>>> path, the io_done function can never be called because it is after the
> >>>> original
> >>>>> function in the system_wq. And you said that the rpciod/nfsiod is not
> >>>> the
> >>>>> ideal place because of memory allocation. So I suggested pnfs private
> >>>>> workqueue and Benny agreed on it.
> >>>>
> >>>> You appear to be optimizing for a corner case. Why?
> >>> Current code uses system_wq for io_done and it can cause deadlock. So this is
> >> more than just an optimization.
> >>>
> >>>>
> >>>> IOW: why do we need a whole new workqueue for something which is
> >>>> supposed to be extremely rare? Why can't we just use standard keventd or
> >>>> allocate a perfectly normal thread (e.g. the state recovery thread)?
> >>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
> >> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can
> be
> >> invoked during memory reclaim. And I agree that a normal kthread can solve it.
> >>>
> >>> However, the blocklayout write end_io function still needs a context where it
> can
> >> allocate memory to convert extent state (for every IO that touches new extents).
> If
> >> you look into the patch, we are not just using the pnfs private workqueue to
> handle
> >> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
> >> workqueue.
> >>
> >> Can this memory be preallocated before the I/O for the common case?
> > It is possible to preallocate memory for the common case. But it doesn't remove
> the need for workqueue. Without workqueue, we will need to put a bunch of extent
> manipulation code into bio->end_io, which is executed in bottom half and we
> always need minimize its execution.
> >
> > Unless we do following:
> > 1. preallocate memory for extent state convertion
> > 2. use nfsiod/rpciod to handle bl_write_cleanup
> > 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> >
> > not sure if it worth the complexity though...
> 
> We do want to avoid memory allocation on the completion path to ensure
> completion under memory pressure and graceful handling of low memory
> conditions
> so I think this approach is worth while.
> 
> Having a lightweight handler on the bottom half done path is ok too.
> 
> Doing the heavy lifting for the (assumingly rare) error case in a workqueue/
> kthread makes a sense, just it mustn't block.  Could you use the nfs state manager
> for that?
I don't quite understand. How do you use nfs state manager to do other tasks?

Thanks,
Tao

> 
> Benny
> 
> >
> > Cheers,
> > Tao
> >
> >>
> >> Benny
> >>
> >>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e.,
> by
> >> just creating a normal kthread in error case), we would still have to create a
> >> workqueue inside blocklayout driver to handle write end_io. And if blocklayout
> has a
> >> private workqueue, it no longer needs the normal kthread for read/write_done
> error
> >> handling, which leaves the kthread only specific to object layout (if it doesn't
> need a
> >> workqueue like blocklayout does).
> >>>
> >>> So I think a generic pnfs workqueue is better because it simplifies things and
> solve
> >> both problems.
> >>>
> >>> Best,
> >>> Tao
> >>>
> >>>>
> >>>> Trond
> >>>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux