Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2011-09-21 14:10, tao.peng@xxxxxxx wrote:
>> -----Original Message-----
>> From: Benny Halevy [mailto:bhalevy@xxxxxxxxxx]
>> Sent: Wednesday, September 21, 2011 6:57 PM
>> To: Peng, Tao
>> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
>> honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx
>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>
>> On 2011-09-21 13:23, tao.peng@xxxxxxx wrote:
>>> Hi, Benny,
>>>
>>>> -----Original Message-----
>>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx
>> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx]
>>>> On Behalf Of Benny Halevy
>>>> Sent: Wednesday, September 21, 2011 3:05 PM
>>>> To: Peng, Tao
>>>> Cc: Trond.Myklebust@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
>>>> honey@xxxxxxxxxxxxxx; rees@xxxxxxxxx
>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>
>>>> On 2011-09-21 08:16, tao.peng@xxxxxxx wrote:
>>>>>> -----Original Message-----
>>>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx]
>>>>>> Sent: Wednesday, September 21, 2011 12:20 PM
>>>>>> To: Peng, Tao
>>>>>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx; honey@xxxxxxxxxxxxxx;
>>>>>> rees@xxxxxxxxx
>>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: tao.peng@xxxxxxx [mailto:tao.peng@xxxxxxx]
>>>>>>> Sent: Tuesday, September 20, 2011 10:44 PM
>>>>>>> To: Myklebust, Trond
>>>>>>> Cc: bhalevy@xxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx;
>>>>>> honey@xxxxxxxxxxxxxx;
>>>>>>> rees@xxxxxxxxx
>>>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: linux-nfs-owner@xxxxxxxxxxxxxxx
>>>>>>>> [mailto:linux-nfs-owner@xxxxxxxxxxxxxxx]
>>>>>>>> On Behalf Of Jim Rees
>>>>>>>> Sent: Wednesday, September 21, 2011 8:29 AM
>>>>>>>> To: Trond Myklebust
>>>>>>>> Cc: Benny Halevy; linux-nfs@xxxxxxxxxxxxxxx; peter honeyman
>>>>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>>>
>>>>>>>> Trond Myklebust wrote:
>>>>>>>>
>>>>>>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>>>>>>>>   > From: Peng Tao <bergwolf@xxxxxxxxx>
>>>>>>>>   >
>>>>>>>>   > For layoutdriver io done functions, default workqueue is not a
>>>>>> good
>>>>>>> place as
>>>>>>>>   > the code is executed in IO path. So add a pnfs private workqueue
>>>>>> to
>>>>>>> handle
>>>>>>>>   > them.
>>>>>>>>   >
>>>>>>>>   > Also change block and object layout code to make use of this
>>>>>> private
>>>>>>>>   > workqueue.
>>>>>>>>   >
>>>>>>>>
>>>>>>>>   Wait, what????
>>>>>>>>
>>>>>>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>>>>>>>>   layoutdriver io_done functions?
>>>>>>> The current code uses the system_wq to handle io_done functions. If
>>>>>> any
>>>>>>> function allocates memory in system_wq and causes dirty writeback in
>>>>>> nfs
>>>>>>> path, the io_done function can never be called because it is after the
>>>>>> original
>>>>>>> function in the system_wq. And you said that the rpciod/nfsiod is not
>>>>>> the
>>>>>>> ideal place because of memory allocation. So I suggested pnfs private
>>>>>>> workqueue and Benny agreed on it.
>>>>>>
>>>>>> You appear to be optimizing for a corner case. Why?
>>>>> Current code uses system_wq for io_done and it can cause deadlock. So this is
>>>> more than just an optimization.
>>>>>
>>>>>>
>>>>>> IOW: why do we need a whole new workqueue for something which is
>>>>>> supposed to be extremely rare? Why can't we just use standard keventd or
>>>>>> allocate a perfectly normal thread (e.g. the state recovery thread)?
>>>>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
>>>> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can
>> be
>>>> invoked during memory reclaim. And I agree that a normal kthread can solve it.
>>>>>
>>>>> However, the blocklayout write end_io function still needs a context where it
>> can
>>>> allocate memory to convert extent state (for every IO that touches new extents).
>> If
>>>> you look into the patch, we are not just using the pnfs private workqueue to
>> handle
>>>> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
>>>> workqueue.
>>>>
>>>> Can this memory be preallocated before the I/O for the common case?
>>> It is possible to preallocate memory for the common case. But it doesn't remove
>> the need for workqueue. Without workqueue, we will need to put a bunch of extent
>> manipulation code into bio->end_io, which is executed in bottom half and we
>> always need minimize its execution.
>>>
>>> Unless we do following:
>>> 1. preallocate memory for extent state convertion
>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>>
>>> not sure if it worth the complexity though...
>>
>> We do want to avoid memory allocation on the completion path to ensure
>> completion under memory pressure and graceful handling of low memory
>> conditions
>> so I think this approach is worth while.
>>
>> Having a lightweight handler on the bottom half done path is ok too.
>>
>> Doing the heavy lifting for the (assumingly rare) error case in a workqueue/
>> kthread makes a sense, just it mustn't block.  Could you use the nfs state manager
>> for that?
> I don't quite understand. How do you use nfs state manager to do other tasks?

You need to keep a list of things to do hanging off of the nfs client structure
and set a bit in cl_state telling the state manager it has work to do
and wake it up.  It then needs to go over the list of, say nfs_inodes
and call into the layout driver to handle the errors.

Benny

> 
> Thanks,
> Tao
> 
>>
>> Benny
>>
>>>
>>> Cheers,
>>> Tao
>>>
>>>>
>>>> Benny
>>>>
>>>>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e.,
>> by
>>>> just creating a normal kthread in error case), we would still have to create a
>>>> workqueue inside blocklayout driver to handle write end_io. And if blocklayout
>> has a
>>>> private workqueue, it no longer needs the normal kthread for read/write_done
>> error
>>>> handling, which leaves the kthread only specific to object layout (if it doesn't
>> need a
>>>> workqueue like blocklayout does).
>>>>>
>>>>> So I think a generic pnfs workqueue is better because it simplifies things and
>> solve
>>>> both problems.
>>>>>
>>>>> Best,
>>>>> Tao
>>>>>
>>>>>>
>>>>>> Trond
>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux