Re: RFA (Request for Advice): block/bio: get_user_pages() --> pin_user_pages()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 24-01-22 10:38:13, Chaitanya Kulkarni wrote:
> On 1/24/22 2:05 AM, Jan Kara wrote:
> > External email: Use caution opening links or attachments
> > 
> > Hello,
> > 
> > On Sun 23-01-22 23:52:07, John Hubbard wrote:
> >> Background: despite having very little experience in the block and bio
> >> layers, I am attempting to convert the Direct IO parts of them from
> >> using get_user_pages_fast(), to pin_user_pages_fast(). This requires the
> >> use of a corresponding special release call: unpin_user_pages(), instead
> >> of the generic put_page().
> >>
> >> Fortunately, Christoph Hellwig has observed [1] (more than once [2]) that
> >> only "a little" refactoring is required, because it is *almost* true
> >> that bio_release_pages() could just be switched over from calling
> >> put_page(), to unpin_user_page(). The "not quite" part is mainly due to
> >> the zero page. There are a few write paths that pad zeroes, and they use
> >> the zero page.
> >>
> >> That's where I'd like some advice. How to refactor things, so that the
> >> zero page does not end up in the list of pages that bio_release_pages()
> >> acts upon?
> 
> this maybe wrong but thinking out loudly, have you consider adding a 
> ZERO_PAGE() address check since it should have a unique same
> address for each ZERO_PAGE() (unless I'm totally wrong here) and
> using this check you can distinguish between ZERO_PAGE() and
> non ZERO_PAGE() on the bio list in bio_release_pages().

Well, that is another option but it seems a bit ugly and also on some
architectures (e.g. s390 AFAICS) there can be multiple zero pages (due to
coloring) so the test for zero page is not completely trivial (probably we
would have to grow some is_zero_page() checking function implemented
separately for each arch).

> >> To ground this in reality, one of the partial call stacks is:
> >>
> >> do_direct_IO()
> >>      dio_zero_block()
> >>          page = ZERO_PAGE(0); <-- This is a problem
> >>
> >> I'm not sure what to use, instead of that zero page! The zero page
> >> doesn't need to be allocated nor tracked, and so any replacement
> >> approaches would need either other storage, or some horrid scheme that I
> >> won't go so far as to write on the screen. :)
> > 
> > Well, I'm not sure if you consider this ugly but currently we use
> > get_page() in that path exactly so that bio_release_pages() does not have
> > to care about zero page. So now we could grab pin on the zero page instead
> > through try_grab_page() or something like that...
> > 
> >        
> 
> submit_page_section() does call get_page() in that same path 
> irrespective of whether it is ZERO_PAGE() or not, this actually
> makes accounting much easier and we also avoid  any special case
> for ZERO_PAGE().
> 
> dio_zero_block()
>   submit_page_section()
>     get_page()
> 
> 
> That also means that on completion of dio for each bio we also call
> put_page() from bio_release_page() path.

Well, yes. fs/direct-io.c grabs two page references in fact. One is passed
along with the bio and released by bio_release_pages(), the other one is
released directly in fs/direct-io.c after IO is submitted. We need to
somewhat reorganize the code to better define which is which because with
pinning they would be of different kind.

For zero page we currently grab the reference in dio_refill_pages() (this
needs to be converted to try_grab_page()) and we don't do anything in
dio_zero_block() - that code needs to be cleaned up to also call
try_grab_page() and we need to somehow deal with the ref currently grabbed
in submit_page_section()...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux