Re: [PATCH 3/4] libceph: use kvmalloc to allocate page vector

Ilya Dryomov <idryomov@xxxxxxxxx> · Fri, 28 Sep 2018 13:47:55 +0200

On Fri, Sep 28, 2018 at 11:45 AM Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>
> large read may require allocating large page vector
>
> Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx>
> ---
>  net/ceph/pagevec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
> index d3736f5bffec..b1c47ba0c38a 100644
> --- a/net/ceph/pagevec.c
> +++ b/net/ceph/pagevec.c
> @@ -62,7 +62,7 @@ void ceph_release_page_vector(struct page **pages, int num_pages)
>
>         for (i = 0; i < num_pages; i++)
>                 __free_pages(pages[i], 0);
> -       kfree(pages);
> +       kvfree(pages);
>  }
>  EXPORT_SYMBOL(ceph_release_page_vector);
>
> @@ -74,7 +74,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
>         struct page **pages;
>         int i;
>
> -       pages = kmalloc_array(num_pages, sizeof(*pages), flags);
> +       pages = kvmalloc_array(num_pages, sizeof(*pages), flags);
>         if (!pages)
>                 return ERR_PTR(-ENOMEM);
>         for (i = 0; i < num_pages; i++) {

Have you considered rewriting ceph_sync_read()?  In the end, it does
a synchronous OSD request per object and isn't atomic in any way, so in
theory it shouldn't need more than object_size worth of pages.
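
To make the object_size bound concrete, here is a rough userspace sketch
(not the actual ceph_sync_read() code).  It walks a read one object at
a time and reports the largest number of pages any single per-object
request would need.  The helper names, the 4K page size and the simple
"file maps to objects sequentially, no striping" layout are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* Bytes from file offset 'off' to the end of the object containing it. */
static uint64_t bytes_to_object_end(uint64_t off, uint64_t object_size)
{
	return object_size - (off % object_size);
}

/* Number of pages touched by the byte range [off, off + len). */
static uint64_t pages_for_span(uint64_t off, uint64_t len)
{
	uint64_t first = off / SKETCH_PAGE_SIZE;
	uint64_t last = (off + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return last - first;
}

/*
 * Split a read of 'len' bytes at 'off' on object boundaries and return
 * the largest page count needed by any one per-object request.  This is
 * the size a reusable page vector would have to be.
 */
static uint64_t max_pages_per_request(uint64_t off, uint64_t len,
				      uint64_t object_size)
{
	uint64_t max_pages = 0;

	assert(object_size > 0);

	while (len > 0) {
		uint64_t chunk = bytes_to_object_end(off, object_size);
		uint64_t n;

		if (chunk > len)
			chunk = len;

		n = pages_for_span(off, chunk);
		if (n > max_pages)
			max_pages = n;

		off += chunk;
		len -= chunk;
	}
	return max_pages;
}
```

With an assumed 4M object_size, max_pages_per_request(0, 2ULL << 30,
4ULL << 20) comes out at 1024 pages, versus 524288 pages if the whole
2G range is backed up front.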

Fengguang's application is doing ~100M reads, and the one he reported
was likely a ~2G read.  The page pointer array itself is only part of
the problem: allocating 2G worth of pages when we don't actually need
all of them at once is wrong.
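
For a sense of scale, a back-of-the-envelope sketch of the numbers
involved.  The helper names are made up, and the 4K page size and 8-byte
pointer size are assumptions (typical for x86_64), not something taken
from the patch.

```c
#include <stdint.h>

/* Pages needed to back 'bytes' of data, rounding up. */
static uint64_t page_count(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}

/* Size in bytes of a 'struct page *' array covering 'bytes' of data. */
static uint64_t pointer_array_bytes(uint64_t bytes, uint64_t page_size,
				    uint64_t ptr_size)
{
	return page_count(bytes, page_size) * ptr_size;
}
```

Under those assumptions a 2G read needs 524288 pages and a 4M pointer
array, while a 100M read needs 25600 pages and a 200K pointer array.
The array is what kvmalloc_array() helps with; the pages themselves are
still all allocated up front.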

Thanks,

                Ilya