On Fri, Sep 28, 2018 at 11:45 AM Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>
> large read may require allocating large page vector
>
> Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx>
> ---
>  net/ceph/pagevec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
> index d3736f5bffec..b1c47ba0c38a 100644
> --- a/net/ceph/pagevec.c
> +++ b/net/ceph/pagevec.c
> @@ -62,7 +62,7 @@ void ceph_release_page_vector(struct page **pages, int num_pages)
>
>  	for (i = 0; i < num_pages; i++)
>  		__free_pages(pages[i], 0);
> -	kfree(pages);
> +	kvfree(pages);
>  }
>  EXPORT_SYMBOL(ceph_release_page_vector);
>
> @@ -74,7 +74,7 @@ struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
>  	struct page **pages;
>  	int i;
>
> -	pages = kmalloc_array(num_pages, sizeof(*pages), flags);
> +	pages = kvmalloc_array(num_pages, sizeof(*pages), flags);
>  	if (!pages)
>  		return ERR_PTR(-ENOMEM);
>  	for (i = 0; i < num_pages; i++) {

Have you considered rewriting ceph_sync_read()?  In the end it does a
synchronous OSD request per object and isn't atomic in any way, so in
theory it shouldn't need more than object_size worth of pages.

Fengguang's application is doing ~100M reads, and the one he reported
was likely a ~2G read.  The page pointer array itself is only part of
the problem: allocating 2G worth of pages when we don't actually need
all of them at once is wrong.

Thanks,

                Ilya
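
P.S. To illustrate the shape of the loop I have in mind, here is a rough
userspace sketch, not the actual ceph_sync_read() code: OBJECT_SIZE, the
single bounce buffer and pread() are just stand-ins for one synchronous
OSD request per object, but the point is that the allocation is bounded
by the object size, not by the size of the read.

	/*
	 * Userspace sketch only: issue one bounded request per
	 * object-sized chunk and reuse a single buffer, instead of
	 * allocating pages for the whole read length up front.
	 */
	#include <errno.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/types.h>
	#include <unistd.h>

	#define OBJECT_SIZE	(4UL << 20)	/* assumed 4M objects */

	static ssize_t chunked_read(int fd, char *dst, size_t len, off_t off)
	{
		size_t done = 0;
		char *buf;

		/* at most one object worth of buffer is allocated */
		buf = malloc(OBJECT_SIZE);
		if (!buf)
			return -ENOMEM;

		while (done < len) {
			size_t want = len - done;
			ssize_t ret;

			if (want > OBJECT_SIZE)
				want = OBJECT_SIZE;

			/* one synchronous request per object-sized chunk */
			ret = pread(fd, buf, want, off + done);
			if (ret < 0) {
				free(buf);
				return -errno;
			}
			if (ret == 0)
				break;		/* short read, hit EOF */

			memcpy(dst + done, buf, ret);
			done += ret;
		}

		free(buf);
		return done;
	}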