Re: [PATCH] libceph: use ceph_kvmalloc() for osdmap arrays

On Wed, Sep 11, 2019 at 4:54 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Tue, 2019-09-10 at 21:41 +0200, Ilya Dryomov wrote:
> > osdmap has a bunch of arrays that grow linearly with the number of
> > OSDs.  osd_state, osd_weight and osd_primary_affinity take 4 bytes per
> > OSD.  osd_addr takes 136 bytes per OSD because of sockaddr_storage.
> > The CRUSH workspace area also grows linearly with the number of OSDs.
> >
> > Normally these arrays are allocated at client startup.  The osdmap is
> > usually updated in small incrementals, but once in a while a full map
> > may need to be processed.  For a cluster with 10000 OSDs, this means
> > a bunch of 40K allocations followed by a 1.3M allocation, all of which
> > are currently required to be physically contiguous.  This results in
> > sporadic ENOMEM errors, hanging the client.
> >
> > Go back to manually (re)allocating arrays and use ceph_kvmalloc() to
> > fall back to non-contiguous allocation when necessary.
> >
> > Link: https://tracker.ceph.com/issues/40481
> > Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
> > ---
> >  net/ceph/osdmap.c | 69 +++++++++++++++++++++++++++++------------------
> >  1 file changed, 43 insertions(+), 26 deletions(-)
> >
> > diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
> > index 90437906b7bc..4e0de14f80bb 100644
> > --- a/net/ceph/osdmap.c
> > +++ b/net/ceph/osdmap.c
> > @@ -973,11 +973,11 @@ void ceph_osdmap_destroy(struct ceph_osdmap *map)
> >                                struct ceph_pg_pool_info, node);
> >               __remove_pg_pool(&map->pg_pools, pi);
> >       }
> > -     kfree(map->osd_state);
> > -     kfree(map->osd_weight);
> > -     kfree(map->osd_addr);
> > -     kfree(map->osd_primary_affinity);
> > -     kfree(map->crush_workspace);
> > +     kvfree(map->osd_state);
> > +     kvfree(map->osd_weight);
> > +     kvfree(map->osd_addr);
> > +     kvfree(map->osd_primary_affinity);
> > +     kvfree(map->crush_workspace);
> >       kfree(map);
> >  }
> >
> > @@ -986,28 +986,41 @@ void ceph_osdmap_destroy(struct ceph_osdmap *map)
> >   *
> >   * The new elements are properly initialized.
> >   */
> > -static int osdmap_set_max_osd(struct ceph_osdmap *map, int max)
> > +static int osdmap_set_max_osd(struct ceph_osdmap *map, u32 max)
> >  {
> >       u32 *state;
> >       u32 *weight;
> >       struct ceph_entity_addr *addr;
> > +     u32 to_copy;
> >       int i;
> >
> > -     state = krealloc(map->osd_state, max*sizeof(*state), GFP_NOFS);
> > -     if (!state)
> > -             return -ENOMEM;
> > -     map->osd_state = state;
> > +     dout("%s old %u new %u\n", __func__, map->max_osd, max);
> > +     if (max == map->max_osd)
> > +             return 0;
> >
> > -     weight = krealloc(map->osd_weight, max*sizeof(*weight), GFP_NOFS);
> > -     if (!weight)
> > +     state = ceph_kvmalloc(array_size(max, sizeof(*state)), GFP_NOFS);
> > +     weight = ceph_kvmalloc(array_size(max, sizeof(*weight)), GFP_NOFS);
> > +     addr = ceph_kvmalloc(array_size(max, sizeof(*addr)), GFP_NOFS);
>
> Is GFP_NOFS sufficient here, given that this may be called from rbd?
> Should we be using NOIO instead (or maybe the PF_MEMALLOC_* equivalent)?

It should be NOIO, but it has been this way forever, so I kept it for
now, keeping in mind the future conversion to the memalloc scope API
that I mentioned in another email.
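
As a rough sketch of what that conversion could look like: the
memalloc_noio_save()/memalloc_noio_restore() helpers are the real
scope API, while the surrounding function and its arguments are made
up purely for illustration.

#include <linux/sched/mm.h>
#include <linux/mm.h>

/*
 * Illustrative only: mark a region that must not recurse into block
 * I/O, so allocations inside it can simply use GFP_KERNEL and the MM
 * layer strips __GFP_IO for us.
 */
static int handle_osdmap_message(void *p, size_t len)
{
	unsigned int noio_flag;
	void *buf;
	int ret = 0;

	noio_flag = memalloc_noio_save();

	buf = kvmalloc(len, GFP_KERNEL);	/* behaves as GFP_NOIO here */
	if (!buf)
		ret = -ENOMEM;
	/* ... decode the map into buf ... */

	memalloc_noio_restore(noio_flag);
	kvfree(buf);
	return ret;
}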

Thanks,

                Ilya
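
For context, the non-contiguous fallback that the patch relies on
follows roughly this pattern.  This is an illustrative sketch, not the
exact upstream ceph_kvmalloc() implementation, and it assumes the
three-argument __vmalloc() signature that was current at the time of
this thread (pre-5.8).

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

static void *kvmalloc_fallback_sketch(size_t size, gfp_t flags)
{
	void *p;

	/*
	 * Try a physically contiguous allocation first, but don't
	 * retry hard or warn on failure -- we have a fallback.
	 */
	p = kmalloc(size, flags | __GFP_NOWARN | __GFP_NORETRY);
	if (p)
		return p;

	/* Fall back to virtually contiguous memory; free with kvfree(). */
	return __vmalloc(size, flags, PAGE_KERNEL);
}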


