Re: [RFC][PATCH] drivers: base: dynamic memory block creation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 14, 2013 at 01:47:27PM -0700, Dave Hansen wrote:
> On 08/14/2013 12:31 PM, Seth Jennings wrote:
> > There was a significant amount of refactoring to allow for this but
> > IMHO, the code is much easier to understand now.
> ...
> >  drivers/base/memory.c  | 248 +++++++++++++++++++++++++++++++++++++------------
> >  include/linux/memory.h |   1 -
> >  2 files changed, 188 insertions(+), 61 deletions(-)
> 
> Adding 120 lines of code made it easier to understand? ;)

Currently, the memory block abstraction is bolted onto the section
layout pretty loosely. The add_memory_section() function is
particularly bad as the memory block is passed from call to call from
memory_dev_init() and there was a lot of logic surrounding, does the
memory block already exist and ifs about whether we were adding it at
boot or the result of hotplug.

Also because we were releasing the mem_sysfs_mutex after each section
add we were having to do a get/put on the dev.kobj every time.  That
isn't even actually done properly right now as the memory block is not
looked up under lock each time a ref is taken.  This hasn't been a
problem since that is only done during boot when memory blocks can't
be concurrently unregistered.  But still.  It speaks to the complexity.

This patch introduces add_memory_block() which helps with the delineation
of sections and blocks and makes the code easier to understand IMHO.

> 
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 2b7813e..392ccd3 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -30,7 +30,7 @@ static DEFINE_MUTEX(mem_sysfs_mutex);
> >  
> >  #define MEMORY_CLASS_NAME	"memory"
> >  
> > -static int sections_per_block;
> > +static int sections_per_block __read_mostly;
> >  
> >  static inline int base_memory_block_id(int section_nr)
> >  {
> > @@ -47,6 +47,9 @@ static struct bus_type memory_subsys = {
> >  	.offline = memory_subsys_offline,
> >  };
> >  
> > +static unsigned long *memblock_present;
> > +static bool largememory_enable __read_mostly;
> 
> How would you see this getting used in practice?  Are you just going to
> set this by default on ppc?  Or, would you ask the distros to put it on
> the command-line by default?  Would it only affect machines larger than
> a certain size?

It would not be on by default, but for people running into the problem
on their large memory machines, we could enable this after verifying
that any tools that operate on the memory block configs are "dynamic
memory block aware"

> 
> This approach breaks the ABI, right?.

ABI... API... it does modify an expectation of current userspace tools.

> An existing tool would not work
> with this patch (plus boot option) since it would not know how to
> show/hide things.  It lets _part_ of those existing tools get reused
> since they only have to be taught how to show/hide things.
> 
> I'd find this really intriguing if you found a way to keep even the old
> tools working.  Instead of having an explicit show/hide, why couldn't
> you just create the entries on open(), for instance?

Nathan and I talked about this and I'm not sure if sysfs would support
such a thing, i.e. memory block creation when someone tried to cd into
the memory block device config.  I wouldn't know where to start on that.

> 
> >  int register_memory_notifier(struct notifier_block *nb)
> > @@ -565,16 +568,13 @@ static const struct attribute_group *memory_memblk_attr_groups[] = {
> >  static
> >  int register_memory(struct memory_block *memory)
> >  {
> > -	int error;
> > -
> >  	memory->dev.bus = &memory_subsys;
> >  	memory->dev.id = memory->start_section_nr / sections_per_block;
> >  	memory->dev.release = memory_block_release;
> >  	memory->dev.groups = memory_memblk_attr_groups;
> >  	memory->dev.offline = memory->state == MEM_OFFLINE;
> >  
> > -	error = device_register(&memory->dev);
> > -	return error;
> > +	return device_register(&memory->dev);
> >  }
> 
> This kind of simplification could surely stand in its own patch.

Yes.

> 
> >  static int init_memory_block(struct memory_block **memory,
> > @@ -582,67 +582,72 @@ static int init_memory_block(struct memory_block **memory,
> >  {
> >  	struct memory_block *mem;
> >  	unsigned long start_pfn;
> > -	int scn_nr;
> > -	int ret = 0;
> > +	int scn_nr, ret, memblock_id;
> >  
> > +	*memory = NULL;
> >  	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> >  	if (!mem)
> >  		return -ENOMEM;
> >  
> >  	scn_nr = __section_nr(section);
> > +	memblock_id = base_memory_block_id(scn_nr);
> >  	mem->start_section_nr =
> >  			base_memory_block_id(scn_nr) * sections_per_block;
> >  	mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
> >  	mem->state = state;
> > -	mem->section_count++;
> >  	mutex_init(&mem->state_mutex);
> >  	start_pfn = section_nr_to_pfn(mem->start_section_nr);
> >  	mem->phys_device = arch_get_memory_phys_device(start_pfn);
> >  
> >  	ret = register_memory(mem);
> > +	if (ret) {
> > +		kfree(mem);
> > +		return ret;
> > +	}
> >  
> >  	*memory = mem;
> > -	return ret;
> > +	return 0;
> >  }
> 
> memblock_id doesn't appear to ever get used.  This also appears to
> change the conventions about where the 'memory_block' is allocated and
> freed.  It isn't immediately clear why it needed to be changed.
> 
> Looking at the rest, this _really_ needs to get refactored before it's
> reviewable.

Yes, this doesn't review well in diff format, unfortunately.   I guess
I'll need to break this down into steps that rewrite the same code
blocks from step to step.

> 
> > +static ssize_t memory_present_show(struct device *dev,
> > +				  struct device_attribute *attr, char *buf)
> > +{
> > +	int n_bits, ret;
> > +
> > +	n_bits = NR_MEM_SECTIONS / sections_per_block;
> > +	ret = bitmap_scnlistprintf(buf, PAGE_SIZE - 2,
> > +				memblock_present, n_bits);
> > +	buf[ret++] = '\n';
> > +	buf[ret] = '\0';
> > +
> > +	return ret;
> > +}
> 
> Doesn't this break the one-value-per-file rule?

I didn't know there was such a rule but it might. Is there any
acceptable way to express a ranges of values.  I would just do a
"last_memblock_id" but the range can have holes.

Seth

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]