(2013/01/10 16:55), Glauber Costa wrote:
> On 01/10/2013 11:31 AM, Kamezawa Hiroyuki wrote:
>> (2013/01/10 16:14), Glauber Costa wrote:
>>> On 01/10/2013 06:17 AM, Tang Chen wrote:
>>>>>> Note: if the memory provided by the memory device is used by the
>>>>>> kernel, it can't be offlined. It is not a bug.
>>>>>
>>>>> Right. But how often does this happen in testing? In other words,
>>>>> please provide an overall description of how well memory hot-remove
>>>>> is presently operating. Is it reliable? What is the success rate in
>>>>> real-world situations?
>>>>
>>>> We test the hot-remove functionality mostly with movable_online used,
>>>> and memory used by the kernel is not allowed to be removed.
>>>
>>> Can you try doing this using cpusets configured to hardwall? It is my
>>> understanding that the object allocators will try hard not to allocate
>>> anything outside the walls defined by the cpuset. This means that if
>>> you have one process per node, and they are hardwalled, your kernel
>>> memory will be spread evenly among the machine. With a big enough
>>> load, it should eventually be present in all blocks.
>>
>> I'm sorry, I couldn't catch your point. Do you want to confirm whether
>> cpuset can work well enough instead of ZONE_MOVABLE? Or do you want to
>> confirm whether ZONE_MOVABLE will not work if it's used with cpuset?
>
> No, I am not proposing to use cpusets to tackle the problem. I am just
> wondering whether you would still have high success rates with cpusets
> in use with hardwalls. This is just one example of a workload that
> would spread kernel memory around quite heavily, so this is just me
> trying to understand the limitations of the mechanism.
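[For readers unfamiliar with the setup Glauber describes: a hardwalled cpuset confines kernel allocations made on behalf of its tasks to the cpuset's memory nodes. A minimal sketch using the cgroup-v1 cpuset filesystem follows; the group name `node0job`, mount point, and CPU/node numbers are illustrative assumptions, not from the thread.]

```shell
# Mount the cpuset filesystem (cgroup v1) if it is not already mounted.
mkdir -p /dev/cpuset
mount -t cpuset none /dev/cpuset

# Create a cpuset pinned to node 0; names and numbers are examples only.
mkdir /dev/cpuset/node0job
echo 0-3 > /dev/cpuset/node0job/cpuset.cpus   # CPUs on node 0
echo 0   > /dev/cpuset/node0job/cpuset.mems   # memory from node 0 only

# Hardwall: kernel allocations on behalf of tasks in this cpuset are
# also confined to the cpuset's memory nodes, not just user pages.
echo 1 > /dev/cpuset/node0job/cpuset.mem_hardwall

# Move the current shell (and its future children) into the cpuset.
echo $$ > /dev/cpuset/node0job/tasks
```

[With one such hardwalled cpuset per node, kernel memory for the confined workloads ends up spread across all nodes, which is the stress scenario Glauber is asking about.]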
Hm, okay. In my understanding, if the whole memory of a node is
configured as MOVABLE, no kernel memory will be allocated on that node
because the zonelist will not match. So, if cpuset is used with
hardwalls, the user will see -ENOMEM or OOM, I guess; even fork() will
fail if falling back to another node is not allowed. If the node is
configured as ZONE_NORMAL, you need to pray for offlining memory.

AFAIK, IBM's ppc(?) has a 16MB section size, so some sections can be
offlined even if they are configured as ZONE_NORMAL. For them, the
placement of offlined memory is not important because it's virtualized
by LPAR; they don't try to remove a DIMM, they just want to
increase/decrease the amount of memory. That's another approach.

But here, we (Fujitsu) are trying to remove a system board/DIMM. So we
configure the whole memory of a node as ZONE_MOVABLE and try to
guarantee that the DIMM is removable.
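[As background to the ZONE_MOVABLE configuration discussed above: a memory block can be onlined into ZONE_MOVABLE through the memory-hotplug sysfs interface, so that only migratable pages land in it and the kernel never places its own data there. A sketch; the block number 32 is only an example:]

```shell
# Online a hot-added memory block into ZONE_MOVABLE; the kernel will
# place only migratable (user/page-cache) pages in it, keeping the
# block offlinable later.
echo online_movable > /sys/devices/system/memory/memory32/state

# Alternatively, movable memory can be reserved at boot time, e.g.:
#   kernelcore=4G    - cap the memory usable for kernel allocations
#   movablecore=8G   - reserve at least this much for ZONE_MOVABLE
```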
>> IMHO, I don't think shrink_slab() can kill all objects on a node even
>> if some of them are caches. We need more study before doing that.
>
> Indeed, shrink_slab can only kill cached objects. They, however, are
> usually a very big part of kernel memory. I wonder, though, whether in
> case of failure it is worth it to try at least one shrink pass before
> you give up.
Yeah; for now, his (our) approach is to never allow kernel memory on a
node that is to be hot-removed, by using ZONE_MOVABLE, so shrink_slab()'s
effect will not be seen. If other brave guys try to use ZONE_NORMAL for
hot-pluggable DIMMs, I see, it's worth trying. How about checking
whether the target memory section is in NORMAL or in MOVABLE at
hot-remove time? If it is NORMAL, it will be worth calling shrink_slab().

BTW, is shrink_slab() node/zone aware now? If not, fixing that first
would be the better direction, I guess.

Thanks,
-Kame
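[The "one shrink pass before giving up" idea discussed above can be roughly approximated from userspace by dropping reclaimable caches before retrying the offline. A sketch; the memory block number is an illustrative assumption:]

```shell
# Flush dirty data, then ask the kernel to drop the page cache and
# reclaimable slab objects (dentries and inodes) system-wide; this is
# coarser than a targeted per-node shrink pass, but frees many of the
# cached objects that would otherwise pin a ZONE_NORMAL block.
sync
echo 3 > /proc/sys/vm/drop_caches

# Retry offlining the target memory block (number is an example).
echo offline > /sys/devices/system/memory/memory32/state
```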