On 2015/3/4 11:53, Gu Zheng wrote: > Hi Xishi, > > On 03/04/2015 10:22 AM, Xishi Qiu wrote: > >> On 2015/3/3 18:20, Gu Zheng wrote: >> >>> Hi Xishi, >>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>> >>>> When hot-remove a numa node, we will clear pgdat, >>>> but is memset 0 safe in try_offline_node()? >>> >>> It is not safe here. In fact, this is a temporary solution here. >>> As you know, pgdat is accessed lock-less now, so protection >>> mechanism (RCU?) is needed to make it completely safe here, >>> but it seems a bit over-kill. >>> >>>> >>>> process A: offline node XX: >>>> for_each_populated_zone() >>>> find online node XX >>>> cond_resched() >>>> offline cpu and memory, then try_offline_node() >>>> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) >>>> access node XX's pgdat >>>> NULL pointer access error >>> >>> It's possible, but I did not meet this condition, did you? >>> >> >> Yes, we test hot-add/hot-remove node with stress, and meet the following >> call trace several times. > > Thanks. > >> >> next_online_pgdat() >> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL > > memset(pgdat, 0, sizeof(*pgdat)); > This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is > NULL* is strange here. > But anyway, the bug is real, we must fix it. next_zone() pg_data_t *pgdat = zone->zone_pgdat; // I think this pgdat is NULL, and NODE_DATA() is not NULL. ... pgdat = next_online_pgdat(pgdat); int nid = next_online_node(pgdat->node_id); // so here is the null pointer access Thanks for your new patch, I'll test it. Thanks, Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>