On Wed, 11 Jun 2014 19:00:42 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Hi Neil,
> in your master branch, you have a code like:
>
> static int grow_stripes(struct r5conf *conf, int num)
> {
>     struct kmem_cache *sc;
>     int devs = max(conf->raid_disks, conf->previous_raid_disks);
>     int hash;
>
>     if (conf->mddev->gendisk)
>         sprintf(conf->cache_name[0],
>             "raid%d-%s", conf->level, mdname(conf->mddev));
>     else
>         sprintf(conf->cache_name[0],
>             "raid%d-%p", conf->level, conf->mddev);
>     sprintf(conf->cache_name[1], "%s-alt", conf->cache_name[0]);
>
>     conf->active_name = 0;
>     sc = kmem_cache_create(conf->cache_name[conf->active_name],
>         sizeof(struct stripe_head)+(devs-1)*sizeof(struct r5dev),
>         0, 0, NULL);
>
> In our case what happened was:
> - we were assembling two MDs in parallel: md4 and md5
> - each one tried to create its own kmem_cache: raid5-md4 and raid5-md5
> (each one had valid conf->mmdev->gendisk)
>
> In our kernel SLUB is configured. So the code went to
> slub.c::__kmem_cache_create(). It called sysfs_slab_add(), which
> eventually tried to do:
>
>     if (unmergeable) {
>         // not here
>     } else {
>         // we went here
>         name = create_unique_id(s);
>     }
>
> For both threads calling this, it created the same unique id:
> "t-0001832". And then sysfs freaked out and complained[1]. So md5 was
> unlucky and failed to initialize, and md4 got lucky and came up.
> Later, we retried md5 assembly and it worked alright.
>
> In this case, both MDs have the same number of disks. That's why
> kernel tried to have a single cache. Problem is that
> __kmem_cache_create unlocks slab_mutex, so that's why the race becomes
> possible.
>
> I realize that this is not MD-specific, but rather slab-specific
> issue, but do you have any idea how to fix that?:(

No, sorry. Ask the slub developers.

NeilBrown
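
For illustration only, here is a minimal userspace sketch of the collision
described in the quoted message. It is not the kernel's code. The struct
model_cache type and the model_unique_id() and model_ids_collide() helpers
are hypothetical names, and the ID format (an optional "t-" prefix followed
by the zero-padded object size) is an assumption inferred from the
"t-0001832" string quoted above. The point it tries to show is that an ID
built only from flags and size ignores the cache name, so raid5-md4 and
raid5-md5 with the same number of disks end up with the same ID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the few cache properties that matter here. */
struct model_cache {
    const char *name;   /* caller-supplied name, e.g. "raid5-md4" */
    unsigned int size;  /* object size in bytes */
    int tracked;        /* nonzero adds the "t-" prefix (assumed flag) */
};

/*
 * Simplified model of an ID derived only from flags and size.
 * The cache name is deliberately not part of the result.
 */
static void model_unique_id(const struct model_cache *c, char *buf, size_t len)
{
    assert(len > 0);
    snprintf(buf, len, "%s%07u", c->tracked ? "t-" : "", c->size);
}

/* Returns 1 when two caches would be registered under the same ID. */
static int model_ids_collide(const struct model_cache *a,
                             const struct model_cache *b)
{
    char ida[32], idb[32];

    model_unique_id(a, ida, sizeof(ida));
    model_unique_id(b, idb, sizeof(idb));
    return strcmp(ida, idb) == 0;
}
```

With two model caches named "raid5-md4" and "raid5-md5" that share the same
size and flags, model_ids_collide() reports a collision. If both threads
reach the registration step before either one is visible to the other, the
second registration under that ID is the one that fails.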