On 10/14/2017 03:02 PM, Coly Li wrote: [snip] Hey Coly,-- My bad: I misread the trace and your commit log. This makes more sense. Does this re-ordering also prevent the other alloc_thread issue from being triggered? I feel like we should maybe be holding a big lock that excludes sysfs actions during these critical times-- maybe make the register_lock bigger-- so that it is not as critical to handle these dependencies. It will be a little less responsive/concurrent during registration but this is just a boot-time or shutdown issue, so... Mike > Hi Mike, > > I check the code, creating sysfs obj has no obvious logic change in the > code. And I ran a test for this change, just echo 1 into stop files and > re-register cache/cached devices by bash script. > > The dmesg output displays, > [ 5646.739946] bcache: bch_journal_replay() journal replay done, 3406 > keys in 8 entries, seq 15762 > [ 5646.740120] bcache: register_cache() registered cache device nvme1n1p1 > [ 5647.074204] bcache: register_bdev() registered backing device md0 > [ 5647.145510] bcache: bch_cached_dev_attach() Caching md0 as bcache0 on > set 76abb9d9-aea3-4e1a-9b4a-46a217963834 > [ 5664.285232] bcache: bcache_device_free() bcache0 stopped > [ 5664.322575] bcache: cache_set_free() Cache set > 76abb9d9-aea3-4e1a-9b4a-46a217963834 unregistered > [ 5674.928248] bcache: bch_journal_replay() journal replay done, 4063 > keys in 11 entries, seq 15859 > [ 5674.928413] bcache: register_cache() registered cache device nvme1n1p1 > [ 5675.257664] bcache: register_bdev() registered backing device md0 > [ 5675.347494] bcache: bch_cached_dev_attach() Caching md0 as bcache0 on > set 76abb9d9-aea3-4e1a-9b4a-46a217963834 > [ 5692.665452] bcache: bcache_device_free() bcache0 stopped > > I ran the test for a while, it seems nothing broken. Then I post out > this draft version. > >> Isn't this reported issue is the thing that you fixed earlier in >> "bcache: check ca->alloc_thread initialized before wake up it" ? >> > > They are different NULL dereference locations, my previous fix is a NULL > dereference when waking up allocator thread, this one is a NULL > dereference on data bucket spinlock. > > This NULL spinlock is a little bit complex, I don't find a proper way to > handle caller's logic where data bucket spinlock is referenced, so I > change the function calling order to make sure it has been initialized > before any further reference. > >> It seems like we have a lot of issues with bringing up devices and >> tearing them down/detaching. > > Yes, if you look at Liang Chen's patch "bcache: explicitly destroy mutex > while exiting", you may find he moves sysfs_create_files(bcache_kobj, > files)) after bch_request_init() for the similar reason. > -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html