Re: Comm: bcache_allocato BUG: unable to handle kernel NULL pointer dereference at 00000000000006bc

Michael Lyle <mlyle@xxxxxxxx> · Sat, 14 Oct 2017 15:34:46 -0700

On 10/14/2017 03:02 PM, Coly Li wrote:
[snip]

Hey Coly,--

My bad: I misread the trace and your commit log.  This makes more sense.

Does this re-ordering also prevent the other alloc_thread issue from
being triggered?

I feel like we should maybe be holding a big lock that excludes sysfs
actions during these critical times-- maybe make the register_lock
bigger-- so that it is not as critical to handle these dependencies.  It
will be a little less responsive/concurrent during registration but this
is just a boot-time or shutdown issue, so...

Mike

> Hi Mike,
> 
> I check the code, creating sysfs obj has no obvious logic change in the
> code. And I ran a test for this change, just echo 1 into stop files and
> re-register cache/cached devices by bash script.
> 
> The dmesg output displays,
> [ 5646.739946] bcache: bch_journal_replay() journal replay done, 3406
> keys in 8 entries, seq 15762
> [ 5646.740120] bcache: register_cache() registered cache device nvme1n1p1
> [ 5647.074204] bcache: register_bdev() registered backing device md0
> [ 5647.145510] bcache: bch_cached_dev_attach() Caching md0 as bcache0 on
> set 76abb9d9-aea3-4e1a-9b4a-46a217963834
> [ 5664.285232] bcache: bcache_device_free() bcache0 stopped
> [ 5664.322575] bcache: cache_set_free() Cache set
> 76abb9d9-aea3-4e1a-9b4a-46a217963834 unregistered
> [ 5674.928248] bcache: bch_journal_replay() journal replay done, 4063
> keys in 11 entries, seq 15859
> [ 5674.928413] bcache: register_cache() registered cache device nvme1n1p1
> [ 5675.257664] bcache: register_bdev() registered backing device md0
> [ 5675.347494] bcache: bch_cached_dev_attach() Caching md0 as bcache0 on
> set 76abb9d9-aea3-4e1a-9b4a-46a217963834
> [ 5692.665452] bcache: bcache_device_free() bcache0 stopped
> 
> I ran the test for a while, it seems nothing broken. Then I post out
> this draft version.
> 
>> Isn't this reported issue is the thing that you fixed earlier in
>> "bcache: check ca->alloc_thread initialized before wake up it" ?
>>
> 
> They are different NULL dereference locations, my previous fix is a NULL
> dereference when waking up allocator thread, this one is a NULL
> dereference on data bucket spinlock.
> 
> This NULL spinlock is a little bit complex, I don't find a proper way to
> handle caller's logic where data bucket spinlock is referenced, so I
> change the function calling order to make sure it has been initialized
> before any further reference.
> 
>> It seems like we have a lot of issues with bringing up devices and
>> tearing them down/detaching.
> 
> Yes, if you look at Liang Chen's patch "bcache: explicitly destroy mutex
> while exiting", you may find he moves sysfs_create_files(bcache_kobj,
> files)) after bch_request_init() for the similar reason.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html