Re: EXT4 Oops (Re: [PATCH V15 06/22] mmc: block: Add blk-mq support)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01.03.2018 23:20, Andreas Dilger wrote:
> 
> On Mar 1, 2018, at 9:04 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>> This doesn't seem to make sense; the PC is where we are currently
>> executing, and LR is the "Link Register" where the flow of control
>> will be returning after the current function returns, right?  Well,
>> dx_probe should *not* be returning to __wait_on_bit().  So this just
>> seems.... weird.
>>
>> Ignoring the LR register, this stack trace looks sane...  I can't see
>> which pointer could be NULL and getting dereferenced, though.  How
>> easily can you reproduce the problem?  Can you either (a) translate
>> the PC into a line number, or better yet, if you can reproduce, add a
>> series of BUG_ON's so we can see what's going on?

Ted, thank you for the suggestion. I don't have a bug-reproducer, it happens
only under some IO load and quite randomly. I've applied the BUG_ON()'s, but it
may take some time to catch the bug again.

>> +	BUG_ON(frame);
> 
> I think you mean:
> 	BUG_ON(frame == NULL);
> or
> 	BUG_ON(!frame);
> 
> 
>> 	memset(frame_in, 0, EXT4_HTREE_LEVEL * sizeof(frame_in[0]));
>> 	frame->bh = ext4_read_dirblock(dir, 0, INDEX);
>> 	if (IS_ERR(frame->bh))
>> 		return (struct dx_frame *) frame->bh;
>>
>> +	BUG_ON(frame->bh);
>> +	BUG_ON(frame->bh->b_data);
> 
> Same here.
> 
> 	BUG_ON(frame->bh == NULL);
> 	BUG_ON(frame->bh->b_data == NULL);
> 
> This is why I don't like implicit "is NULL" or "is non-zero" usage.  Lustre
> used to require "== NULL" or "!= NULL" to avoid bugs like this, but had to
> abandon that because of upstream code style.

Well spotted, thanks Andreas.

>> 	root = (struct dx_root *) frame->bh->b_data;
>> 	if (root->info.hash_version != DX_HASH_TEA &&
>> 	    root->info.hash_version != DX_HASH_HALF_MD4 &&
>> 	    root->info.hash_version != DX_HASH_LEGACY) {
>>
>> These are "could never" happen scenarios from looking at the code, but
>> that will help explain what is going on.
>>
>> If this is reliably only happening with mq, the only way I could see
>> that if is something is returning an error when it previously wasn't.
>> This isn't a problem we're seeing with any of our testing, though.




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux