Hello Dave,
I applied this patch manually to 3.8.13, but still hitting exact same issue
with my reproduction.
I still have xlog_recover_iodone callbacks delivered through
xfs_buf_bio_end_io scheduling it through xfslogd_workqueue.
Some of the callbacks arrive before mount completes:
Aug 6 17:31:33 dev kernel: [ 3258.774970] XFS (dm-19): metadata I/O error:
block 0x3780001 ("xlog_recover_iodone") error 28 numblks 1
Aug 6 17:31:33 dev kernel: [ 3258.776687] XFS (dm-19):
xfs_do_force_shutdown(0x1) called from line 377 of file
/mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
0xffffffffa0743848
Aug 6 17:31:33 dev kernel: [ 3258.777262] XFS (dm-19): I/O Error Detected.
Shutting down filesystem
Aug 6 17:31:33 dev kernel: [ 3258.778369] XFS (dm-19): Please umount the
filesystem and rectify the problem(s)
Aug 6 17:31:33 dev kernel: [ 3258.779634] XFS (dm-19): metadata I/O error:
block 0x3780002 ("xlog_recover_iodone") error 28 numblks 1
Aug 6 17:31:33 dev kernel: [ 3258.781929] XFS (dm-19):
xfs_do_force_shutdown(0x1) called from line 377 of file
/mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
0xffffffffa0743848
Aug 6 17:31:33 dev kernel: [ 3258.781939] XFS (dm-19): metadata I/O error:
block 0x3780008 ("xlog_recover_iodone") error 28 numblks 8
Aug 6 17:31:33 dev kernel: [ 3258.784235] XFS (dm-19):
xfs_do_force_shutdown(0x1) called from line 377 of file
/mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
0xffffffffa0743848
Aug 6 17:31:33 dev kernel: [ 3258.784260] XFS (dm-19): metadata I/O error:
block 0x3780010 ("xlog_recover_iodone") error 28 numblks 8
Aug 6 17:31:33 dev kernel: [ 3258.784389] XFS (dm-19): log mount/recovery
failed: error 28
Aug 6 17:31:33 dev kernel: [ 3258.784549] XFS (dm-19): log mount failed
And some arrive afterwards, and print garbage.
Aug 6 17:31:33 dev kernel: [ 3258.786398] XFS (ˆm&_):
xfs_do_force_shutdown(0x1) called from line 377 of file
/mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
0xffffffffa0743848
Aug 6 17:31:33 dev kernel: [ 3258.786404] XFS (ˆm&_): metadata I/O error:
block 0x3780018 ("xlog_recover_iodone") error 28 numblks 8
Aug 6 17:31:33 dev kernel: [ 3258.788575] XFS (ˆm&_):
xfs_do_force_shutdown(0x1) called from line 377 of file
/mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
0xffffffffa0743848
Aug 6 17:31:33 dev kernel: [ 3258.788614] XFS (ˆm&_): metadata I/O error:
block 0x37800a0 ("xlog_recover_iodone") error 28 numblks 16
Aug 6 17:31:33 dev kernel: [ 3258.790849] XFS (ˆm&_):
xfs_do_force_shutdown(0x1) called from line 377 of file
/mnt/work/alex/xfs/fs/xfs/xfs_log_recover.c. Return address =
0xffffffffa0743848
Maybe some additional patch is needed for my kernel?
Thanks,
Alex.
-----Original Message-----
From: Dave Chinner
Sent: 06 August, 2014 3:32 PM
To: Alex Lyakas
Cc: xfs@xxxxxxxxxxx
Subject: Re: use-after-free on log replay failure
On Wed, Aug 06, 2014 at 01:05:34PM +0300, Alex Lyakas wrote:
Hi Dave,
On Tue, Aug 5, 2014 at 2:07 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Aug 04, 2014 at 02:00:05PM +0300, Alex Lyakas wrote:
>> Greetings,
>>
>> we had a log replay failure due to some errors that the underlying
>> block device returned:
>> [49133.801406] XFS (dm-95): metadata I/O error: block 0x270e8c180
>> ("xlog_recover_iodone") error 28 numblks 16
>> [49133.802495] XFS (dm-95): log mount/recovery failed: error 28
>> [49133.802644] XFS (dm-95): log mount failed
>
> #define ENOSPC 28 /* No space left on device */
>
> You're getting an ENOSPC as a metadata IO error during log recovery?
> Thin provisioning problem, perhaps,
Yes, it is a thin provisioning problem (which I already know the cause
for).
> and the error is occurring on
> submission rather than completion? If so:
>
> 8d6c121 xfs: fix buffer use after free on IO error
I am not sure what do you mean by "submission rather than completion".
Do you mean that xfs_buf_ioapply_map() returns without submitting any
bios?
No, that the bio submission results in immediate failure (e.g. the
device goes away, so submission results in ENODEV). Hence when
_xfs_buf_ioapply() releases it's IO reference itis the only
remaining reference to the buffer and so completion processing is
run immediately. i.e. inline from the submission path.
Normally IO errors are reported through the bio in IO completion
interrupt context. i.e the IO is completed by the hardware and the
error status is attached to bio, which is then completed and we get
into XFS that way. The IO submision context is long gone at this
point....
In that case, no, bios are submitted to the block device, and it
fails them through a different context with ENOSPC error. I will still
try the patch you mentioned, because it also looks relevant to another
question I addressed to you earlier in:
http://oss.sgi.com/archives/xfs/2013-11/msg00648.html
No, that's a different problem.
9c23ecc xfs: unmount does not wait for shutdown during unmount
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs