On Mon, 2017-07-31 at 23:15 +0000, Kani, Toshimitsu wrote:
> On Wed, 2017-07-26 at 17:35 -0600, Vishal Verma wrote:
>  :
> >
> > Clearing errors or badblocks during a BTT write requires sending an
> > ACPI DSM, which means potentially sleeping. Since a BTT IO happens in
> > atomic context (preemption disabled, spinlocks may be held), we
> > cannot perform error clearing in the course of an IO. Due to this
> > error clearing for BTT IOs has hitherto been disabled.
> >
> > This series fixes these problems by moving the error clearing out of
> > the atomic sections in the BTT.
> >
> > Also fix a potential deadlock that can occur while clearing errors
> > from either BTT or pmem due to memory allocations in the IO path.
>
> Hi Vishal,
>
> I just tested the series (sorry for the delay). It works nicely when
> doing I/Os to a block device directly. But I am seeing a lot of write
> errors with filesystem.
>
> Here is what I did for the testing.
>
> 1. 'mkfs.ext /dev/pmem0s' and 'mount /dev/pmem0s /mnt/pmem0s'.
> 2. Inject an error to somewhere in the pmem0s device, but not in the
>    metadata area at beginning.
> 3. Run the following script.
> ===
> DEV=pmem0s
> set -x
> dd if=/dev/zero of=/mnt/$DEV/1Gfile bs=1M count=1024
> while true; do
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-1
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-2
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-3
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-4
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-5
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-6
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-7
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-8
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-9
>     cp /mnt/$DEV/1Gfile /mnt/$DEV/file-10
> done
> ===
>
> Step 3 clears an error and runs fine with raw and memory modes. With
> sector mode, however, it ends up with continuous write errors like
> below and does not clear the error. Do you have any thoughts?
>
> EXT4-fs warning (device pmem0s): ext4_end_bio:322: I/O error 10
> writing to inode 17 (offset 1023410176 size 8388608 starting block
> 1834752)
> Buffer I/O error on device pmem0s, logical block 1834752
> Buffer I/O error on device pmem0s, logical block 1834753
> Buffer I/O error on device pmem0s, logical block 1834754
>  :
> nd_pmem btt0.0: io error in WRITE sector 14680064, len 4096,
> EXT4-fs warning (device pmem0s): ext4_end_bio:322: I/O error 10
> writing to inode 17 (offset 1031798784 size 1052672 starting block
> 1835008)
> nd_pmem btt0.0: io error in WRITE sector 14682112, len 4096,
> EXT4-fs warning (device pmem0s): ext4_end_bio:322: I/O error 10
> writing to inode 17 (offset 1031798784 size 2101248 starting block
> 1835264)
>  :
> nd_pmem btt0.0: io error in WRITE sector 14698496, len 4096,
> nd_pmem btt0.0: io error in WRITE sector 14700544, len 4096,
> nd_pmem btt0.0: io error in WRITE sector 14702592, len 4096,
> nd_pmem btt0.0: io error in WRITE sector 14704640, len 4096,
>  :

Thanks for the test Toshi, I will try and reproduce it. My first guess
is - are the injected errors potentially in the BTT metadata area
towards the end? ->rw_bytes can only clear errors on properly aligned
writes, and the btt metadata writes will be too small to clear metadata
errors..

>
> Thanks,
> -Toshi
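
To make the alignment guess above concrete, here is a rough userspace
sketch of the kind of check I have in mind. It is not the driver code:
the function names and the 512-byte clearing unit are assumptions made
up for illustration, and the real ->rw_bytes path may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed clearing granularity in bytes (illustrative, not from the driver). */
#define SKETCH_CLEAR_UNIT 512u

/* True when value is a whole multiple of unit. */
static inline bool sketch_is_aligned(uint64_t value, uint64_t unit)
{
	assert(unit != 0);	/* guard the modulo below */
	return value % unit == 0;
}

/*
 * Hypothetical predicate: can a write of len bytes at byte offset off
 * also clear a known-bad range? In this model only writes that start on
 * a unit boundary and cover whole units qualify.
 */
static inline bool sketch_write_can_clear(uint64_t off, size_t len)
{
	return len != 0 &&
	       sketch_is_aligned(off, SKETCH_CLEAR_UNIT) &&
	       sketch_is_aligned(len, SKETCH_CLEAR_UNIT);
}
```

Under these assumptions an aligned 4096-byte data write qualifies
(sketch_write_can_clear(0, 4096) is true), while a write of only a few
bytes, standing in for a small metadata update, does not
(sketch_write_can_clear(4096, 16) is false). A bad range sitting under
such metadata would therefore stay bad and keep failing writes.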