On Sat, 14 Apr 2018, Coly Li wrote:

> On 2018/4/14 6:46 AM, Eric Wheeler wrote:
> > Hello all,
> >
> > We are running bcache in 4.1.49 with both the cache and backing device
> > having 4k blocks. The disk stack is DRBD->dm-thin->bcache->[sdc->sdb]
> > Where sdc is the cache.
> >
> > Sometimes we get errors like the following:
> >
> > [432015.934869] block drbd8065: Began resync as SyncTarget (will sync 880 KB [220 bits set]).
> > [432015.949469] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15724561783, blk_rq=9
> > [432015.950347] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=353041040, blk_rq=7
> > [432015.951146] bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering
> > [432015.952015] block drbd8065: read: error=-5 s=19281488s
> > [432015.952866] block drbd8065: Local IO failed in drbd_endio_read_sec_final.
> > [432015.953777] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=387084784, blk_rq=7
> > [432015.954710] bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering
> > [432015.959037] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15725385535, blk_rq=1
> > [432015.959938] block drbd8065: read: error=-5 s=19391384s
> > [432015.960862] block drbd8065: Local IO failed in drbd_endio_read_sec_final.
> >
> >
> > Note that 15724561783 is not divisible by 8, thus it is unaligned to 4k
> > blocks.
> >
> > Does anyone know if the bcache code is enforcing correct alignment?
> >
> > Is there any way that bcache could introduce misalignment?
> >
> > We ran blockdev --getbsz and --getpbsz all the way down the stack and
> > everything reports 4k.
>
> Hi Eric,
>
> Do you use 4.1 stable tree, or with your extra patches ? It would be
> helpful if I may access your kernel tree.
>
> So far I cannot tell where the problem is, I just feel there might be
> some hidden issue triggered by 4KB sector size hard drive.
> Maybe adding a garden code to detect unaligned I/O request from bcache
> will be helpful to diagnose the root cause.

Hi Coly,

I just pushed the branch that we built for this system to bitbucket:

  https://bitbucket.org/ewheelerinc/linux/branch/ewi-4.1.49-rpmbuild

There are a few changes, but probably nothing that you haven't seen:

  1. We use the BFQ scheduler (but the problem presents in CFQ also)
  2. dm-thin fixes were backported from 4.2/4.3 to fix pool space issues
     on rolling snapshots
  3. bcache and dm-crypt have my ioprio patches, but dm-crypt is not
     involved in this bug
  4. IBRS patches from Oracle are included for Spectre mitigation
  5. We attempted to use Mauricio's patch to fix 4k alignment issues,
     which shows in our tree as 4a595ccc, but it did not fix the issue.
  6. We updated the error strings in sd.c with our commits aa372d91 and
     f603bf7e while troubleshooting this issue.

Thank you for your help, let me know if there is anything else that you
need to help troubleshoot. I can reproduce this pretty reliably at the
moment.

--
Eric Wheeler

>
> Thanks.
>
> Coly Li
>
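The divisibility check from the original report (15724561783 is not
divisible by 8) can be applied to every logged request with a short
script. This is only a sketch and not something posted in the thread: it
assumes that the block= and blk_rq= fields printed by the modified sd.c
messages are counted in 512-byte sectors, so that sector_size=4096 means
8 sectors per logical block. The helper name check_alignment and the
regular expression are hypothetical and written for illustration.

```python
import re

# Assumption: block= and blk_rq= are counted in 512-byte sectors.
SECTOR_BYTES = 512

# Matches the customized sd.c message format shown in the quoted log.
LOG_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] Unaligned block number requested: "
    r"sector_size=(?P<size>\d+), block=(?P<block>\d+), blk_rq=(?P<count>\d+)"
)

# The four sd messages copied from the quoted log.
LOG_LINES = [
    "sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15724561783, blk_rq=9",
    "sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=353041040, blk_rq=7",
    "sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=387084784, blk_rq=7",
    "sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15725385535, blk_rq=1",
]


def check_alignment(line):
    """Return (device, start remainder, length remainder), or None if no match.

    Both remainders are taken modulo the number of 512-byte sectors per
    logical block; a remainder of 0 means that field is aligned.
    """
    m = LOG_RE.search(line)
    if m is None:
        return None
    per_block = int(m.group("size")) // SECTOR_BYTES  # 8 when sector_size=4096
    return (
        m.group("dev"),
        int(m.group("block")) % per_block,
        int(m.group("count")) % per_block,
    )


if __name__ == "__main__":
    for line in LOG_LINES:
        dev, start_rem, len_rem = check_alignment(line)
        print(f"{dev}: start % 8 = {start_rem}, length % 8 = {len_rem}")
```

Under that assumption, both sdb requests start 7 sectors past a 4k
boundary, while both sdc requests start on a boundary but have a length
of 7 sectors, so the start and the length are worth checking separately.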