On Fri, Dec 9, 2016 at 7:01 AM, NeilBrown <neilb@xxxxxxxx> wrote:
> On Thu, Dec 08 2016, Jinpu Wang wrote:
>
> This number:
>
>>   nr_pending = {
>>     counter = 1
>>   },
>
> and this number:
>
>>   nr_pending = {
>>     counter = 856
>>   },
>
> might be interesting.
>
> There are 855 requests on the list. Add the one that is currently
> being retried gives 856, which is nr_pending for the device that failed.
> But nr_pending on the device that didn't fail is 1. I would expect
> zero.
> When a read or write request succeeds, rdev_dec_pending() is called
> immediately so this should quickly go to zero.
>
> It seems as though there must be a request to the loop device that is
> stuck somewhere between the atomic_inc(&rdev->nr_pending) (possibly
> inside read_balance) and the call to generic_make_request().
> I cannot yet see how that would happen.
>
> Can you check if this is a repeatable observation? Is nr_pending.counter
> always '1' on the loop device?
>
> Thanks,
> NeilBrown

Hi Neil,

Yes, it is a repeatable observation. I triggered it again; this time:

nr_pending = 1203
nr_waiting = 8
nr_queued = 1201

conf->retry_list has 1175 entries and conf->bio_end_io_list has 26 entries.
The total is 1201, which matches nr_queued.

In md_rdev, the healthy device loop1 has nr_pending = 1; the faulty one,
ibnbd1, has 1076.
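The counter arithmetic discussed above can be written down as a small cross-check. This is only a sketch: the class and function names are hypothetical, and the two rules it encodes (list lengths sum to nr_queued; the failed device's nr_pending is the retry list length plus the one request being retried) are taken from the reasoning in this thread, not from the raid1 source.

```python
from dataclasses import dataclass


@dataclass
class Raid1Counters:
    """Counters quoted in this message; field names are illustrative only."""
    nr_queued: int
    retry_list_len: int
    bio_end_io_list_len: int


def queued_matches_lists(c: Raid1Counters) -> bool:
    # Assumption from the thread: the two list lengths add up to nr_queued.
    return c.retry_list_len + c.bio_end_io_list_len == c.nr_queued


def expected_failed_rdev_pending(retry_list_len: int, in_retry: int = 1) -> int:
    # Neil's reasoning: requests on the retry list plus the one currently
    # being retried account for nr_pending on the failed device.
    return retry_list_len + in_retry


snapshot = Raid1Counters(nr_queued=1201, retry_list_len=1175, bio_end_io_list_len=26)
print(queued_matches_lists(snapshot))      # 1175 + 26 compared with 1201
print(expected_failed_rdev_pending(855))   # the earlier run: 855 on the list
```

Applied to this run, the same rule gives 1175 + 1 = 1176, which is the nr_pending counter shown for dev-ibnbd0 in the crash output. It does not explain the extra reference held on the loop device.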
crash> struct md_rdev 0xffff880228880400
struct md_rdev {
  same_set = {
    next = 0xffff88023202a200,
    prev = 0xffff8800b64c6018
  },
  sectors = 2095104,
  mddev = 0xffff8800b64c6000,
  last_events = 17764573,
  meta_bdev = 0x0,
  bdev = 0xffff8800b60ce080,
  sb_page = 0xffffea0002bd3040,
  bb_page = 0xffffea0002dc76c0,
  sb_loaded = 1,
  sb_events = 166,
  data_offset = 2048,
  new_data_offset = 2048,
  sb_start = 8,
  sb_size = 512,
  preferred_minor = 65535,
  kobj = {
    name = 0xffff880037962af0 "dev-ibnbd0",
    entry = {
      next = 0xffff880228880480,
      prev = 0xffff880228880480
    },
    parent = 0xffff8800b64c6050,
    kset = 0x0,
    ktype = 0xffffffffa0501300 <rdev_ktype>,
    sd = 0xffff88022bfc12d0,
    kref = {
      refcount = {
        counter = 1
      }
    },
    state_initialized = 1,
    state_in_sysfs = 1,
    state_add_uevent_sent = 0,
    state_remove_uevent_sent = 0,
    uevent_suppress = 0
  },
  flags = 581,
  blocked_wait = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    task_list = {
      next = 0xffff8802288804c8,
      prev = 0xffff8802288804c8
    }
  },
  desc_nr = 0,
  raid_disk = 0,
  new_raid_disk = 0,
  saved_raid_disk = -1,
  {
    recovery_offset = 18446744073709551615,
    journal_tail = 18446744073709551615
  },
  nr_pending = {
    counter = 1176
  },
  read_errors = {
    counter = 0
  },
  last_read_error = {
    tv_sec = 0,
    tv_nsec = 0
  },
  corrected_errors = {
    counter = 0
  },
  del_work = {
    data = {
      counter = 0
    },
    entry = {
      next = 0x0,
      prev = 0x0
    },
    func = 0x0
  },
  sysfs_state = 0xffff88022bfc1348,
  badblocks = {
    count = 0,
    unacked_exist = 0,
    shift = 0,
    page = 0xffff8802289aa000,
    changed = 0,
    lock = {
      seqcount = {
        sequence = 0
      },
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      }
    },
    sector = 0,
    size = 0
  }
}

crash> struct md_rdev 0xffff88023202a200
struct md_rdev {
  same_set = {
    next = 0xffff8800b64c6018,
    prev = 0xffff880228880400
  },
  sectors = 2095104,
  mddev = 0xffff8800b64c6000,
  last_events = 37178561,
  meta_bdev = 0x0,
  bdev = 0xffff8800b60d09c0,
  sb_page = 0xffffea0008af7580,
  bb_page = 0xffffea0002e69380,
  sb_loaded = 1,
  sb_events = 167,
  data_offset = 2048,
  new_data_offset = 2048,
  sb_start = 8,
  sb_size = 512,
  preferred_minor = 65535,
  kobj = {
    name = 0xffff88023521ec30 "dev-loop1",
    entry = {
      next = 0xffff88023202a280,
      prev = 0xffff88023202a280
    },
    parent = 0xffff8800b64c6050,
    kset = 0x0,
    ktype = 0xffffffffa0501300 <rdev_ktype>,
    sd = 0xffff88022bc0a708,
    kref = {
      refcount = {
        counter = 1
      }
    },
    state_initialized = 1,
    state_in_sysfs = 1,
    state_add_uevent_sent = 0,
    state_remove_uevent_sent = 0,
    uevent_suppress = 0
  },
  flags = 2,
  blocked_wait = {
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    },
    task_list = {
      next = 0xffff88023202a2c8,
      prev = 0xffff88023202a2c8
    }
  },
  desc_nr = 1,
  raid_disk = 1,
  new_raid_disk = 0,
  saved_raid_disk = -1,
  {
    recovery_offset = 18446744073709551615,
    journal_tail = 18446744073709551615
  },
  nr_pending = {
    counter = 1
  },
  read_errors = {
    counter = 0
  },
  last_read_error = {
    tv_sec = 0,
    tv_nsec = 0
  },
  corrected_errors = {
    counter = 0
  },
  del_work = {
    data = {
      counter = 0
    },
    entry = {
      next = 0x0,
      prev = 0x0
    },
    func = 0x0
  },
  sysfs_state = 0xffff88022bc0a780,
  badblocks = {
    count = 0,
    unacked_exist = 0,
    shift = 0,
    page = 0xffff88022bff0000,
    changed = 0,
    lock = {
      seqcount = {
        sequence = 164
      },
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      }
    },
    sector = 0,
    size = 0
  }
}

Thanks
-- 
Jinpu Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: jinpu.wang@xxxxxxxxxxxxxxxx
URL: https://www.profitbricks.de

Registered office: Berlin
Commercial register: Amtsgericht Charlottenburg, HRB 125506 B
Managing director: Achim Weiss
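To make the "is it repeatable?" check less manual, the device name and nr_pending counter can be pulled out of saved crash output like the dumps above. This is a minimal sketch: the helper name rdev_pending is hypothetical, and it assumes one `struct md_rdev` dump per input string in the layout shown in this message.

```python
import re
from typing import Tuple

# Matches the kobject name, e.g.: name = 0xffff88023521ec30 "dev-loop1"
_NAME_RE = re.compile(r'name\s*=\s*0x[0-9a-f]+\s+"([^"]+)"')
# Matches: nr_pending = { counter = 1 } with any whitespace or newlines between.
_PENDING_RE = re.compile(r'nr_pending\s*=\s*\{\s*counter\s*=\s*(-?\d+)\s*\}')


def rdev_pending(dump: str) -> Tuple[str, int]:
    """Return (kobject name, nr_pending counter) from one md_rdev dump."""
    name = _NAME_RE.search(dump)
    pending = _PENDING_RE.search(dump)
    if name is None or pending is None:
        raise ValueError("text does not look like a complete struct md_rdev dump")
    return name.group(1), int(pending.group(1))


# Abbreviated sample; a real dump has many more fields between these two.
sample = '''
  kobj = {
    name = 0xffff88023521ec30 "dev-loop1",
  },
  nr_pending = {
    counter = 1
  },
'''
print(rdev_pending(sample))
```

Running this over the dumps from several reproductions would show whether the loop device consistently ends up with a counter of 1.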