RE: [PATCH] md: Do not unlock all_mddevs_lock in md_seq_show()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello, 

> Hi,
> 
> 在 2023/12/07 16:58, Yuya Fujita (Fujitsu) 写道:
> > Hello,
> >
> >> I fail to understand what is the problem here if other mddev is deleted
> >> from the list while all_mddevs_lock is released during md_seq_show(),
> >> can you explain more?
> >
> > If the list is deleted in md_seq_show(), it is assumed to reference an invalid
> > pointer when traversing the next list.
> > While all_mddevs_lock is being released, mddev->lock is being retrieved,
> > but I am not sure if the list will not be deleted during that time.
> 
> No, this is not true, md_seq_show() won't delete mddev from the list,
> the next list is still valid.
> 
> md_seq_show:
>   mddev_get
>   spin_unlock(&all_mddevs_lock)
>   // mddev won't be removed untill __mddev_put
>   spin_lock(&all_mddevs_lock)
>   if (atomic_dec_and_test(&mddev->active))
>    __mddev_put(mddev)
>    set_bit(MD_DELETED, &mddev->flags)
>    queue_work(md_misc_wq, &mddev->del_work)
> md_seq_next
> /*
>   * all_mddevs_lock is not released since last __mddev_put, it's safe
>   * to iterate the next list.
>   */
> 
> Later from del_work:
> mddev_delayed_delete
>   kobject_put(&mddev->kobj)
>    md_kobj_release
>     del_gendisk
>      md_free_disk
>       spin_lock(&all_mddevs_lock)
>       list_del(&mddev->all_mddevs)
>       spin_unlock(&all_mddevs_lock)

You're right... Thank you for the explanation.

> 
> 
> >
> > The following is an example of a kernel module that deletes a list during .show().
> > Initially, three entries are added to the test_list and the list will be
> > deleted in test_seq_show().
> > ---
> > #include <linux/module.h>
> > #include <linux/kernel.h>
> > #include <linux/seq_file.h>
> > #include <linux/proc_fs.h>
> >
> > MODULE_LICENSE("GPL v2");
> > MODULE_DESCRIPTION("an example of deleting list in .show()");
> >
> > static LIST_HEAD(test_list);
> >
> > struct test_entry {
> > 	struct list_head list;
> > 	int n;
> > };
> >
> > static void test_add(int n) {
> > 	struct test_entry *e = kmalloc(sizeof(*e), GFP_KERNEL);
> > 	e->n = n;
> > 	list_add(&e->list, &test_list);
> > 	printk(KERN_INFO "add %d to test_list\n", n);
> > }
> >
> > static void *test_seq_start(struct seq_file *seq, loff_t *pos) {
> > 	printk(KERN_INFO "seq_list_start() is called.\n");
> > 	return seq_list_start(&test_list, *pos);
> > }
> >
> > static void *test_seq_next(struct seq_file *seq, void *v, loff_t *pos) {
> > 	struct test_entry *e = list_entry(v, struct test_entry, list);
> > 	printk(KERN_INFO "seq_list_next() is called on list %d\n", e->n );
> > 	return seq_list_next(v, &test_list, pos);
> > }
> >
> > static void test_seq_stop(struct seq_file *seq, void *v) {
> > }
> >
> > static int test_seq_show(struct seq_file *seq, void *v){
> > 	struct test_entry *e = list_entry(v, struct test_entry, list);
> > 	printk(KERN_INFO "test_seq_show() is called on list %d\n", e->n );
> > 	list_del(&e->list);
> 
> As explained before, this list_del() will never happen in md_seq_show().
> 
> Thanks,
> Kuai
> 
> > 	printk(KERN_INFO "deleted list %d\n", e->n );
> > 	return 0;
> > }
> >
> > static const struct seq_operations test_seq_ops = {
> > 	.start  = test_seq_start,
> > 	.next   = test_seq_next,
> > 	.stop   = test_seq_stop,
> > 	.show   = test_seq_show,
> > };
> >
> > int test_seq_open(struct inode *inode, struct file *file)
> > {
> > 	struct seq_file *seq;
> > 	int error;
> >
> > 	error = seq_open(file, &test_seq_ops);
> > 	if (error)
> > 		return error;
> >
> > 	seq = file->private_data;
> > 	return error;
> > }
> >
> > static const struct proc_ops test_proc_ops = {
> > 	.proc_open	= test_seq_open,
> > 	.proc_read	= seq_read,
> > 	.proc_lseek	= seq_lseek,
> > 	.proc_release	= seq_release,
> > };
> >
> > static int __init test_init( void ) {
> > 	printk( KERN_INFO "insmod test\n" );
> > 	test_add(1);
> > 	test_add(2);
> > 	test_add(3);
> > 	proc_create("test", S_IRUGO, NULL, &test_proc_ops);
> > 	return 0;
> > }
> >
> > static void __exit test_exit( void ) {
> > 	printk( KERN_INFO "rmmod test\n" );
> > }
> >
> > module_init( test_init );
> > module_exit( test_exit );
> > ---
> >
> >
> > Loading the kernel module and executing "cat /proc/test" results in the following message:
> > ---
> > Dec 07 16:23:42 localhost-live kernel: insmod test
> > Dec 07 16:23:42 localhost-live kernel: add 1 to test_list
> > Dec 07 16:23:42 localhost-live kernel: add 2 to test_list
> > Dec 07 16:23:42 localhost-live kernel: add 3 to test_list
> > Dec 07 16:23:43 localhost-live kernel: seq_list_start() is called.
> > Dec 07 16:23:43 localhost-live kernel: test_seq_show() is called on list 3
> > Dec 07 16:23:43 localhost-live kernel: deleted list 3
> > Dec 07 16:23:43 localhost-live kernel: seq_list_next() is called on list 3
> > Dec 07 16:23:43 localhost-live kernel: general protection fault, probably for non-canonical address
> 0xdead000000000110: 0000 [#1] PREEMPT SMP NOPTI
> > Dec 07 16:23:43 localhost-live kernel: CPU: 4 PID: 2511 Comm: cat Tainted: G           OE
> 6.7.0-0.rc4.20231205gtbee0e776.335.vanilla.fc39.x86_64 #1
> > Dec 07 16:23:43 localhost-live kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.2-1.fc38 04/01/2014
> > Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main]
> > Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85
> e4 48 89 df e8 ec 35 ee
> > Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287
> > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX:
> 0000000000000027
> > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI:
> ffff9e0ac15b87f8
> > Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> ffffaa7b43377b90
> > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12:
> ffffaa7b43377d98
> > Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15:
> 0000000000000000
> > Dec 07 16:23:43 localhost-live kernel: FS:  00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000)
> knlGS:0000000000000000
> > Dec 07 16:23:43 localhost-live kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4:
> 0000000000770ef0
> > Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554
> > Dec 07 16:23:43 localhost-live kernel: Call Trace:
> > Dec 07 16:23:43 localhost-live kernel:  <TASK>
> > Dec 07 16:23:43 localhost-live kernel:  ? die_addr+0x36/0x90
> > Dec 07 16:23:43 localhost-live kernel:  ? exc_general_protection+0x1c5/0x430
> > Dec 07 16:23:43 localhost-live kernel:  ? asm_exc_general_protection+0x26/0x30
> > Dec 07 16:23:43 localhost-live kernel:  ? __pfx_test_seq_show+0x10/0x10 [main]
> > Dec 07 16:23:43 localhost-live kernel:  ? test_seq_show+0xd/0x70 [main]
> > Dec 07 16:23:43 localhost-live kernel:  seq_read_iter+0x120/0x480
> > Dec 07 16:23:43 localhost-live kernel:  seq_read+0xd4/0x100
> > Dec 07 16:23:43 localhost-live kernel:  proc_reg_read+0x5a/0xa0
> > Dec 07 16:23:43 localhost-live kernel:  vfs_read+0xac/0x320
> > Dec 07 16:23:43 localhost-live kernel:  ksys_read+0x6f/0xf0
> > Dec 07 16:23:43 localhost-live kernel:  do_syscall_64+0x61/0xe0
> > Dec 07 16:23:43 localhost-live kernel:  ? do_user_addr_fault+0x30f/0x660
> > Dec 07 16:23:43 localhost-live kernel:  ? exc_page_fault+0x7f/0x180
> > Dec 07 16:23:43 localhost-live kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > Dec 07 16:23:43 localhost-live kernel: RIP: 0033:0x7fb3aca13121
> > Dec 07 16:23:43 localhost-live kernel: Code: 00 48 8b 15 11 fd 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8
> 40 ce 01 00 f3 0f 1e fa 80 3d 45 82 0d 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00
> 55 48 89 e5 48 83 ec
> > Dec 07 16:23:43 localhost-live kernel: RSP: 002b:00007ffe7d716738 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000000
> > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffffffffda RBX: 0000000000020000 RCX:
> 00007fb3aca13121
> > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000020000 RSI: 00007fb3ac8e7000 RDI:
> 0000000000000003
> > Dec 07 16:23:43 localhost-live kernel: RBP: 00007ffe7d716760 R08: 00000000ffffffff R09:
> 0000000000000000
> > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000022 R11: 0000000000000246 R12:
> 0000000000020000
> > Dec 07 16:23:43 localhost-live kernel: R13: 00007fb3ac8e7000 R14: 0000000000000003 R15:
> 0000000000000000
> > Dec 07 16:23:43 localhost-live kernel:  </TASK>
> > Dec 07 16:23:43 localhost-live kernel: Modules linked in: main(OE) uinput snd_seq_dummy
> snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc
> snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi
> snd_hda_codec intel_rapl_msr intel_rapl_common snd_hda_core snd_hwdep kvm_intel snd_seq
> iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_seq_device kvm raid1 snd_pcm irqbypass rapl
> pktcdvd snd_timer pcspkr i2c_i801 i2c_smbus snd lpc_ich soundcore virtio_balloon joydev loop zram
> crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel
> sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_scsi virtio_gpu virtio_net virtio_blk virtio_console
> virtio_dma_buf net_failover failover serio_raw ip6_tables ip_tables fuse qemu_fw_cfg
> > Dec 07 16:23:43 localhost-live kernel: ---[ end trace 0000000000000000 ]---
> > Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main]
> > Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85
> e4 48 89 df e8 ec 35 ee
> > Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287
> > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX:
> 0000000000000027
> > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI:
> ffff9e0ac15b87f8
> > Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> ffffaa7b43377b90
> > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12:
> ffffaa7b43377d98
> > Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15:
> 0000000000000000
> > Dec 07 16:23:43 localhost-live kernel: FS:  00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000)
> knlGS:0000000000000000
> > Dec 07 16:23:43 localhost-live kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4:
> 0000000000770ef0
> > Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554
> > ---
> >
> >
> > I think this happens in the following way.
> > ---
> > seq_read_iter()
> > ->m->op->show()
> >    ->test_seq_show()
> >      ->list_del(&e->list);
> >        //the list is deleted here
> > ->m->op->next()
> >    ->test_seq_next()
> >      ->seq_list_next()
> >        ->lh = ((struct list_head *)v)->next;
> >          // The list has been deleted, so ->next points to an invalid pointer
> > ->m->op->show()
> >    ->test_seq_show()
> >      // .show() is called again and references the invalid pointer.
> > ---
> >
> > Therefore, I think the same problem will happen when the list is deleted in md_seq_show().
> >
> >
> > Best Regards,
> > Yuya Fujita
> > .
> >





[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux