On Tue, Mar 15, 2022 at 5:11 PM Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx> wrote:
>
> Hi Dongliang,
>
> On Tue, Mar 15, 2022 at 2:50 PM Dongliang Mu <mudongliangabcd@xxxxxxxxx> wrote:
> >
> > On Tue, Mar 15, 2022 at 12:46 PM Ryusuke Konishi
> > <konishi.ryusuke@xxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 15, 2022 at 10:59 AM Dongliang Mu <mudongliangabcd@xxxxxxxxx> wrote:
> > > >
> > > > On Sun, Mar 13, 2022 at 9:35 PM Dongliang Mu <mudongliangabcd@xxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, Mar 13, 2022 at 12:01 AM Ryusuke Konishi
> > > > > <konishi.ryusuke@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi Pavel and Dongliang,
> > > > > >
> > > > > > On Sun, Mar 13, 2022 at 12:16 AM Pavel Skripkin <paskripkin@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi Ryusuke,
> > > > > > >
> > > > > > > On 3/12/22 18:11, Ryusuke Konishi wrote:
> > > > > > > >> In case of nilfs_attach_log_writer() error code jumps to
> > > > > > > >> failed_checkpoint label and calls destroy_nilfs() which should call
> > > > > > > >> nilfs_sysfs_delete_device_group().
> > > > > > > >
> > > > > > > > nilfs_sysfs_delete_device_group() is called in destroy_nilfs()
> > > > > > > > if nilfs->ns_flags has THE_NILFS_INIT flag -- nilfs_init() inline
> > > > > > > > function tests this flag.
> > > > > > > >
> > > > > > > > The flag is set after init_nilfs() succeeded at the beginning of
> > > > > > > > nilfs_fill_super() because the set_nilfs_init() inline in init_nilfs() sets it.
> > > > > > > >
> > > > > > > > So, nilfs_sysfs_delete_group() seems to be called in case of
> > > > > > > > the above failure. Am I missing something?
> > > > > > > >
> > > > > > >
> > > > > > > Yeah, that's what I mean :) I can't see how reported issue is possible
> > > > > > > with current code.
> > > > > > >
> > > > > > >
> > > > > > > Sorry for not being clear
> > > > > >
> > > > > > Understood, thanks for the reply.
> > > > > >
> > > > > > If so, the case where nilfs_sysfs_create_device_group() itself failed,
> > > > > > is suspicious as mentioned in the previous mail. A possible scenario
> > > > > > I guess is :
> > > > > >
> > > > > > - nilfs_sysfs_create_device_group() on the first mount try fails and leaks
> > > > > >   due to lack of kobject_del() in the error path.
> > > > > > - Then, nilfs_sysfs_create_device_group() on the next mount try hits
> > > > > >   the leak detector at kobject_init_and_add().
> > > > > >
> > > > > > So, if the leak bug is reproducible, I'd like to ask Dongliang to
> > > > > > test the effect of the first patch.
> > > > >
> > > > > If my local syzkaller instance gets a reproducer, I will try to do this.
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Ryusuke Konishi
> > > >
> > > > Hi Ryusuke,
> > > >
> > > > The crash still occurred in my newly set up syzkaller instance. It
> > > > appears after two days' fuzzing.
> > > >
> > > > I remember you suggested me to add kobject_del just for testing,
> > > > right? And let's see if this crash still occurs any more.
> > >
> > > You need a few days to reproduce it ?
> > > If so, I think this confirmation method is uncertain.
> > > In that case, I will try inserting an artificial error by changing
> > > nilfs_sysfs_create_device_group() a bit to confirm if the same crash occurs.
>
> I tried to change the code of nilfs_sysfs_create_device_group() so that
> an error occurs once every two times.
> As a result, the leak bug was not reproduced.
>
> In addition, by kobject debug messages, I saw that the device name
> ("loop2" in your case) was properly freed through kobject_put() even in
> the erroneous case.
>
> So, my previous guess was wrong.
> Looks like there is another cause for the leak of the device name.
> It may not be a nilfs2 issue, I don't know.
>
> > I am reproducing another bug [1] recently. If you can spare some time
> > figuring out the underlying issue, that's really great. Or we can wait
> > some time for the bug to disclose more, after all, it is only a rare
> > memory leak.
> >
> > [1] https://syzkaller.appspot.com/bug?extid=045796dbe294d53147e6
>
> According to the log, it looks like "erofs_put_super() ->
> erofs_unregister_sysfs()" hits:
>
>   kobject: '(null)' (ffff88807b550190): is not initialized, yet
>   kobject_put() is being called.
>
> This warning is output in kobject_put() if kobj argument is not in
> 'state_initialized':
>
> void kobject_put(struct kobject *kobj)
> {
>         if (kobj) {
>                 if (!kobj->state_initialized)
>                         WARN(1, KERN_WARNING
>                                 "kobject: '%s' (%p): is not initialized, yet kobject_put() is being called.\n",
>                              kobject_name(kobj), kobj);
>                 kref_put(&kobj->kref, kobject_release);
>         }
> }
>
> How about chasing this abnormal condition ?
> Anyway, please ask erofs maintainers and linux-erofs mailing list for this.

Thanks for your information. I have got the reproducer and sent the
patch to the kernel mailing list this afternoon.

I will start reproducing this case and try to fix it if reproducible.
Thanks very much.

>
> Regards,
> Ryusuke Konishi
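
For anyone following the kobject_put() warning quoted above, here is a
minimal userspace sketch of the condition it checks. It is not kernel
code: struct mock_kobject, mock_kobject_init(), mock_kobject_put() and
mock_demo() are hypothetical names made up for illustration, and the
mock returns after warning instead of going on to drop a reference the
way the real function does with kref_put().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct kobject (assumption, not the kernel type). */
struct mock_kobject {
	const char *name;
	bool state_initialized;
	int refcount;
};

/* Counts how many times the "not initialized" warning fired. */
static int mock_warn_count;

/* Models the initialization step: marks the object initialized, one reference held. */
static void mock_kobject_init(struct mock_kobject *kobj, const char *name)
{
	kobj->name = name;
	kobj->refcount = 1;
	kobj->state_initialized = true;
}

/* Models the check in kobject_put(): warn when the object was never initialized. */
static void mock_kobject_put(struct mock_kobject *kobj)
{
	if (!kobj)
		return;

	if (!kobj->state_initialized) {
		mock_warn_count++;
		fprintf(stderr,
			"mock kobject: '%s' (%p): is not initialized, yet put() is being called.\n",
			kobj->name ? kobj->name : "(null)", (void *)kobj);
		/* Divergence from the kernel: the real code still calls kref_put() here. */
		return;
	}

	if (--kobj->refcount == 0)
		kobj->state_initialized = false; /* stands in for the release callback */
}

/* Puts one initialized and one zero-filled object; returns the number of warnings. */
int mock_demo(void)
{
	struct mock_kobject good;
	struct mock_kobject bad = { 0 };

	mock_warn_count = 0;
	mock_kobject_init(&good, "loop2");
	mock_kobject_put(&good); /* initialized: no warning */
	mock_kobject_put(&bad);  /* never initialized: warning, name prints as (null) */
	return mock_warn_count;
}
```

Calling mock_demo() from a small main() shows the warning firing only
for the zero-filled object, which is the same shape as the '(null)'
name in the syzkaller log: a put on a structure whose init step never
ran.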