Re: [PATCH 1/1] - Fix reiserfs WARNING in dquot_writeback_dquots

Jeff Mahoney <jeffm@xxxxxxxx> · Thu, 22 Jun 2017 14:54:29 -0400

On 6/14/17 11:27 PM, Tim Savannah wrote:
> Any comments? Can we get this merged, or some variation? It affects a
> lot more than just all my machines. Google shows this traceback is
> occurring for others as well.

Hi Tim -

This patch was merged for 4.12:

commit 1e0e653f1136a413a9969e5d0d548ee6499b9763
Author: Jan Kara <jack@xxxxxxx>
Date:   Wed Apr 5 14:17:30 2017 +0200

    reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore

    dquot_writeback_dquots() expects s_umount semaphore to be held to
    protect it from other concurrent quota operations. reiserfs_sync_fs()
    can call dquot_writeback_dquots() without holding s_umount semaphore
    when called from flush_old_commits().

    Fix the problem by grabbing s_umount in flush_old_commits(). However we
    have to be careful and use only trylock since reiserfs_cancel_old_sync()
    can be waiting for flush_old_commits() to complete while holding
    s_umount semaphore. Possible postponing of sync work is not a big deal
    though as that is only an opportunistic flush.

    Fixes: 9d1ccbe70e0b14545caad12dc73adb3605447df0
    Reported-by: Jan Beulich <jbeulich@xxxxxxxx>
    Signed-off-by: Jan Kara <jack@xxxxxxx>

Your patch was not correct.  I'll provide review below if you're interested in the details.

> On Mon, May 29, 2017 at 12:57 AM, Tim Savannah <kata198@xxxxxxxxx> wrote:
>> Oops, sent last one without patch on accident. Attached this time.
>>
>>
>> This has been happening for me since 4.10
>>
>> dquot_writeback_dquots expects a lock to be held on super_block->s_umount ,
>>
>> and reiserfs_sync_fs, which calls dquot_writeback_dquots, does not
>> obtain such a lock.
>>
>> Thus, the following warning is generated:
>>
>> [Sun May 28 04:58:06 2017] ------------[ cut here ]------------
>> [Sun May 28 04:58:06 2017] WARNING: CPU: 0 PID: 31 at
>> fs/quota/dquot.c:620 dquot_writeback_dquots+0x248/0x250
>> [Sun May 28 04:58:06 2017] Modules linked in: bbswitch(O)
>> nls_iso8859_1 nls_cp437 iTCO_wdt iTCO_vendor_support acer_wmi
>> sparse_keymap coretemp efi_pstore hwmon intel_rapl
>> x86_pkg_temp_thermal intel_powerclamp pcspkr ath9k ath9k_common
>> ath9k_hw ath efivars mac80211 joydev psmouse i2c_i801 cfg80211
>> input_leds led_class nvidiafb vgastate fb_ddc atl1c i915
>> drm_kms_helper drm intel_gtt syscopyarea sysfillrect sysimgblt mei_me
>> fb_sys_fops i2c_algo_bit mei lpc_ich shpchp acpi_cpufreq thermal wmi
>> video tpm_tis tpm_tis_core button tpm sch_fq_codel evdev mac_hid
>> uvcvideo vboxnetflt(O) videobuf2_vmalloc videobuf2_memops
>> vboxnetadp(O) videobuf2_v4l2 videobuf2_core pci_stub videodev
>> vboxpci(O) media ath3k btusb btrtl btbcm btintel vboxdrv(O) bluetooth
>> rfkill loop usbip_host usbip_core sg ip_tables x_tables hid_generic
>> usbhid
>> [Sun May 28 04:58:06 2017]  hid sr_mod cdrom sd_mod serio_raw atkbd
>> libps2 ehci_pci xhci_pci ahci xhci_hcd ehci_hcd libahci libata
>> scsi_mod usbcore usb_common i8042 serio raid1 raid0 dm_mod md_mod
>> [Sun May 28 04:58:06 2017] CPU: 0 PID: 31 Comm: kworker/0:1 Tainted: G
>>           O    4.11.3-1-ck2-ck #1
>> [Sun May 28 04:58:06 2017] Hardware name: Acer Aspire V3-771/VA70_HC,
>> BIOS V2.16 01/14/2013
>> [Sun May 28 04:58:06 2017] Workqueue: events_long flush_old_commits
>> [Sun May 28 04:58:06 2017] Call Trace:
>> [Sun May 28 04:58:06 2017]  ? dump_stack+0x5c/0x7a
>> [Sun May 28 04:58:06 2017]  ? __warn+0xb4/0xd0
>> [Sun May 27 04:58:06 2017]  ? dquot_writeback_dquots+0x248/0x250
>> [Sun May 27 04:58:06 2017]  ? reiserfs_sync_fs+0x12/0x70
>> [Sun May 27 04:58:06 2017]  ? dbs_work_handler+0x3d/0x50
>> [Sun May 27 04:58:06 2017]  ? flush_old_commits+0x30/0x50
>> [Sun May 27 04:58:06 2017]  ? process_one_work+0x1b1/0x3a0
>> [Sun May 27 04:58:06 2017]  ? worker_thread+0x42/0x4c0
>> [Sun May 27 04:58:06 2017]  ? kthread+0xf2/0x130
>> [Sun May 27 04:58:06 2017]  ? process_one_work+0x3a0/0x3a0
>> [Sun May 27 04:58:06 2017]  ? kthread_create_on_node+0x40/0x40
>> [Sun May 27 04:58:06 2017]  ? ret_from_fork+0x26/0x40
>> [Sun May 27 04:58:06 2017] ---[ end trace 7e040d020ba99607 ]---
>>
>>
>> This occurs during system boot on a fully-updated Archlinux system,
>> and has so since 4.10 100% of the time. It may occur after as well,
>> but it's a WARN_ONCE.
>>
>> The attached patch corrects this issue by first trying to obtain a
>> readlock on said structure member, and if it got it, releases it
>> before returning.

In the future, please include your patch inline as part of the message.

>> After applying the patch, my system is completely stable and the
>> warning no longer occurs. Mounting and unmounting works as expected
>> without issue.

I suspect this is because you aren't doing any of the things that would
conflict here.  Remounting, freeze/thaw, or really anything that takes
->s_umount as a writer running in a different thread would cause problems.

> --- a/fs/reiserfs/super.c	2017-05-29 00:14:45.000000000 -0400
> +++ b/fs/reiserfs/super.c	2017-05-29 00:51:44.000000000 -0400
> @@ -67,17 +67,28 @@
>  static int reiserfs_sync_fs(struct super_block *s, int wait)
>  {
>  	struct reiserfs_transaction_handle th;
> +	int got_lock;
>  
>  	/*
>  	 * Writeback quota in non-journalled quota case - journalled quota has
>  	 * no dirty dquots
>  	 */
> +
> +	if ( down_read_trylock(&s->s_umount) )
> +		got_lock = 1;
> +	else
> +		got_lock = 0;
> +
> 

This is pretty much the same as not using the lock.  The lock is a requirement to
continue and assuming that if we can't get the lock that we must already be holding
it is incorrect.  There may be other threads executing with s->s_umount held as writers
and this patch means that this would execute concurrently, which is incorrect.

>  	dquot_writeback_dquots(s, -1);
>  	reiserfs_write_lock(s);
>  	if (!journal_begin(&th, s, 1))
>  		if (!journal_end_sync(&th))
>  			reiserfs_flush_old_commits(s);
>  	reiserfs_write_unlock(s);
> +
> +	if ( got_lock )
> +		up_read(&s->s_umount);
> +
>  	return 0;
>  }

Please follow the surrounding style.  Spaces within the conditional are
not part of the accepted style[1].

-Jeff

[1] https://github.com/torvalds/linux/blob/master/Documentation/process/coding-style.rst

-- 
Jeff Mahoney
SUSE Labs

Attachment:
signature.asc

Description: OpenPGP digital signature