On Wed, 2 Aug 2023, Carlos Maiolino wrote:
> On Wed, Aug 02, 2023 at 09:53:54AM +0300, Dan Carpenter wrote:
> > Hello Carlos Maiolino,
> >
> > The patch 9a9f8f590f6d: "shmem: quota support" from Jul 25, 2023
> > (linux-next), leads to the following Smatch static checker warning:
> >
> > 	fs/quota/dquot.c:1271 flush_warnings()
> > 	warn: sleeping in atomic context
> >
>
> Thanks for the report Dan!
>
> > fs/quota/dquot.c
> >     1261 static void flush_warnings(struct dquot_warn *warn)
> >     1262 {
> >     1263         int i;
> >     1264
> >     1265         for (i = 0; i < MAXQUOTAS; i++) {
> >     1266                 if (warn[i].w_type == QUOTA_NL_NOWARN)
> >     1267                         continue;
> >     1268 #ifdef CONFIG_PRINT_QUOTA_WARNING
> >     1269                 print_warning(&warn[i]);
> >     1270 #endif
> > --> 1271                 quota_send_warning(warn[i].w_dq_id,
> >     1272                                    warn[i].w_sb->s_dev, warn[i].w_type);
> >
> > The quota_send_warning() function does GFP_NOFS allocations, which don't
> > touch the fs but can still sleep.  GFP_ATOMIC or GFP_NOWAIT don't sleep.
> >
>
> Hmm, tbh I think the simplest way to fix it is indeed change GFP_NOFS to
> GFP_NOWAIT when calling genlmsg_new(), quota_send_warnings() already abstain to
> pass any error back to its caller, I don't think moving it from GFP_NOFS ->
> GFP_NOWAIT will have much impact here as the amount of memory required for it is
> not that big and wouldn't fail unless free memory is really short. I might be
> wrong though.
>
> If not that, another option would be to swap tmpfs spinlocks for mutexes, but I
> would rather avoid that.
>
> CC'ing other folks for more suggestions.

This is certainly a problem, for both dquot_alloc and dquot_free paths.
Thank you, Dan, for catching it.

GFP_NOWAIT is an invitation to flakiness: I don't think it's right to
regress existing quota users by changing GFP_NOFS to GFP_NOWAIT in all
cases there; but it does seem a sensible stopgap for the new experimental
user tmpfs.

I think the thing to do, for now, is to add a flag (DQUOT_SPACE_WARN_NOWAIT?)
which gets passed down to __dquot_alloc and __dquot_free for tmpfs; those
choose GFP_NOFS or GFP_NOWAIT accordingly, and pass that gfp_t on down to
flush_warnings() to quota_send_warning() to genlmsg_new() and
genlmsg_multicast().

Carlos, if you agree, please try that.  I have no experience with netlink
whatsoever: I hope that will be enough to stop it from blocking.

I did toy with the idea of passing back the dquot_warn, and letting the
caller do the flush_warnings() at a more suitable moment; and that might
work out, but I suspect that the rearrangement involved would be better
directed to just rearranging where mm/shmem.c makes its dquot_alloc and
dquot_free calls.  And that's something I shall probably want to do, but
not rush into.  There's an existing problem, for which I do have the
pre-quotas fix, of concurrent faulters in a size-limited tmpfs getting
failures when they try to allocate the last block (worse when huge page).
Respecting all the different layers of limiting is awkward, and now
quotas add another.

Ordinarily, with this blocking issue coming up now, I'd have asked to
back the tmpfs quotas series out of the next tree, and for you to rework
where the dquot_alloc and dquot_free are done, then bring the series back
in the next cycle.  But with it being managed from vfs.git for this
cycle, and a strong preference to return to mm.git next cycle, let's try
for a workaround now; then maybe I can do the rearrangement in mm/shmem.c
next cycle - it is one of the things I was hoping to do then (but "hope"
is not much of a guarantee).
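
A rough, untested sketch of that plumbing, for illustration only - the
flag value, the extra gfp_t parameters and the signatures below are
assumptions, not what fs/quota/dquot.c currently has:

/* hypothetical flag: caller holds a spinlock, so warnings must not sleep */
#define DQUOT_SPACE_WARN_NOWAIT	0x10

static void flush_warnings(struct dquot_warn *warn, gfp_t gfp)
{
	int i;

	for (i = 0; i < MAXQUOTAS; i++) {
		if (warn[i].w_type == QUOTA_NL_NOWARN)
			continue;
#ifdef CONFIG_PRINT_QUOTA_WARNING
		print_warning(&warn[i]);
#endif
		/* gfp passed on to genlmsg_new()/genlmsg_multicast() */
		quota_send_warning(warn[i].w_dq_id,
				   warn[i].w_sb->s_dev, warn[i].w_type, gfp);
	}
}

int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
{
	gfp_t gfp = (flags & DQUOT_SPACE_WARN_NOWAIT) ? GFP_NOWAIT : GFP_NOFS;
	struct dquot_warn warn[MAXQUOTAS];

	/* ... existing accounting, warn[] filled under dq_data_lock ... */

	flush_warnings(warn, gfp);
	return 0;
}

tmpfs would then pass DQUOT_SPACE_WARN_NOWAIT from its wrappers, while
every existing caller keeps the GFP_NOFS behaviour unchanged.
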
info->mutex instead of info->lock: no thanks.

Hugh

>
> Carlos
>
> >     1273         }
> >     1274 }
> >
> > The call trees that Smatch is worried about are listed.  The "disables
> > preempt" functions take the spin_lock_irq(&info->lock) before calling
> > shmem_recalc_inode().
> >
> > shmem_charge() <- disables preempt
> > shmem_uncharge() <- disables preempt
> > shmem_undo_range() <- disables preempt
> > shmem_getattr() <- disables preempt
> > shmem_writepage() <- disables preempt
> > shmem_set_folio_swapin_error() <- disables preempt
> > shmem_swapin_folio() <- disables preempt
> > shmem_get_folio_gfp() <- disables preempt
> > -> shmem_recalc_inode()
> >    -> shmem_inode_unacct_blocks()
> >       -> dquot_free_block_nodirty()
> >          -> dquot_free_space_nodirty()
> >             -> __dquot_free_space()
> >                -> flush_warnings()
>
> Hm, I see, we add dquot_free_block_nodirty() call to shmem_inode_unacct_blocks()
> here, which leads to this possible sleep inside spin_lock_irq().
>
> >
> > regards,
> > dan carpenter
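
To spell out the pattern behind the warning (illustrative only, not the
actual mm/shmem.c code):

	spin_lock_irq(&info->lock);	/* atomic context: must not sleep */
	shmem_recalc_inode(inode);
		/* -> shmem_inode_unacct_blocks()
		 *    -> dquot_free_block_nodirty() -> ... -> flush_warnings()
		 *       -> quota_send_warning(): GFP_NOFS allocation may sleep
		 */
	spin_unlock_irq(&info->lock);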