Re: quota: dqio_mutex design

On Thu 03-08-17 16:55:40, Andrew Perepechko wrote:
> Let me put it this way:
> 
> Under file creation from different threads, ext4 will generate a series of
> dquot updates (incore and then ondisk, through journal):
> 
> dquot update1
> dquot update2
> dquot update3
> ...
> dquot updateN
> 
> Either with my patch or without it, ondisk dquot update through journal
> may miss dquot update1, dquot update2, ... dquot update{N-1}.
> 
> You can easily see that from the code of dquot_commit():
> 
> int dquot_commit(struct dquot *dquot)
> {
>         int ret = 0;
>         struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);
> 
>         mutex_lock(&dqopt->dqio_mutex);
>         spin_lock(&dq_list_lock);
>         if (!clear_dquot_dirty(dquot)) {
>                 spin_unlock(&dq_list_lock);
>                 goto out_sem;
>         }
> ...
> }
> 
> 
> If the dquot_commit() that actually ran wrote dquot update N, the threads
> committing updates 1 through N-1 will exit immediately once they get
> dqio_mutex, since the dquot will NOT be dirty.
> 
> My patch only avoids blocking on dqio_mutex when we know for sure that
> another thread will NECESSARILY write the needed or a FRESHER dquot to disk.

Yeah, I agree with Andrew. What they did is *almost* safe for ext4. The
only case where it is not safe is when someone calls mark_dquot_dirty()
outside the scope of a transaction, which happens when doing a Q_SETQUOTA
quotactl.

Another thing that is subtle with Andrew's approach is that a process
modifying quota information can return and stop its handle before the quota
data gets copied to the transaction buffer. This does not currently create
any real problem since nobody relies on that; however, it depends on intimate
details of the JBD2 transaction machinery, and that could bite us in the future.

								Honza

> > > This change means that if the dquot is dirty we skip it. This
> > > won't work, because that way the quota update is only kept in VFS dquot
> > > memory, and the newer update is not written to the journal file and not
> > > wrapped into a transaction either.
> > 
> > That's not true.
> > 
> > As I explained earlier, having DQ_MOD_B set at this point means another
> > thread is going to write dquot but hasn't yet started doing so. This thread
> > does not care whether it updates the ondisk dquot with its own data or with
> > fresher data which came from another thread. In-core dquot has no indication
> > of whose data it contains.
> > 
> > As I also explained earlier, the update cannot happen in the context of
> > another transaction because thread A which sees DQ_MOD_B set and thread
> > B which is running dquot_commit() both have journal handles to the same
> > transaction. There's only one running transaction at a time and thread B
> > does not switch to another transaction.
> > 
> > Please read the code carefully.
> > 
> > > This is not what journal quota means to do.
> > > 
> > > 
> > > Thanks,
> > > Shilong
> > > 
> > > > Thank you,
> > > > Andrew
> 
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


