Re: [PATCH v6 1/2] sb: add a new writeback list for sync

On Thu, Jan 21, 2016 at 05:34:11PM +0100, Jan Kara wrote:
> On Thu 21-01-16 10:22:57, Brian Foster wrote:
> > On Thu, Jan 21, 2016 at 07:11:59AM +1100, Dave Chinner wrote:
> > > On Wed, Jan 20, 2016 at 02:26:26PM +0100, Jan Kara wrote:
> > > > On Tue 19-01-16 12:59:12, Brian Foster wrote:
> > > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > > 
...
> > > > 
> > 
> > Hi Jan, Dave,
> > 
...
> > > > a) How much sync(2) speed has improved if there's not much to wait for.
> > > 
> > > Depends on the size of the inode cache when sync is run.  If it's
> > > empty it's not noticeable. When you have tens of millions of cached,
> > > clean inodes the inode list traversal can take tens of seconds.
> > > This is the sort of problem Josef reported that FB were having...
> > > 
> > 
> > FWIW, Ceph has indicated this is a pain point for them as well. The
> > results at [0] below show the difference in sync time with a largely
> > populated inode cache before and after this patch.
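
To make the problem concrete for anyone skimming the thread, here's a rough
user-space sketch of the idea (not the kernel code; the struct and function
names below are made up for illustration). The old wait path has to visit
every cached inode just to find the few with writeback in flight, while a
dedicated writeback list only ever contains the inodes sync actually has to
wait for:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

struct toy_inode {
    bool under_writeback;
    struct toy_inode *next_cached;  /* models the per-sb list of all cached inodes */
    struct toy_inode *next_wb;      /* models the new, dedicated writeback list */
};

/* Old scheme: walk every cached inode and skip the clean ones. */
static unsigned long wait_all_cached(struct toy_inode *list)
{
    unsigned long visited = 0;
    struct toy_inode *i;

    for (i = list; i; i = i->next_cached) {
        visited++;
        if (i->under_writeback)
            i->under_writeback = false;  /* "wait" for writeback to finish */
    }
    return visited;
}

/* New scheme: walk only the inodes that actually have writeback in flight. */
static unsigned long wait_wb_list(struct toy_inode *list)
{
    unsigned long visited = 0;
    struct toy_inode *i;

    for (i = list; i; i = i->next_wb) {
        visited++;
        i->under_writeback = false;
    }
    return visited;
}

int main(void)
{
    const long ncached = 10L * 1000 * 1000;  /* ~10m entry inode cache */
    const long ndirty = 1000;                /* only a handful under writeback */
    struct toy_inode *inodes = calloc(ncached, sizeof(*inodes));
    struct toy_inode *cached = NULL, *wb = NULL;
    long n;

    if (!inodes)
        return 1;
    for (n = 0; n < ncached; n++) {
        inodes[n].next_cached = cached;
        cached = &inodes[n];
        if (n < ndirty) {
            inodes[n].under_writeback = true;
            inodes[n].next_wb = wb;
            wb = &inodes[n];
        }
    }

    printf("old wait path visited %lu inodes\n", wait_all_cached(cached));
    printf("new wait path visited %lu inodes\n", wait_wb_list(wb));
    free(inodes);
    return 0;
}

Walking ~10m mostly-clean entries versus ~1k under-writeback ones is
essentially the difference behind the sync numbers at [0] below.
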
> > 
> > > > b) See whether a parallel heavy stat(2) load that rotates lots of inodes
> > > > through the inode cache sees some improvement when it doesn't have to contend
> > > > with sync(2) on s_inode_list_lock. I believe Dave Chinner had some loads where
> > > > the contention on s_inode_list_lock due to sync and inode rotation was
> > > > pretty heavy.
> > > 
> > > Just my usual fsmark workloads - they have parallel find and
> > > parallel ls -lR traversals over the created fileset. Even just
> > > running sync during creation (because there are millions of cached
> > > inodes, and ~250,000 inodes being instantiated and reclaimed every
> > > second) causes lock contention problems....
> > > 
> > 
> > I ran a similar parallel (16x) fs_mark workload using '-S 4', which
> > incorporates a sync() per pass. Without this patch, it shows a steady
> > degradation in files/sec as the inode cache grows. Results at [1].
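
On the contention side, the same kind of rough user-space model applies
(again, not the kernel code and the names are invented; the real locking is
finer-grained than one mutex held across a whole walk, but the effect on
concurrent instantiation/reclaim is similar). With the dedicated writeback
list, sync simply never takes the inode-list lock to skip clean inodes, so
this stall goes away:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NCACHED (10L * 1000 * 1000)  /* cached inodes the old sync path walks */

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* models the inode list lock */
static atomic_int stop_walker;

/* Models the old sync wait path: a full pass over the cached-inode list,
 * repeated here so the contention window stays open long enough to measure. */
static void *sync_walker(void *arg)
{
    volatile long dummy = 0;
    long i;

    (void)arg;
    while (!atomic_load(&stop_walker)) {
        pthread_mutex_lock(&list_lock);
        for (i = 0; i < NCACHED; i++)
            dummy += i;              /* pretend to inspect each cached inode */
        pthread_mutex_unlock(&list_lock);
    }
    return NULL;
}

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000 + (b->tv_nsec - a->tv_nsec) / 1000000;
}

/* Models inode instantiation: each "create" needs the same lock briefly.
 * Returns how many creates complete in roughly one second. */
static long creates_per_second(void)
{
    struct timespec start, now;
    long count = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        pthread_mutex_lock(&list_lock);
        count++;                     /* add one inode to the list */
        pthread_mutex_unlock(&list_lock);
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (elapsed_ms(&start, &now) < 1000);
    return count;
}

int main(void)
{
    pthread_t walker;
    long baseline, contended;

    baseline = creates_per_second();               /* no sync walk running */
    pthread_create(&walker, NULL, sync_walker, NULL);
    contended = creates_per_second();              /* vs. old-style sync walks */
    atomic_store(&stop_walker, 1);
    pthread_join(walker, NULL);

    printf("creates/sec with no sync walk:      %ld\n", baseline);
    printf("creates/sec during old-style walks: %ld\n", contended);
    return 0;
}

The contended rate should drop sharply because each create has to wait out a
full list walk, which is the same effect the files/sec falloff in [1] shows
at filesystem scale.
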
> 
> Thanks for the results. I think it would be good if you incorporated them
> in the changelog since other people will likely be asking similar
> questions when they see the patch. Other than that feel free to
> add:
> 
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> 

No problem, thanks! Sure, I don't want to dump the raw data into the
commit log description and make it too long, but I can reference the
core sync time impact. I've appended the following for now:

    "With this change, filesystem sync times are significantly reduced for
    fs' with largely populated inode caches and otherwise no other work to
    do. For example, on a 16xcpu 2GHz x86-64 server, 10TB XFS filesystem
    with a ~10m entry inode cache, sync times are reduced from ~7.3s to less
    than 0.1s when the filesystem is fully clean."

I'll repost in a day or so if I don't receive any other feedback.

Brian

> 								Honza
> > 16xcpu, 32GB RAM x86-64 server
> > Storage is LVM volumes on hw raid0.
> > 
> > [0] -- sync test w/ ~10m clean inode cache
> > - 10TB pre-populated XFS fs, cache populated via parallel find/stat
> >   workload
> > 
> > --- 4.4.0+
> > 
> > # cat /proc/slabinfo | grep xfs
> > xfs_dqtrx              0      0    528   62    8 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_dquot              0      0    656   49    8 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_buf           496293 496893    640   51    8 : tunables    0    0    0 : slabdata   9743   9743      0
> > xfs_icr                0      0    144   56    2 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_inode         10528071 10529150   1728   18    8 : tunables    0    0    0 : slabdata 584999 584999      0
> > xfs_efd_item           0      0    400   40    4 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_da_state         544    544    480   34    4 : tunables    0    0    0 : slabdata     16     16      0
> > xfs_btree_cur          0      0    208   39    2 : tunables    0    0    0 : slabdata      0      0      0
> > 
> > # time sync
> > 
> > real    0m7.322s
> > user    0m0.000s
> > sys     0m7.314s
> > # time sync
> > 
> > real    0m7.299s
> > user    0m0.000s
> > sys     0m7.296s
> > 
> > --- 4.4.0+ w/ sync patch
> > 
> > # cat /proc/slabinfo | grep xfs
> > xfs_dqtrx              0      0    528   62    8 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_dquot              0      0    656   49    8 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_buf           428214 428514    640   51    8 : tunables    0    0    0 : slabdata   8719   8719      0
> > xfs_icr                0      0    144   56    2 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_inode         11054375 11054438   1728   18    8 : tunables    0    0    0 : slabdata 721323 721323      0
> > xfs_efd_item           0      0    400   40    4 : tunables    0    0    0 : slabdata      0      0      0
> > xfs_da_state         544    544    480   34    4 : tunables    0    0    0 : slabdata     16     16      0
> > xfs_btree_cur          0      0    208   39    2 : tunables    0    0    0 : slabdata      0      0      0
> > 
> > # time sync
> > 
> > real    0m0.040s
> > user    0m0.001s
> > sys     0m0.003s
> > # time sync
> > 
> > real    0m0.002s
> > user    0m0.001s
> > sys     0m0.002s
> > 
> > [1] -- fs_mark -D 1000 -S4 -n 1000 -d /mnt/0 ... -d /mnt/15 -L 32
> > - 1TB XFS fs
> > 
> > --- 4.4.0+
> > 
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      2        16000        51200       3313.3           822514
> >      2        32000        51200       3353.6           310268
> >      2        48000        51200       3475.2           289941
> >      2        64000        51200       3104.6           289993
> >      2        80000        51200       2944.9           292124
> >      2        96000        51200       3010.4           288042
> >      3       112000        51200       2756.4           289761
> >      3       128000        51200       2753.2           288096
> >      3       144000        51200       2474.4           290797
> >      3       160000        51200       2657.9           290898
> >      3       176000        51200       2498.0           288247
> >      3       192000        51200       2415.5           287329
> >      3       208000        51200       2336.1           291113
> >      3       224000        51200       2352.9           290103
> >      3       240000        51200       2309.6           289580
> >      3       256000        51200       2344.3           289828
> >      3       272000        51200       2293.0           291282
> >      3       288000        51200       2295.5           286538
> >      4       304000        51200       2119.0           288906
> >      4       320000        51200       2059.6           293605
> >      4       336000        51200       2129.1           289825
> >      4       352000        51200       1929.8           288186
> >      4       368000        51200       1987.5           294596
> >      4       384000        51200       1929.1           293528
> >      4       400000        51200       1934.8           288138
> >      4       416000        51200       1823.6           292318
> >      4       432000        51200       1838.7           290890
> >      4       448000        51200       1797.5           288816
> >      4       464000        51200       1823.2           287190
> >      4       480000        51200       1738.7           295745
> >      4       496000        51200       1716.4           293821
> >      5       512000        51200       1726.7           290445
> > 
> > --- 4.4.0+ w/ sync patch
> > 
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      2        16000        51200       3409.7           999579
> >      2        32000        51200       3481.3           286877
> >      2        48000        51200       3447.3           282743
> >      2        64000        51200       3522.3           283400
> >      2        80000        51200       3427.0           286360
> >      2        96000        51200       3360.2           307219
> >      3       112000        51200       3377.7           286625
> >      3       128000        51200       3363.7           285929
> >      3       144000        51200       3345.7           283138
> >      3       160000        51200       3384.9           291081
> >      3       176000        51200       3084.1           285265
> >      3       192000        51200       3388.4           291439
> >      3       208000        51200       3242.8           286332
> >      3       224000        51200       3337.9           285006
> >      3       240000        51200       3442.8           292109
> >      3       256000        51200       3230.3           283432
> >      3       272000        51200       3358.3           286996
> >      3       288000        51200       3309.0           288058
> >      4       304000        51200       3293.4           284309
> >      4       320000        51200       3221.4           284476
> >      4       336000        51200       3241.5           283968
> >      4       352000        51200       3228.3           284354
> >      4       368000        51200       3255.7           286072
> >      4       384000        51200       3094.6           290240
> >      4       400000        51200       3385.6           288158
> >      4       416000        51200       3265.2           284387
> >      4       432000        51200       3315.2           289656
> >      4       448000        51200       3275.1           284562
> >      4       464000        51200       3238.4           294976
> >      4       480000        51200       3060.0           290088
> >      4       496000        51200       3359.5           286949
> >      5       512000        51200       3156.2           288126
> > 
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR


