Thanks for the feedback, Darrick; comments inline.

> On Aug 31, 2017, at 9:26 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> On Thu, Aug 31, 2017 at 06:00:21PM -0700, Richard Wareing wrote:
>> Hello all,
>>
>> It turns out XFS real-time volumes are actually a very useful/cool
>> feature, and I am wondering if there is support in the community to
>> make this feature a bit more user friendly, easier to operate and
>> interact with. To kick things off, I bring patches to the table :).
>>
>> For those who aren't familiar with real-time XFS volumes, they are
>> basically a method of storing the data blocks of some files on a
>> separate device. In our specific application, we are using real-time
>> devices to store large files (>256KB) on HDDs, while all metadata &
>> journal updates go to an SSD of suitable endurance & capacity. We also
>> see use-cases for this in distributed storage systems such as
>> GlusterFS, which are heavy in metadata operations (80%+ of IOPS). By
>> using real-time devices to tier your XFS filesystem storage, you can
>> dramatically reduce HDD IOPS (50% in our case) and dramatically
>> improve metadata and small-file latency (HDD->SSD-like reductions).
>>
>> Here are the features in the proposed patch set:
>>
>> 1. rtdefault - Defaulting block allocations to the real-time device
>> via a mount flag rtdefault, vs using an inheritance flag or ioctls.
>> This option gives users tiering of their metadata out of the box
>> with ease, and in a manner more users are familiar with (mount flags),
>> vs having to set inheritance bits or use ioctls (many distributed
>> storage developers are resistant to including FS-specific code in
>> their stacks).
>
> The ioctl to set RTINHERIT/REALTIME is a VFS-level ioctl now.
> I can think of a couple of problems with the mount option -- first, more
> mount options to test (or not ;)); what happens if you actually want
> your file to end up on the data device; and won't this surprise all the
> existing programs that are accustomed to the traditional way of
> handling rt devices?
>
> I mean, you /could/ just mkfs.xfs -d rtinherit=1 and that would get you
> mostly the same results, right?
>
> (Yeah, I know, undocumented mkfs option... <grumble>)

Check out my reply to Dave on this :).

>
>> 2. rtstatfs - Returning real-time block device free space instead of
>> the non-realtime device via the "rtstatfs" flag. This creates an
>> experience/semantics that is a bit more familiar to users if they use
>> real-time in a tiering configuration. "df" reports the space on your
>> HDDs, and the metadata space can be returned by a tool like xfs_info
>> (I have patches for this too if there is interest) or xfs_io. I think
>> this might be a bit more intuitive for the masses than the reverse
>> (having to go to xfs_io for the HDD space, and df for the SSD
>> metadata).
>
> I was a little surprised we don't just add up the data+rt space counters
> for statfs; how /does/ one figure out how much space is free on the rt
> device?
>
> (Will research this tomorrow if nobody pipes up in the meantime.)
>

I was as well!

>> 3. rtfallocmin - This option can be combined with rtdefault or used
>> standalone. When combined with rtdefault, it uses fallocate as a
>> "signal" to *exempt* storage on the real-time device, automatically
>> promoting small fallocations to the SSD, while directing larger ones
>> (or fallocation-less creations) to the HDD. This option also works
>> really well with tools like "rsync" which support fallocate
>> (--preallocate flag) so users can easily promote/demote files to/from
>> the SSD.
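On the question of how much space is free on the rt device: XFS counts realtime free space in realtime extents, not blocks. The XFS_IOC_FSCOUNTS ioctl reports this figure to userspace as freertx. It has to be scaled by the rt extent size (in filesystem blocks) and the block size. The sketch below is a hypothetical model of what an rtstatfs-style statfs could hand to df. The struct and function names are illustrative, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical inputs mirroring the realtime counters XFS keeps:
 * total RT blocks, free RT *extents*, and the RT extent size in
 * filesystem blocks. Names are illustrative only.
 */
struct rt_counters {
	uint64_t rblocks;   /* total realtime blocks */
	uint64_t frextents; /* free realtime extents (not blocks) */
	uint32_t rextsize;  /* filesystem blocks per realtime extent */
	uint32_t blocksize; /* filesystem block size in bytes */
};

/* What an rtstatfs-style statfs could report to df for the RT device. */
struct rt_df_view {
	uint64_t f_blocks;   /* total, in filesystem blocks */
	uint64_t f_bfree;    /* free, in filesystem blocks */
	uint64_t free_bytes; /* f_bfree converted to bytes */
};

static struct rt_df_view rt_statfs_view(const struct rt_counters *rt)
{
	struct rt_df_view v;

	assert(rt->rextsize > 0 && rt->blocksize > 0);
	v.f_blocks = rt->rblocks;
	/* Free RT space is tracked in extents, so scale it to blocks. */
	v.f_bfree = rt->frextents * rt->rextsize;
	v.free_bytes = v.f_bfree * rt->blocksize;
	return v;
}
```

For example, take 100 free extents with a 4-block extent size and 4096-byte blocks. That gives 400 free blocks, or 1638400 bytes, which is what df would show under this model.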
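Item 3's placement rule, when rtfallocmin is combined with rtdefault, reduces to a small decision function. The rtfallocmin code is not part of the patch quoted here, so what follows is a hypothetical sketch with illustrative names. It assumes "small" means strictly below the rtfallocmin threshold. It models only the combined case, since the standalone behavior is not spelled out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a new file's data blocks would be allocated. */
enum rt_placement {
	PLACE_DATA_DEV, /* non-RT data device (the SSD in this setup) */
	PLACE_RT_DEV,   /* realtime device (the HDDs in this setup) */
};

/*
 * Hypothetical model of rtfallocmin combined with rtdefault:
 * - a file first allocated by an fallocate() smaller than rtfallocmin
 *   stays on the data device;
 * - a larger fallocate(), or no fallocate() at all, goes to the RT device.
 * "Smaller" is assumed to mean strictly less than the threshold.
 */
static enum rt_placement place_with_rtfallocmin(uint64_t rtfallocmin,
						bool used_fallocate,
						uint64_t falloc_len)
{
	assert(rtfallocmin > 0);
	if (used_fallocate && falloc_len < rtfallocmin)
		return PLACE_DATA_DEV;
	return PLACE_RT_DEV;
}
```

Take a 256KB threshold, matching the >256KB files mentioned earlier. A 4KB fallocate then stays on the SSD. A 1MB fallocate, or a plain write with no fallocate, goes to the HDDs.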
>
> I see where you're coming from, but I don't think it's a good idea to
> overload the existing fallocate interface to have it decide device
> placement too. The side effects of the existing mode flags are well
> known and it's hard to get everyone on board with a semantic change to
> an existing mode.

I'm not proposing we remove the flags, just adding this option to give admins a slightly easier way to interact with and leverage existing utilities.

>
>> Ideally, I'd like to help build out more tiering features in XFS if
>> there is interest in the community, but figured I'd start with these
>> patches first. Other ideas/improvements: automatic eviction from the
>> SSD once a file grows beyond rtfallocmin, automatic fallback to the
>> real-time device if the non-RT device (SSD) is out of blocks, and
>> support for the more sophisticated AG-based block allocator on RT (the
>> bitmapped version works well for us, but multi-threaded use-cases
>> might not do as well).
>>
>> Looking forward to getting feedback!
>>
>> Richard Wareing
>>
>> Note: The patches should apply cleanly against the XFS kernel master
>> branch @ https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git (SHA:
>> 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c).
>
> Needs a Signed-off-by...
>
> --D
>
>>
>> =======
>>
>> - Adds rtdefault mount option to default writes to the real-time
>> device. This removes the need for ioctl calls or inheritance bits to
>> get files to flow to the real-time device.
>> - Enables XFS to store FS metadata on the non-RT device (e.g. SSD)
>> while storing data blocks on the real-time device. No application
>> code changes needed: install kernel, format, mount and profit.
>> ---
>>  fs/xfs/xfs_inode.c |  8 ++++++++
>>  fs/xfs/xfs_mount.h |  5 +++++
>>  fs/xfs/xfs_super.c | 13 ++++++++++++-
>>  3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index ec9826c..1611195 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -873,6 +873,14 @@ xfs_ialloc(
>>  		break;
>>  	case S_IFREG:
>>  	case S_IFDIR:
>> +		/* Set flags if we are defaulting to real-time device */
>> +		if (mp->m_rtdev_targp != NULL &&
>> +		   mp->m_flags & XFS_MOUNT_RTDEFAULT) {
>> +			if (S_ISDIR(mode))
>> +				ip->i_d.di_flags |= XFS_DIFLAG_RTINHERIT;
>> +			else if (S_ISREG(mode))
>> +				ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
>> +		}
>>  		if (pip && (pip->i_d.di_flags & XFS_DIFLAG_ANY)) {
>>  			uint64_t	di_flags2 = 0;
>>  			uint		di_flags = 0;
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index 9fa312a..da25398 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -243,6 +243,11 @@ typedef struct xfs_mount {
>>  						   allocator */
>>  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
>>
>> +/* FB Real-time device options */
>> +#define XFS_MOUNT_RTDEFAULT	(1ULL << 61)	/* Always allocate blocks from
>> +						 * RT device
>> +						 */
>> +
>>  #define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
>>
>>
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index 455a575..e4f85a9 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -83,7 +83,7 @@ enum {
>>  	Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota, Opt_prjquota,
>>  	Opt_uquota, Opt_gquota, Opt_pquota,
>>  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
>> -	Opt_discard, Opt_nodiscard, Opt_dax, Opt_err,
>> +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_rtdefault, Opt_err,
>>  };
>>
>>  static const match_table_t tokens = {
>> @@ -133,6 +133,9 @@ static const match_table_t tokens = {
>>
>>  	{Opt_dax,	"dax"},		/* Enable direct access to bdev pages */
>>
>> +#ifdef CONFIG_XFS_RT
>> +	{Opt_rtdefault,	"rtdefault"},	/* Default to real-time device */
>> +#endif
>>  	/* Deprecated mount options scheduled for removal */
>>  	{Opt_barrier,	"barrier"},	/* use writer barriers for log write and
>>  					 * unwritten extent conversion */
>> @@ -367,6 +370,11 @@ xfs_parseargs(
>>  		case Opt_nodiscard:
>>  			mp->m_flags &= ~XFS_MOUNT_DISCARD;
>>  			break;
>> +#ifdef CONFIG_XFS_RT
>> +		case Opt_rtdefault:
>> +			mp->m_flags |= XFS_MOUNT_RTDEFAULT;
>> +			break;
>> +#endif
>>  #ifdef CONFIG_FS_DAX
>>  		case Opt_dax:
>>  			mp->m_flags |= XFS_MOUNT_DAX;
>> @@ -492,6 +500,9 @@ xfs_showargs(
>>  		{ XFS_MOUNT_DISCARD,		",discard" },
>>  		{ XFS_MOUNT_SMALL_INUMS,	",inode32" },
>>  		{ XFS_MOUNT_DAX,		",dax" },
>> +#ifdef CONFIG_XFS_RT
>> +		{ XFS_MOUNT_RTDEFAULT,		",rtdefault" },
>> +#endif
>>  		{ 0, NULL }
>>  	};
>>  	static struct proc_xfs_info xfs_info_unset[] = {
>> --
>> 2.9.3
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html