Re: [PATCH, RFC 0/3] Introduce new O_HOT and O_COLD flags

On Fri, 20 Apr 2012, James Bottomley wrote:

> On Fri, 2012-04-20 at 11:45 +0200, Lukas Czerner wrote:
> > On Fri, 20 Apr 2012, Boaz Harrosh wrote:
> > 
> > > On 04/19/2012 10:20 PM, Theodore Ts'o wrote:
> > > 
> > > > As I had brought up during one of the lightning talks at the Linux
> > > > Storage and Filesystem workshop, I am interested in introducing two new
> > > > open flags, O_HOT and O_COLD.  These flags are passed down to the
> > > > individual file system's inode operations' create function, and the file
> > > > system can use these flags as a hint regarding whether the file is
> > > > likely to be accessed frequently or not.
> > > > 
> > > > In the future I plan to do further work on how ext4 would use these
> > > > flags, but I want to first get the ability to pass these flags plumbed
> > > > into the VFS layer and the code points for O_HOT and O_COLD reserved.
> > > > 
> > > > 
> > > > Theodore Ts'o (3):
> > > >   fs: add new open flags O_HOT and O_COLD
> > > >   fs: propagate the open_flags structure down to the low-level fs's
> > > >     create()
> > > >   ext4: use the O_HOT and O_COLD open flags to influence inode
> > > >     allocation
> > > > 
> > > 
> > > 
> > > I would expect that the first, and most important patch to this
> > > set would be the man page which would define the new API. 
> > > What do you mean by cold/normal/hot? what is expected if supported?
> > > how can we know if supported? ....
> > 
> > Well, this is exactly my concern as well. There is no way anyone would
> > know what it actually means and what users can expect from using it. The
> > result of this is very simple: everyone will just use O_HOT for
> > everything (if they use it at all).
> > 
> > Ted, as I mentioned at LSF, I think that the HOT/COLD name is a really
> > bad choice for exactly this reason. It means nothing. If you want to use
> > this flag to place the inode on the faster part of the disk, then just
> > say so and name the flag accordingly; this way everyone can use it.
> > However, for this to actually work we need some fs<->storage interface to
> > query the storage layout, which actually should not be that hard to do. I am
> > afraid that in its current form it will suit only Google and Taobao. I would
> > really like to have an interface to pass tags between user->fs and
> > fs<->storage, but this one does not seem like a good start.
> 
> I think this is a little unfair.  We already have the notion of hot and
> cold pages within the page cache.  The definition for storage is
> similar: a hot block is one which will likely be read again shortly and
> a cold block is one that likely won't (ignoring the 30 odd gradations of
> in-between that the draft standard currently mandates).

You're right, but there is a crucial difference: you cannot compare
a page with a file. A page will either be read often or not so often, and
that's just one dimension. Files have a lot more dimensions. Will it be
rewritten often? Will it be read often, or appended often? Do we need
a really fast first access? Do we need fast metadata operations? Will
this file be there forever, or is it just temporary? Do we need fast
reads/writes? And many more...
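
To make the point concrete, here is a rough sketch of what independent
hint dimensions could look like as a bit mask. Every name below is
hypothetical and invented only for illustration; nothing like this is
part of Ted's patches or of any existing kernel interface.

```c
/*
 * Hypothetical hint bits, one per "dimension" listed above.
 * None of these names exist in the kernel; they are assumptions
 * made up for this illustration.
 */
enum file_hint {
	FILE_HINT_READ_OFTEN        = 1 << 0,
	FILE_HINT_REWRITE_OFTEN     = 1 << 1,
	FILE_HINT_APPEND_OFTEN      = 1 << 2,
	FILE_HINT_FAST_FIRST_ACCESS = 1 << 3,
	FILE_HINT_FAST_METADATA     = 1 << 4,
	FILE_HINT_TEMPORARY         = 1 << 5,
};

/* A log file: appended to all the time, rarely read back. */
static unsigned int log_file_hints(void)
{
	return FILE_HINT_APPEND_OFTEN;
}

/* A scratch file: rewritten often and not expected to live long. */
static unsigned int scratch_file_hints(void)
{
	return FILE_HINT_REWRITE_OFTEN | FILE_HINT_TEMPORARY;
}
```

Neither of these two example files maps cleanly onto a single "hot" or
"cold" label, which is the problem with collapsing everything into one
flag.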

> 
> The concern I have is that the notion of hot and cold files *isn't*
> propagated to the page cache, it's just shared between the fs and the
> disk.  It looks like we could tie the notion of file opened with O_HOT
> or O_COLD into the page reclaimers and actually call
> free_hot_cold_page() with the correct flag, meaning we might get an
> immediate benefit even in the absence of hint supporting disks.

And this is actually a very good idea, but the file flag should not be
O_HOT/O_COLD (and in this case whether it should be an open flag at all
is debatable), but rather something like
hold-this-file-in-memory-longer-than-others or
will-read-this-file-quite-often. Moreover, since with Ted's patches O_HOT
means put the file on the faster part of the disk (or rather whatever the
fs thinks is the fast part of the disk, since the interface to get such
information is missing), we already have one "meaning", and with this
we'll add yet another, completely different meaning to the single
flag. That seems messy.

Thanks!
-Lukas

> 
> I cc'd linux-mm to see if there might be an interest in this ... or even
> if it's worth it: I can also see we don't necessarily want userspace to
> be able to tamper with our idea of what's hot and cold in the page
> cache, since we get it primarily from the lru lists.
> 
> James
> 
> 
> 
