On Wed, Nov 07, 2012 at 02:36:42PM +0800, Zheng Liu wrote:
> On Tue, Nov 06, 2012 at 03:10:11PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 06, 2012 at 05:36:38PM +0800, Ram Pai wrote:
> > > On Fri, Nov 02, 2012 at 04:41:09PM +0800, Zheng Liu wrote:
> > > > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> > > > > Here is another question.
> > > > >
> > > > > How can we save the file temperature across umount so that it
> > > > > is preserved after reboot?
> > > > >
> > > > > This is a requirement from a DB product.
> > > > > I thought that we can save file temperature in its inode struct, that
> > > > > is, add one new field in struct inode, then this info will be written
> > > > > to disk with inode.
> > > > >
> > > > > Any comments or ideas are appreciated, thanks.
> > > >
> > > > Hi Zhiyong,
> > > >
> > > > I think that we might define a callback function.  If a filesystem wants
> > > > to save these data, it can implement a function to save them.  Each
> > > > filesystem can decide for itself whether to add it.
> > > >
> > > > BTW, actually I don't really care about how to save these data because I
> > > > only want to observe which file is accessed in real time, which is very
> > > > useful for me to track a problem in our product system.
> > >
> > > To me, umounting a filesystem is a way of explicitly telling the VFS that the
> > > filesystem's data is not hot anymore.  So probably, it really does not make
> > > sense to store temperatures across mount boundaries.
> >
> > I'd prefer that file heat data be retained across mounts -- we shouldn't
> > throw away all of our observations just because of a system crash / power
> > outage / scheduled reboot.
> >
> > Or, imagine if you're a defragging tool.  If you're clever enough to try
> > consolidating all the hot blocks in one place on disk so that you could
> > aggressively read them all in at once (e.g. ureadahead), I think you'd want to
> > be able to access as big of an observation pool as possible.
> >
> > This just occurred to me -- are you saving all of the file's heat data, like
> > the per-range read/write counters, and the averages?  Or just a single compiled
> > heat rating for the whole file?  I suggested a big hidden file a few days ago
> > because I'd thought you were trying to save all the range/heat data, which
> > would probably be painful to shoehorn into an xattr.  If you're only storing a
> > single number, then the xattr way is probably ok.
>
> Hi Darrick,
>
> Maybe the best way is to provide a new mount option or a switch in sysfs
> to turn it on or off.  The user can decide whether it is enabled
> or not.  After all, it will bring some extra overhead.  At least turning
> it on in our product system is unacceptable for me if there is no
> problem that I need to track.

Hmm... who are the intended in-kernel users of the hot tracking feature?  I'm
starting to wonder if it's possible (or desirable) to implement some of this in
userspace and have the kernel ask for the hot data as needed, or simply write a
driver program that handles the strategy and only needs the kernel interface
that moves extents around.

I feel like we could just write a regular program that uses ftrace to record io
activity and manage all the observations that we pick up, and then the db,
defrag, dedupe, etc. programs can just call into that?  On the other hand,
writing some daemon program has its own problems with distribution, starting it
up, and killing it off at shutdown.  But it would make Zheng's (non)use case
easier -- if you don't want it, don't run it.

Perhaps this approach has already been discussed and thrown out?  In which case
I'll shut up. :)

--D

>
> Regards,
> Zheng
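
To make the "single number in an xattr" option from the quoted discussion
concrete, here is a minimal userspace sketch in Python.  The attribute name
user.hot_temp, the unsigned 32-bit little-endian encoding, and all helper names
are assumptions made up for this illustration; nothing in the thread specifies
them, and the hot tracking patches may well store the value differently.

```python
import os
import struct

# Hypothetical attribute name; the thread does not specify one.
HEAT_XATTR = "user.hot_temp"


def pack_heat(temp):
    """Encode one heat rating as an unsigned 32-bit little-endian value."""
    return struct.pack("<I", temp)


def unpack_heat(raw):
    """Decode a value produced by pack_heat()."""
    (temp,) = struct.unpack("<I", raw)
    return temp


def save_heat(path, temp):
    """Store the rating in an extended attribute (Linux only)."""
    os.setxattr(path, HEAT_XATTR, pack_heat(temp))


def load_heat(path, default=0):
    """Read the rating back, or return default if it is missing,
    malformed, or the filesystem does not support user xattrs."""
    try:
        return unpack_heat(os.getxattr(path, HEAT_XATTR))
    except (OSError, struct.error):
        return default
```

This only covers the single-rating case.  Per-range read/write counters and
averages would not fit this scheme comfortably, which is the situation where
the hidden-file idea mentioned above seems more natural.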