On Fri, Jul 14, 2017 at 03:07:41PM -0400, Brian Foster wrote: > On Fri, Jul 14, 2017 at 01:46:27PM -0500, Eric Sandeen wrote: > > On 07/14/2017 12:44 PM, Allison Henderson wrote: > > > On 7/14/2017 7:04 AM, Eric Sandeen wrote: > > >> > > >> > > >> On 07/14/2017 03:50 AM, Carlos Maiolino wrote: > > >>> Hi, > > >>> > > >>> On Thu, Jul 13, 2017 at 04:25:25PM -0700, Allison Henderson wrote: > > >>>> Hi all, > > >>>> > > >>>> I've been doing some digging on adding parent pointers to xfs and wanted to > > >>>> send a note to folks here to get peoples opinions on it. > > >>> > > >>> Are you talking about parent pointers in the BTrees? > > >>> > > >> > > >> No, see [RFC 00/17] RFC parent inode pointers. for example, from long > > >> ago. > > >> > > >> "Parent inode support allow XFS to quickly derive a file name and > > >> path from the mount point. This can aid in directory/path policies > > >> and can help relocate items during filesystem shrink." > > >> > > >> It has a long and ... difficult history. > > >> > > >> -Eric Woot, time to dive in. :) > > > Right, so to expand on Eric's answer, it looks like Dave and Brian > > > had been working on some improvements based on that set, but it's > > > not quite finished yet. The idea is that we add an extended > > > attribute to keep track of the parents inode and generation, and Well, a lot of attributes, in the case of multiply hardlinked files. One concern I have is that since a file can be hardlinked 2^32 times and path components have a maximum length of 255 bytes, we'll have to be careful about not overflowing di_anextents as a part of making a potentially huge attribute tree. > > > also the child entries offset, and filename. So in this solution > > > the EA is name={parent inode #, parent inode generation, dirent > > > offset}, value={dirent filename}. This (the disk format) is probably the most important part to get settled. FWIW, I've recaptured three of the previous discussions of parent pointers -- Lachlan McIlroy's code dump in 2009[1], Mark Tinguely's 2013 attempt[2] and 2014 attempt[3] at discussing patches. For those of you following along at home, the gmane link mentioned in [3] is the link at [1]. [1] http://oss.sgi.com/archives/xfs/2009-01/msg01068.html [2] http://oss.sgi.com/archives/xfs/2013-04/msg00214.html [3] https://www.spinics.net/lists/xfs/msg25175.html The latest round of parent pointer patches is (I think) in the form of an off-list patchset that Brian (and now Allison) are cleaning up for reposting, which hopefully happens soon because I feel rather ill at ease discussing off-list code that lacks S-o-b's. Anyway, the current proposal is to create a new xattr namespace (ATTR_PARENT) and to fill it with (parent_ino, parent_gen, dirent_offset) => (dirent name) attributes: struct xfs_parent_name_rec { __be64 p_ino; __be32 p_gen; __be32 p_diroffset; }; p_diroffset is really a directory datapointer, not a raw byte offset. We also have an incore data structure of: struct xfs_parent_name_irec { __uint64_t p_ino; __uint32_t p_gen; __uint32_t p_diroffset; const char *p_name; __uint32_t p_namelen; }; I think p_ino ought to be xfs_ino_t and not uint64_t, and p_diroffset ought to be xfs_dir2_dataptr_t instead of uint32_t. > > > My goal at the moment is just to get it compiling again and finish > > > out some of the sub routines that maintain it. It looks like it > > > hasn't had much attention in a while, so I wanted to let people > > > know the direction I'm planning to move in before I get too far > > > in. <nod> > > If you're forward-porting that 17-patch set from Mark, I'd suggest > > first reading Dave's response to it - IIRC it amounted to a firm > > NAK. it also highlights the complexity of this undertaking, and may > > explain why nobody has gotten it done (yet) :) > > > > The patches that came from me (to Allison) were last sent to me directly > from Dave. I know he had some objections to the original design, but my > understanding was that he had incorporated fixes for those issues in his > modifications to Mark's original series (such as the xattr format and >From what I can tell, he seems to have taken Mark's patches to pass dir offsets back from the dirent manipulation functions and then started writing a simple implementation of stuffing the attrs into the attr tree. > whatnot), but the series wasn't completely done yet in terms of > supporting all possible directory operations. (It's probably a good idea > to review that old thread anyways just to confirm, unless Dave catches > this and is able to chime in one way or the other.) (Or jump in the fray out once he's back from sabbatical.) > I basically just forwarded ported Dave's patches to more recent kernels > and added a couple minor fixes as I had combed through the existing > code. I never really got to adding anything of substance before Allison > volunteered to take over the series. > > FWIW, I think there was some mention of porting some of the operations > over to the deferred ops infrastructure, but it's not clear to me off > the top of my head how important (or appropriate) that is. Most of the users of xfs_trans_roll seem to be buried in the xattr code. For example, _attr_set can end up rolling a transaction after converting an attr fork from short format to leaf format before retrying the attr add operation, but we don't use log redo items (er, defer_ops items) to make sure log recovery will restart the attr add operation if we crash before the final _trans_commit after calling _attr_{leaf,node}_addname. In the case of a regular xattr set operation this wouldn't have been a big deal because the fsetxattr call wouldn't have returned, so all the user could possibly see is an inode with a perhaps unnecessarily large xattr fork. Now that we need to set xattrs in the same transaction chain as a directory operation, it becomes very important that log recovery gain the ability to resume an xattr add operation no matter where the log stopped. Creating a directory can become this longwinded pile of updates: - Allocate directory block, map into data fork, chain rmap update, add directory entry, chain pptr update; - Add rmap entry for new directory block (more chaining to fix freelist), finish first rmap update; - Allocate xattr block for short->leaf conversion, map into attr fork, chain rmap update; chain xattr add operation; - Add rmap entry for new xattr block (more chaining to fix freelist), finish second rmap update; - Add xattr entry to file, finish xattr add operation; - Finish pptr update. Basically, I think someone's going to have to go audit all the uses of xfs_trans_roll in the attr code to figure out which operations need redo items, and how to cram everything toegether into a single xfs_defer_ops, rather than sprinkling them around the attr code like we do now, because redo items cannot be deferred from one defer_ops to another. > It's something to keep in mind, in any event. IIRC there were also > missing Signed-off-by's required for some of Mark's original patches. That's also a problem; unless someone can get Mark to supply them, we probably have to get someone to rewrite them. At a bare minimum I think we explicitly have to pass back a xfs_dir2_dataptr_t, not a bare uint32_t. > IMO, the best next step might be to just finish off the implementation > as-is such that we could have a fairly functional RFC to put on the > list and hash out whether there are in fact any remaining design > hurdles, but others might have a different opinion on that. :) Agreed. > Brian > > > -Eric > > > > > Allison > > >> > > >>>> I got in touch > > >>>> with Brian Foster not too long ago and he had some code partially done from > > >>>> about a year or so ago (looks like it has patches from Dave Chinner and Mark > > >>>> Tinguely as well). So I am hoping to be able to use what we have so far to > > >>>> create something updated and finished out. I am still pretty new to the xfs > > >>>> code, so at the moment I am still just going through old discussion threads, > > >>>> and reviewing the patches. But for the most part I just wanted to see what > > >>>> people thought and get everyone on board with the idea. Suggestions and > > >>>> feedback are much appreciated. Thank you!! > > >>>> > > >>> > > >>> It will be better if you can describe in more details if you have any specific > > >>> goal with this, and/or what kind of improvement you expect to have with it, > > >>> adding something new without a reason, is usually not well received. > > >>> > > >>> Cheers > > >>> > > >> -- > > >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > >> > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html