Re: Parent pointers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 14, 2017 at 03:07:41PM -0400, Brian Foster wrote:
> On Fri, Jul 14, 2017 at 01:46:27PM -0500, Eric Sandeen wrote:
> > On 07/14/2017 12:44 PM, Allison Henderson wrote:
> > > On 7/14/2017 7:04 AM, Eric Sandeen wrote:
> > >>
> > >>
> > >> On 07/14/2017 03:50 AM, Carlos Maiolino wrote:
> > >>> Hi,
> > >>>
> > >>> On Thu, Jul 13, 2017 at 04:25:25PM -0700, Allison Henderson wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> I've been doing some digging on adding parent pointers to xfs and wanted to
> > >>>> send a note to folks here to get peoples opinions on it.
> > >>>
> > >>> Are you talking about parent pointers in the BTrees?
> > >>>
> > >>
> > >> No, see [RFC 00/17] RFC parent inode pointers. for example, from long
> > >> ago.
> > >>
> > >> "Parent inode support allow XFS to quickly derive a file name and
> > >> path from the mount point. This can aid in directory/path policies
> > >> and can help relocate items during filesystem shrink."
> > >>
> > >> It has a long and ... difficult history.
> > >>
> > >> -Eric

Woot, time to dive in. :)

> > > Right, so to expand on Eric's answer, it looks like Dave and Brian
> > > had been working on some improvements based on that set, but it's
> > > not quite finished yet.  The idea is that we add an extended
> > > attribute to keep track of the parents inode and generation, and

Well, a lot of attributes, in the case of multiply hardlinked files.
One concern I have is that since a file can be hardlinked 2^32 times and
path components have a maximum length of 255 bytes, we'll have to be
careful about not overflowing di_anextents as a part of making a
potentially huge attribute tree.

> > > also the child entries offset, and filename. So in this solution
> > > the EA is name={parent inode #, parent inode generation, dirent
> > > offset}, value={dirent filename}.

This (the disk format) is probably the most important part to get
settled.  FWIW, I've recaptured three of the previous discussions of
parent pointers -- Lachlan McIlroy's code dump in 2009[1], Mark
Tinguely's 2013 attempt[2] and 2014 attempt[3] at discussing patches.
For those of you following along at home, the gmane link mentioned in
[3] is the link at [1].

[1] http://oss.sgi.com/archives/xfs/2009-01/msg01068.html
[2] http://oss.sgi.com/archives/xfs/2013-04/msg00214.html
[3] https://www.spinics.net/lists/xfs/msg25175.html

The latest round of parent pointer patches is (I think) in the form of
an off-list patchset that Brian (and now Allison) are cleaning up for
reposting, which hopefully happens soon because I feel rather ill at
ease discussing off-list code that lacks S-o-b's.

Anyway, the current proposal is to create a new xattr namespace
(ATTR_PARENT) and to fill it with (parent_ino, parent_gen,
dirent_offset) => (dirent name) attributes:

struct xfs_parent_name_rec {
	__be64	p_ino;
	__be32	p_gen;
	__be32	p_diroffset;
};

p_diroffset is really a directory datapointer, not a raw byte offset.

We also have an incore data structure of:

struct xfs_parent_name_irec {
	__uint64_t	p_ino;
	__uint32_t	p_gen;
	__uint32_t	p_diroffset;
	const char	*p_name;
	__uint32_t	p_namelen;
};

I think p_ino ought to be xfs_ino_t and not uint64_t, and p_diroffset
ought to be xfs_dir2_dataptr_t instead of uint32_t.

> > > My goal at the moment is just to get it compiling again and finish
> > > out some of the sub routines that maintain it.  It looks like it
> > > hasn't had much attention in a while, so I wanted to let people
> > > know the direction I'm planning to move in before I get too far
> > > in.

<nod>

> > If you're forward-porting that 17-patch set from Mark, I'd suggest
> > first reading Dave's response to it - IIRC it amounted to a firm
> > NAK.  it also highlights the complexity of this undertaking, and may
> > explain why nobody has gotten it done (yet) :)
> > 
> 
> The patches that came from me (to Allison) were last sent to me directly
> from Dave. I know he had some objections to the original design, but my
> understanding was that he had incorporated fixes for those issues in his
> modifications to Mark's original series (such as the xattr format and

>From what I can tell, he seems to have taken Mark's patches to pass dir
offsets back from the dirent manipulation functions and then started
writing a simple implementation of stuffing the attrs into the attr
tree.

> whatnot), but the series wasn't completely done yet in terms of
> supporting all possible directory operations. (It's probably a good idea
> to review that old thread anyways just to confirm, unless Dave catches
> this and is able to chime in one way or the other.)

(Or jump in the fray out once he's back from sabbatical.)

> I basically just forwarded ported Dave's patches to more recent kernels
> and added a couple minor fixes as I had combed through the existing
> code. I never really got to adding anything of substance before Allison
> volunteered to take over the series.
> 
> FWIW, I think there was some mention of porting some of the operations
> over to the deferred ops infrastructure, but it's not clear to me off
> the top of my head how important (or appropriate) that is.

Most of the users of xfs_trans_roll seem to be buried in the xattr code.
For example, _attr_set can end up rolling a transaction after converting
an attr fork from short format to leaf format before retrying the attr
add operation, but we don't use log redo items (er, defer_ops items) to
make sure log recovery will restart the attr add operation if we crash
before the final _trans_commit after calling _attr_{leaf,node}_addname.

In the case of a regular xattr set operation this wouldn't have been a
big deal because the fsetxattr call wouldn't have returned, so all the
user could possibly see is an inode with a perhaps unnecessarily large
xattr fork.  Now that we need to set xattrs in the same transaction
chain as a directory operation, it becomes very important that log
recovery gain the ability to resume an xattr add operation no matter
where the log stopped.  Creating a directory can become this longwinded
pile of updates:

- Allocate directory block, map into data fork, chain rmap update, add
  directory entry, chain pptr update;
- Add rmap entry for new directory block (more chaining to fix
  freelist), finish first rmap update;
- Allocate xattr block for short->leaf conversion, map into attr fork,
  chain rmap update; chain xattr add operation;
- Add rmap entry for new xattr block (more chaining to fix freelist),
  finish second rmap update;
- Add xattr entry to file, finish xattr add operation;
- Finish pptr update.

Basically, I think someone's going to have to go audit all the uses of
xfs_trans_roll in the attr code to figure out which operations need redo
items, and how to cram everything toegether into a single xfs_defer_ops,
rather than sprinkling them around the attr code like we do now, because
redo items cannot be deferred from one defer_ops to another.

> It's something to keep in mind, in any event. IIRC there were also
> missing Signed-off-by's required for some of Mark's original patches.

That's also a problem; unless someone can get Mark to supply them, we
probably have to get someone to rewrite them.  At a bare minimum I think
we explicitly have to pass back a xfs_dir2_dataptr_t, not a bare
uint32_t.

> IMO, the best next step might be to just finish off the implementation
> as-is such that we could have a fairly functional RFC to put on the
> list and hash out whether there are in fact any remaining design
> hurdles, but others might have a different opinion on that. :)

Agreed.

> Brian
> 
> > -Eric
> >  
> > > Allison
> > >>
> > >>>> I got in touch
> > >>>> with Brian Foster not too long ago and he had some code partially done from
> > >>>> about a year or so ago (looks like it has patches from Dave Chinner and Mark
> > >>>> Tinguely as well).  So I am hoping to be able to use what we have so far to
> > >>>> create something updated and finished out.  I am still pretty new to the xfs
> > >>>> code, so at the moment I am still just going through old discussion threads,
> > >>>> and reviewing the patches.  But for the most part I just wanted to see what
> > >>>> people thought and get everyone on board with the idea.  Suggestions and
> > >>>> feedback are much appreciated. Thank you!!
> > >>>>
> > >>>
> > >>> It will be better if you can describe in more details if you have any specific
> > >>> goal with this, and/or what kind of improvement you expect to have with it,
> > >>> adding something new without a reason, is usually not well received.
> > >>>
> > >>> Cheers
> > >>>
> > >> -- 
> > >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >>
> > > -- 
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux