Re: [ANNOUNCE] xfs-linux: for-next updated to 0cbf8c9

On 4/3/17 1:39 PM, Darrick J. Wong wrote:
> On Sun, Apr 02, 2017 at 10:02:14AM +1000, Dave Chinner wrote:
>> On Fri, Mar 31, 2017 at 10:06:01AM -0700, Darrick J. Wong wrote:
>>> On Fri, Mar 31, 2017 at 09:04:38AM -0700, Christoph Hellwig wrote:
>>>> On Thu, Mar 30, 2017 at 10:37:24AM -0700, Darrick J. Wong wrote:
>>>>> has just been updated.  I intend to try to put in Eric Sandeen's patches
>>>>> to perform unlinked inode cleanup during ro mount and my own GETFSMAP
>>>>> patches for 4.12, so I put them in along with the other fixes and
>>>>> cleanups to get further testing.
>>>>
>>>> Any chance we could not use for-next for stuff that's just queued up
>>>> for testing?
>>>
>>> I've had a difficult time figuring out the timeline for Eric's patches.
>>>
>>> I've been testing them internally since they were posted and haven't
>>> seen any problems crop up.  There's already an xfstest to reproduce the
>>> problem and exercise the fix. 
>>
>> Sure, but that doesn't mean that we should use for-next for testing
>> whether a change we are unsure about will have an adverse impact on
>> users.  for-next is purely for integration testing of reviewed,
>> committed patches ready for upstream merge. It feeds directly to
>> linux-next users, so it is most definitely not the place to "try
>> out" patches we aren't sure about.
> 
> All right, I'll pull them from for-next.  I think Eric (who was on
> vacation last week) is ok with this.

It's been broken forever.  A little while longer won't really hurt,
I guess.

>>> Dave is concerned about the repercussions
>>> of the fs suddenly reaping up to several years' worth of orphaned inodes
>>> during a ro mount, since XFS never used to do that.  It's hard to know
>>> just how many people across the entire userbase have read-only
>>> filesystems with inodes that have been sitting orphaned for years.
>>
>> If we are unsure of the implications of a change, then it hasn't
>> passed review, right?
>>
>> That is one of the reasons I sometimes took a month or two before I
>> would move a patch from my local testing tree to for-next (i.e. after
>> hundreds of local xfstests cycles plus all the other ad-hoc testing
>> and stress I'd run in day-to-day dev work).  i.e.  sometimes it
>> takes that long to really understand all the implications for the
>> code and to run it through a gamut of testing to ensure nothing has
>> been missed.
>>
>> Sometimes we can't be 100% certain everything is OK. In this sort of
>> situation, the decision whether to merge or not comes down to risk
>> categorisation and mitigation.  I'd be doing things like thinking
>> back to the number of times I've run xfs_repair and had it clean up
>> the unlinked inode list. How often did that happen without zeroing
>> the log? How many times had I seen this in user bug reports? How
>> many broken filesystem images have I done forensic examination on
>> and found orphan inodes in them?
> 
> I barely have any data to go on -- so far I've not personally seen this
> happen... but nearly all the XFS setups I know about use XFS for data
> storage (i.e. not root fs) and are always mounted rw.
> 
> The initial complaint (I think) came from an RH bug about this situation,
> so I'm assuming that the RHers have a better view on this than I do...
> IOWs, since we're spreading out some of the responsibility for owning
> pieces of code to take pressure off the maintainer, it would help me to
> have code authors and reviewers discuss the timeline in which they think
> a given patchset should be upstreamed.  This doesn't have to be
> particularly precise or set in stone -- even a hint such as "fix this
> now", "next merge window", "code looks ok but let's soak this for a few
> months" would help immensely.

Yes, we had one report.  Then we saw a guy on IRC with a /huge/ swath of
space missing, and it was the same problem.
 
> (Or to put it another way: I prefer that the upstreaming timeline be a
> part of the patch review from the start, rather than me trying to figure
> it out in a vacuum and walking things back when everyone is surprised.)
> 
>> This sort of categorisation gives a rough indication of how concerned
>> we should be that behavioural changes will result in something
>> different happening. The reason I raised this concern is that I've
>> seen such evidence of orphaned inodes in the past, hence it's
>> something we need to think carefully about. The next question to be
>> answered is "what is the risk of cleaning up these inodes?"
>>
>> Testing will give us some insight, but we have to weigh up other
>> things such as inodes (and associated metadata) that have not been
>> touched for years. That brings into question things like storage
>> bitrot, and that's not something we can directly test for, nor
>> something we can be 100% certain about.  We can make an educated
>> guess about how frequent such events are likely to be, however, and
>> hence get some idea of the risk we are exposing users to.
>>
>> A simple risk mitigation strategy in this case would be to say
>> "let's just enable it for v5 filesystems right now" because there
>> are far fewer of those out there, and they are much less likely to
>> have years of stale orphaned inodes on them or to be on storage old
>> enough to be bit-rotting. And even if it is bitrotted, we'll get
>> decent error reporting if there is a problem cleaning them up,
>> too.

Eh, we now have verifiers even w/o V5, right.
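
To spell out what I'm getting at: even on V4, the read verifiers catch
structurally-bad metadata before anything acts on it, so a bitrotted
inode cluster should produce a corruption error rather than a bogus
cleanup.  A minimal sketch of that shape, with made-up names rather
than the actual xfs_dinode/verifier code (the CRC line is the only
v5-only part):

#include <stdbool.h>
#include <stdint.h>

/* Illustration-only on-disk inode; not the real xfs_dinode layout. */
struct sketch_dinode {
	uint16_t magic;		/* structural magic, checked on v4 and v5 */
	uint16_t mode;
	uint32_t crc;		/* metadata checksum, only meaningful on v5 */
};

/* Stand-in for a real checksum routine. */
static bool sketch_crc_ok(const struct sketch_dinode *dip)
{
	return dip->crc != 0;	/* placeholder check */
}

/*
 * Shape of a read verifier: reject obviously-bad metadata before the
 * unlinked-list walk sees it.  On v4 we only get the structural checks;
 * on v5 we also get the checksum.
 */
bool sketch_dinode_verify(const struct sketch_dinode *dip, bool is_v5)
{
	if (dip->magic != 0x494e)	/* "IN" */
		return false;
	if (is_v5 && !sketch_crc_ok(dip))
		return false;
	return true;
}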

>> This will tell us if there is a mechanism problem with adding the
>> new behaviour, leaving the "untouched metadata" risk as the only
>> unknown at that point.  There's a chance we'll never see this, so once we
>> know the mechanism is fine on v5 filesystems (say 6 months after
>> kernel release) we can enable it on v4 filesystems. Then if problems
>> crop up, we have a fair idea of whether it is a mechanism or bitrot
>> problem that is causing recovery failures....
> 
> Ok.  See, this was what I was looking for, in terms of 'what is someone
> uncomfortable about and what would they like us to do about it'.  Eric?

Well, tbh this all seems a bit circular and hand-wavy.

We're doing half of recovery and not the other half in the case where
we have an RO mount.  And Dave has voiced some rather vague worries
about fixing it to do /all/ of recovery on an ro mount.
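
To make the "half of recovery" point concrete, here's roughly the
mount-time flow we're arguing about, as a sketch with made-up names
(not the actual xfs_mountfs()/log recovery code):

#include <stdbool.h>

/* Illustration-only stand-ins; not the real XFS structures/functions. */
struct sketch_mount {
	int dummy;
};

static int sketch_replay_log(struct sketch_mount *mp)
{
	(void)mp;
	return 0;		/* pretend replay succeeded */
}

static int sketch_process_unlinked(struct sketch_mount *mp)
{
	(void)mp;
	return 0;		/* pretend the orphaned inodes were reaped */
}

/*
 * Today's behaviour: log replay runs even for a read-only mount, but
 * the unlinked (orphaned) inode processing is skipped, so those inodes
 * can keep their space tied up until something like xfs_repair finally
 * cleans them up.
 */
int sketch_log_mount_finish(struct sketch_mount *mp, bool readonly)
{
	int error;

	error = sketch_replay_log(mp);		/* first half: always done */
	if (error)
		return error;

	if (readonly)
		return 0;	/* second half skipped on ro mounts today */

	return sketch_process_unlinked(mp);	/* the missing "other half" */
}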

I've written a test explicitly to exercise this, so we do have a functional
regression test.  But we can't merge it because we're afraid it might
break something in the field, and we won't know if it will break anything
in the field until we merge it.

I mean, I guess we could enable for v5 and not v4, but I'm really not
understanding why there's so much fear around this particular change.
There seems to be an order of magnitude more worry than for most other
changes, and I don't know why that is.
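
If we did go the v5-only route, the gate itself would be trivially
small; a sketch with made-up names (not the real superblock feature
checks):

#include <stdbool.h>

/* Illustration-only mount state; not the real xfs_mount. */
struct gate_mount {
	bool is_v5;		/* v5 (CRC-enabled) superblock */
	bool readonly;		/* mounted read-only */
};

/*
 * The mitigation Dave describes: reap orphaned inodes on rw mounts as
 * always, but on ro mounts only do it for v5 filesystems, where the
 * CRCs give us solid error reporting if years-old metadata turns out
 * to be bitrotted.
 */
bool gate_can_reap_unlinked(const struct gate_mount *mp)
{
	if (!mp->readonly)
		return true;		/* rw behaviour unchanged */
	return mp->is_v5;		/* ro: v5 only, for now */
}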

Of course, as soon as I argue for earlier and universal inclusion, it'll
blow up in my face, because that's just how these things go.  ;)

>> This is why I keep saying "there should be no rush to commit and
>> push upstream". This stuff is complex, it takes time to test
>> sufficiently and history tells us that the worst thing we can do is
>> push stuff to users too quickly.  There is nothing wrong with saying
>> "I'm not sure of this yet, lets slip it to the next cycle" and
>> continue to test and evaluate it.

Well, if it's just a release timing issue, that's different.

If the proposal is to merge it early next cycle and (in theory) get more
widespread testing before that kernel is baked, that's fine by me.

Dave, if you have any specific worries or things that you want a testcase
for, I'm all ears.  If it's vague fears, I'm not sure how to remedy that.

One thing I could do is fabricate a test that processes the unlinked list
on a filesystem that has a /clean/ log.  Or, if we're worried that there's
state built up around log replay that matters, I could change things so that
this processing only happens if we have run into a dirty log on this mount,
and have done the log replay we'd normally see with unlinked inodes ...
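
In code terms that second idea is just another small gate; a sketch
with made-up names:

#include <stdbool.h>

/* Illustration-only mount state; not the real xfs_mount/log flags. */
struct replay_mount {
	bool readonly;		/* mounted read-only */
	bool log_was_dirty;	/* this mount found and replayed a dirty log */
};

/*
 * Narrower behaviour floated above: on an ro mount, only walk the
 * unlinked list if we actually had to replay a dirty log, i.e. the
 * same circumstances in which unlinked-inode processing would have
 * happened on an rw mount anyway.
 */
bool replay_should_process_unlinked(const struct replay_mount *mp)
{
	if (!mp->readonly)
		return true;		/* rw mounts: unchanged */
	return mp->log_was_dirty;	/* ro: only right after a real replay */
}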

-Eric

> In the meantime, what do people think of this for for-next tomorrow?
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/log/?h=xfs-4.12-merge
> 
> --D
> 
>> Cheers,
>>
>> Dave.
>> -- 
>> Dave Chinner
>> david@xxxxxxxxxxxxx
