On Tue, May 10, 2022 at 04:53:47PM -0700, Darrick J. Wong wrote: > On Mon, May 09, 2022 at 10:41:37AM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > We can't use the same algorithm for replacing an existing attribute > > when logging attributes. The existing algorithm is essentially: > > > > 1. create new attr w/ INCOMPLETE > > 2. atomically flip INCOMPLETE flags between old + new attribute > > 3. remove old attr which is marked w/ INCOMPLETE > > > > This algorithm guarantees that we see either the old or new > > attribute, and if we fail after the atomic flag flip, we don't have > > to recover the removal of the old attr because we never see > > INCOMPLETE attributes in lookups. > > > > For logged attributes, however, this does not work. The logged > > attribute intents do not track the work that has been done as the > > transaction rolls, and hence the only recovery mechanism we have is > > "run the replace operation from scratch". > > > > This is further exacerbated by the attempt to avoid needing the > > INCOMPLETE flag to create an atomic swap. This means we can create > > a second active attribute of the same name before we remove the > > original. If we fail at any point after the create but before the > > removal has completed, we end up with duplicate attributes in > > the attr btree and recovery only tries to replace one of them. > > > > There are several other failure modes where we can leave partially > > allocated remote attributes that expose stale data, partially free > > remote attributes that enable UAF based stale data exposure, etc. > > > > TO fix this, we need a different algorithm for replace operations > > when LARP is enabled. Luckily, it's not that complex if we take the > > right first step. That is, the first thing we log is the attri > > intent with the new name/value pair and mark the old attr as > > INCOMPLETE in the same transaction. > > > > From there, we then remove the old attr and keep relogging the > > new name/value in the intent, such that we always know that we have > > to create the new attr in recovery. Once the old attr is removed, > > we then run a normal ATTR_CREATE operation relogging the intent as > > we go. If the new attr is local, then it gets created in a single > > atomic transaction that also logs the final intent done. If the new > > attr is remote, the we set INCOMPLETE on the new attr while we > > allocate and set the remote value, and then we clear the INCOMPLETE > > flag at in the last transaction taht logs the final intent done. > > > > If we fail at any point in this algorithm, log recovery will always > > see the same state on disk: the new name/value in the intent, and > > either an INCOMPLETE attr or no attr in the attr btree. If we find > > an INCOMPLETE attr, we run the full replace starting with removing > > the INCOMPLETE attr. If we don't find it, then we simply create the > > new attr. > > ...assuming that the INCOMPLETE attr is preserved until we're completely > done unmapping the rmtval blocks, right? Yes, it is preserved - the INCOMPLETE flag is held in the name entry and that's not removed until after the rmtval blocks have been fully removed. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx