Hi Ted,

On Sun, Apr 15, 2018 at 9:13 AM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> On Sat, Apr 14, 2018 at 08:35:45PM -0500, Vijaychidambaram Velayudhan Pillai wrote:
>> I was one of the authors on that paper, and I didn't know until today you
>> didn't like that work :) The paper did *not* suggest we support invented
>> guarantees without considering the performance impact.
>
> I hadn't noticed that you were one of the authors on that paper,
> actually.
>
> The problem with that paper was I don't think the researchers had
> talked to anyone who had actually designed production file systems.
> For example, the hypothetical ext3-fast file system proposed in the
> paper has some real practical problems.  You can't just switch between
> having file contents journaled via the data=journal mode and file
> contents written via the normal page cache mechanisms.  If you don't
> take some very heavy-weight, performance-killing special measures,
> data corruption is a very real possibility.

I don't think this is what the paper's ext3-fast does. All the paper says
is that if you had a file system where fsync() of a file persisted only
the data related to that file, it would increase performance. ext3-fast is
the name given to such a file system. Note that we do not present a design
of ext3-fast or analyze it in any detail.

In fact, we explicitly say: "The ext3-fast file system (derived from
inferences provided by ALICE) seems interesting for application safety,
though further investigation is required into the validity of its design."

> I agree that documenting what behavior applications can depend upon is
> useful.  However, this needs to be done as a conversation --- and a
> negotiation --- between application and file system developers.  (And
> not necessarily just from one operating system, either!  Application
> authors might care about whether they can get robustness guarantees on
> other operating systems, such as Mac OS X.)  Also, the tradeoffs may
> in some cases involve probabilities of data loss, and not hard
> guarantees.
>
> Formal documentation also takes a lot of effort to write.  That's
> probably why no one has tried to formally codify it since POSIX.  We
> do have informal agreements, such as adding an implied data flush
> after certain close or rename operations.  And sometimes these are
> written up, but only informally.  A good example of this is the
> O_PONIES controversy, wherein the negotiations/conversation happened
> on various blog entries, and ultimately at an LSF/MM face-to-face
> meeting:
>
> http://blahg.josefsipek.net/?p=364
> https://sandeen.net/wordpress/uncategorized/coming-clean-on-o_ponies/
> https://lwn.net/Articles/322823/
> https://lwn.net/Articles/327601/
> https://lwn.net/Articles/351422/
>
> Note that the implied file writebacks after certain renames and closes
> (as documented at the end of https://lwn.net/Articles/322823/) were
> implemented for ext4, and then after discussion at LSF/MM, there was
> general agreement across multiple major file system maintainers that
> we should all provide similar behavior.
>
> So doing this kind of standardization, especially if you want to take
> into account all of the stakeholders, takes time and is not easy.
> If you only take one point of view, you can have what happened with
> the C standard, where the room was packed with compiler authors who
> were only interested in what kind of cool compiler optimizations they
> could do, and completely ignored whether the resulting standard would
> actually be useful to practicing system programmers.  Which is why the
> Linux kernel is only really supported on gcc, and then with certain
> optimizations allowed by the C standard explicitly turned off.  (Clang
> support is almost there, but not everyone trusts that a kernel built
> by Clang won't have some subtle, hard-to-debug problems...)

I definitely agree it takes time and effort. I'm hoping our work on
CrashMonkey can help here, by codifying the crash-consistency guarantees
into tests that new file-system developers can use.

> Academics could very well have a place in helping to facilitate the
> conversation.  I think my primary concern with the Pillai paper is
> that the authors apparently talked a whole bunch to application
> authors, but not nearly as much to file system developers.

I agree with this criticism. This is why my research group engages with
the file-system community right from the start of a project, as we have
been doing with CrashMonkey.

>> But in any case, coming back to our main question, the conclusion seems to
>> be: symlinks aren't standard, so we shouldn't be studying their
>> crash-consistency properties. This is useful to know. Thanks!
>
> Well, symlinks are standardized.  But what the standards say about
> them is extremely limited.  And the crash-consistency property you
> were looking at, namely what happens when fsync() is called on a file
> descriptor opened via a symlink, is definitely not consistent with
> either the POSIX/SUS standard, or historical practice by BSD and other
> Unix systems, as well as Linux.

Thanks! As I mentioned before, this is useful.

I have a follow-up question. Consider the following workload:

  creat foo
  link(foo, A/bar)
  fsync(foo)
  crash

In this case, after the file system recovers, do we expect foo's link
count to be 2 or 1? I would say 2, but POSIX is silent on this, so I
thought I would confirm. The tricky part here is that we are not calling
fsync() on directory A.

In this case, it's not a symlink; it's a hard link, so I would say the
link count for foo should be 2. But btrfs and F2FS show a link count of 1
after a crash. (A small C sketch of this workload is included below my
signature for concreteness.)

Thanks,
Vijay Chidambaram
http://www.cs.utexas.edu/~vijay/
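
P.S. For concreteness, here is a minimal C sketch of the workload above.
It is only an illustration, not what CrashMonkey literally runs: the
paths "foo" and "A/bar" are placeholders, directory A is assumed to
already exist, and error handling is abbreviated.

  /* Sketch of: creat foo; link(foo, A/bar); fsync(foo); crash.
   * Assumes directory A already exists; paths are placeholders. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("foo", O_CREAT | O_WRONLY, 0644);   /* creat foo */
      if (fd < 0) { perror("open"); exit(1); }

      if (link("foo", "A/bar") < 0) {                   /* link(foo, A/bar) */
          perror("link");
          exit(1);
      }

      if (fsync(fd) < 0) {                              /* fsync(foo) */
          perror("fsync");
          exit(1);
      }

      /* A crash is injected at this point (power cut, or a tool such as
       * CrashMonkey).  The question is what st_nlink for foo is after
       * the file system recovers. */
      close(fd);
      return 0;
  }

The question is simply whether stat("foo") reports st_nlink of 2 or 1
once the file system has recovered from the crash at that point.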