Re: Symlink not persisted even after fsync

Vijay Chidambaram <vijay@xxxxxxxxxxxxx> · Mon, 16 Apr 2018 10:09:53 -0500

On Mon, Apr 16, 2018 at 12:52 AM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> On Sun, Apr 15, 2018 at 07:10:52PM -0500, Vijay Chidambaram wrote:
>>
>> I don't think this is what the paper's ext3-fast does. All the paper
>> says is if you have a file system where the fsync of a file persisted
>> only data related to that file, it would increase performance.
>> ext3-fast is the name given to such a file system. Note that we do not
>> present a design of ext3-fast or analyze it in any detail. In fact, we
>> explicitly say "The ext3-fast file system (derived from inferences
>> provided by ALICE) seems interesting for application safety, though
>> further investigation is required into the validity of its design."
>
> Well, the paper says that it's based on ext3's data=journal "Abstract
> Persistence Model".  It's true that a design was not proposed, but if you
> don't propose a design, how do you know what the performance is or
> whether it's even practical?  That's one of those things I find
> extremely distasteful in the paper.  Sure, I can model a faster than
> light interstellar engine ala Star Trek's Warp Drive --- and I can
> talk about it having, say, better performance than a reaction drive.
> But it doesn't tell us anything useful about whether it can be built,
> or whether it's even useful to dream about it.
>
> To me, that part of the paper really read as, "watch as I wave my
> hands around wildly, that they never leave the ends of my arms!"

I partially understand where you are coming from, but your argument
seems to boil down to "don't say anything until you have worked out
every detail". I don't agree with this. Yes, it was speculative, but
we did have a fairly clear disclaimer.

To the point about it being obvious: you might be surprised at how
many people outside this community take it for granted that if you
fsync a file, only that file's contents and metadata will be persisted
:) So it was obvious to you, but truly shocking for many.

Btw, ext3-fast is what led to our CCFS work in FAST 17:
http://www.cs.utexas.edu/~vijay/papers/fast17-c2fs.pdf. In this paper,
we do show that if you divide your application writes into streams, it
is possible to persist only the data/metadata of one stream,
independent of the IO being done in other streams. So as it turned
out, it wasn't an impossible file-system design.

But we digress. I think we both agree that researchers should engage
more with the file-system community.

>
>> Thanks! As I mentioned before, this is useful. I have a follow-up
>> question. Consider the following workload:
>>
>>  creat foo
>>  link (foo, A/bar)
>>  fsync(foo)
>>  crash
>>
>> In this case, after the file system recovers, do we expect foo's link
>> count to be 2 or 1? I would say 2, but POSIX is silent on this, so I
>> thought I would confirm. The tricky part here is we are not calling
>> fsync() on directory A.
>>
>> In this case, it's not a symlink; it's a hard link, so I would say the
>> link count for foo should be 2. But btrfs and F2FS show link count of
>> 1 after a crash.
>
> Well, is the link count accurate?  That is to say, does A/bar exist?
> I would think that the requirement that the file system be self
> consistent is the most important consideration.

There are two ways to look at this.

1. A/bar does not exist, link count is 1, and so it is not a bug.

2. We are calling fsync on the inode when the inode's link count is 2.
So it should persist the inode plus the dependency that is A/bar. The
file system after a crash should show both A/bar and the file with
link count 2. This is what ext4, xfs, and F2FS do.
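
For concreteness, here is a rough Python sketch of the workload quoted
above. It is only an illustration: the helper name run_workload and the
scratch-directory layout are made up for this example, and directory A
is assumed to exist before the workload starts. The script cannot
simulate the crash itself; that has to be injected externally at the
marked point, and the link count then re-checked after the file system
recovers.

```python
import os
import tempfile


def run_workload(root):
    """Run creat/link/fsync under `root` and return foo's link count.

    Hypothetical helper for illustration only; it reports the in-memory
    state before any crash, not what survives recovery.
    """
    foo = os.path.join(root, "foo")
    dir_a = os.path.join(root, "A")
    bar = os.path.join(dir_a, "bar")

    os.mkdir(dir_a)  # assumption: A exists before the workload begins

    fd = os.open(foo, os.O_CREAT | os.O_WRONLY, 0o644)  # creat foo
    try:
        os.link(foo, bar)  # link(foo, A/bar)
        os.fsync(fd)       # fsync(foo); directory A is never fsynced
    finally:
        os.close(fd)

    # A crash would be injected here. The open question is whether,
    # after recovery, A/bar exists and foo's link count is 2.
    return os.stat(foo).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("link count before crash:", run_workload(root))
```

Under interpretation 1, a post-crash check would find a link count of 1
and no A/bar; under interpretation 2, it would find both A/bar and a
link count of 2.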

We've posted separately to figure out what semantics btrfs supports.