On Mon, Apr 16, 2018 at 12:52 AM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> On Sun, Apr 15, 2018 at 07:10:52PM -0500, Vijay Chidambaram wrote:
>>
>> I don't think this is what the paper's ext3-fast does. All the paper
>> says is if you have a file system where the fsync of a file persisted
>> only data related to that file, it would increase performance.
>> ext3-fast is the name given to such a file system. Note that we do not
>> present a design of ext3-fast or analyze it in any detail. In fact, we
>> explicitly say "The ext3-fast file system (derived from inferences
>> provided by ALICE) seems interesting for application safety, though
>> further investigation is required into the validity of its design."
>
> Well, says that it's based on ext3's data=journal "Abstract Persistent
> Model". It's true that a design was not proposed --- but if you
> don't propose a design, how do you know what the performance is or
> whether it's even practical? That's one of those things I find
> extremely distasteful in the paper. Sure, I can model a faster than
> light interstellar engine ala Star Trek's Warp Drive --- and I can
> talk about it having, say, better performance than a reaction drive.
> But it doesn't tell us anything useful about whether it can be built,
> or whether it's even useful to dream about it.
>
> To me, that part of the paper, really read as, "watch as I wave my
> hands around widely, that they never leave the ends of my arms!"

I partially understand where you are coming from, but your argument
seems to boil down to "don't say anything until you have worked out
every detail". I don't agree with this. Yes, it was speculative, but
we did have a fairly clear disclaimer.

To the point about it being obvious: you might be surprised at how
many people outside this community take it for granted that if you
fsync a file, only that file's contents and metadata will be persisted
:) So it was obvious to you, but truly shocking for many.

Btw, ext3-fast is what led to our CCFS work in FAST 17:
http://www.cs.utexas.edu/~vijay/papers/fast17-c2fs.pdf

In this paper, we do show that if you divide your application writes
into streams, it is possible to persist only the data/metadata of one
stream, independent of the IO being done in other streams. So as it
turned out, it wasn't an impossible file-system design.

But we digress. I think we both agree that researchers should engage
more with the file-system community.

>
>> Thanks! As I mentioned before, this is useful. I have a follow-up
>> question. Consider the following workload:
>>
>> creat foo
>> link (foo, A/bar)
>> fsync(foo)
>> crash
>>
>> In this case, after the file system recovers, do we expect foo's link
>> count to be 2 or 1? I would say 2, but POSIX is silent on this, so
>> thought I would confirm. The tricky part here is we are not calling
>> fsync() on directory A.
>>
>> In this case, its not a symlink; its a hard link, so I would say the
>> link count for foo should be 2. But btrfs and F2FS show link count of
>> 1 after a crash.
>
> Well, is the link count accurate? That is to say, does A/bar exist?
> I would think that the requirement that the file system be self
> consistent is the most important consideration.

There are two ways to look at this.

1. A/bar does not exist, link count is 1, and so it is not a bug.

2. We are calling fsync on the inode when the inode's link count is 2.
So it should persist the inode plus the dependency that is A/bar. The
file system after a crash should show both A/bar and the file with
link count 2. This is what ext4, xfs, and F2FS do.

We've posted separately to figure out what semantics btrfs supports.
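
For anyone who wants to try the workload discussed above, here is a minimal sketch of the pre-crash system call sequence in Python. It only issues creat, link, and fsync and reports the link count seen before any crash; it cannot simulate the crash itself, so observing the post-crash state still needs a crash-testing setup. The helper name `run_workload` and the assumption that directory A already exists before the workload starts are mine, not from the thread.

```python
import os
import tempfile


def run_workload(root):
    """Issue creat foo; link(foo, A/bar); fsync(foo) under root.

    Returns foo's link count as seen before any crash.
    """
    foo = os.path.join(root, "foo")
    dir_a = os.path.join(root, "A")
    bar = os.path.join(dir_a, "bar")

    # Assumption: directory A exists before the workload begins.
    os.mkdir(dir_a)

    # creat foo
    fd = os.open(foo, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # link(foo, A/bar): a hard link, not a symlink
        os.link(foo, bar)
        # fsync(foo): note that directory A is never fsynced
        os.fsync(fd)
        return os.fstat(fd).st_nlink
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("link count before crash:", run_workload(root))
```

The open question in the thread is what a file system shows after recovery if the crash happens right after the fsync returns: link count 2 with A/bar present, or link count 1 with A/bar absent.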