On Fri, 22 Sept 2023 at 23:36, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> Apparently, they are willing to handle the "year 2486" issue ;)

Well, we could certainly do the same at the VFS layer.

But I suspect 10ns resolution is entirely overkill, since on a lot of platforms you don't even have timers with that resolution.

I feel like 100ns is a much more reasonable resolution, and is quite close to a single system call (think "one thousand cycles at 10GHz").

> But the resolution change is counter to the purpose of multigrain
> timestamps - if two syscalls updated the same or two different inodes
> within a 100ns tick, apparently, there are some workloads that
> care to know about it and fs needs to store this information persistently.

Those workloads are broken garbage, and we should *not* use that kind of sh*t to decide on VFS internals.

Honestly, if the main reason for the multigrain resolution is something like that, I think we should forget about MG *entirely*. Somebody needs to be told to get their act together.

We have *never* guaranteed nanosecond resolution on timestamps, and I think we should put our foot down and say that we never will.

Partly because we have platforms where that kind of timer resolution just does not exist.

Partly because it's stupid to expect that kind of resolution anyway.

And partly because any load that assumes that kind of resolution is already broken.

End result: we should ABSOLUTELY NOT have as a target to support some insane resolution.

100ns resolution for file access times is - and I'll happily go down in history for saying this - enough for anybody.

If you need finer resolution than that, you'd better do it yourself in user space.

And no, this is not a "but some day we'll have terahertz CPUs and 100ns is an eternity". Moore's law is dead, we're not going to see terahertz CPUs, and people who say "but quantum" have bought into a technological fairytale.

100ns is plenty, and has the advantage of having a very safe range.
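To put rough numbers on "a very safe range", here is a small C sketch. It assumes a signed 64-bit tick count based at 1970 and an average Gregorian year of 365.2425 days; the helper range_years() is hypothetical and only does the arithmetic, reporting how many whole years such a counter reaches on either side of the epoch for a given tick length.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds in an average Gregorian year: 365.2425 days * 86400 s. */
#define SECS_PER_YEAR 31556952LL

/*
 * Hypothetical helper: whole years that a signed 64-bit counter of
 * tick_ns-nanosecond ticks covers on each side of the epoch.
 * Assumes tick_ns divides one second evenly.
 */
static int64_t range_years(int64_t tick_ns)
{
	assert(tick_ns > 0 && 1000000000LL % tick_ns == 0);

	int64_t ticks_per_sec = 1000000000LL / tick_ns;

	return INT64_MAX / ticks_per_sec / SECS_PER_YEAR;
}
```

With these assumptions, range_years(1) is 292 (plain nanoseconds run out around the year 2262), range_years(10) is 2922, and range_years(100) is 29227.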
That said, we don't have to do powers-of-ten. In fact, in many ways, it would probably be a good idea to think of the fractional seconds in powers of two. That tends to make it cheaper to do conversions, without having to do a full 64-bit divide (a constant divide turns into a fancy multiply, but it's still painful on 32-bit architectures).

So, for example, we could easily make the format be a fixed-point format with "sign bit, 38 bit seconds, 25 bit fractional seconds", which gives us about 30ns resolution, and a range of almost 9000 years in either direction.

Which is nice, in how it covers all of written history and all four-digit years (we'd keep the 1970 base).

And 30ns resolution really *is* pretty much the limit of a single system call. I could *wish* we had system calls that fast, or CPUs that fast. Not the case right now, and sadly doesn't seem to be the case in the foreseeable future - if ever - either. It would be a really good problem to have.

And the nice thing about that would be that conversion to timespec64 would be fairly straightforward:

	struct timespec64 to_timespec(fstime_t fstime)
	{
		struct timespec64 res;
		unsigned int frac;

		frac = fstime & 0x1ffffffu;
		res.tv_sec = fstime >> 25;
		res.tv_nsec = frac * 1000000000ull >> 25;
		return res;
	}

	fstime_t to_fstime(struct timespec64 a)
	{
		fstime_t sec = (fstime_t) a.tv_sec << 25;
		unsigned frac = a.tv_nsec;

		frac = ((unsigned long long) a.tv_nsec << 25) / 1000000000ull;
		return sec | frac;
	}

and both of those generate good code (that large divide by a constant in to_fstime() is not great, but the compiler can turn it into a multiply).

The above could be improved upon (nicer rounding and overflow handling, and a few modifications to generate even nicer code), but it's not horrendous as-is. On x86-64, to_timespec becomes a very reasonable

	movq	%rdi, %rax
	andl	$33554431, %edi
	imulq	$1000000000, %rdi, %rdx
	sarq	$25, %rax
	shrq	$25, %rdx

and to some degree that's the critical function (that code would show up in 'stat()').
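For anyone who wants to poke at the 38.25 layout, here is a minimal userspace sketch of the same two conversions. Several things are assumptions, not part of the proposal: fstime_t is taken to be a plain int64_t, struct timespec64 is a local stand-in for the kernel's, the FRAC_BITS / FSTIME_SEC_MIN / FSTIME_SEC_MAX names are hypothetical, to_fstime() rounds to the nearest 1/2^25 s tick (one tick is 10^9 / 2^25, about 29.8ns) and lets a rounding carry spill into the seconds, and right shifts of negative values are arithmetic (true on GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: sign bit, 38-bit seconds, 25-bit binary fraction. */
typedef int64_t fstime_t;

/* Local stand-in for the kernel's struct timespec64. */
struct timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

#define FRAC_BITS	25
#define FRAC_MASK	((1u << FRAC_BITS) - 1)		/* 0x1ffffff */
#define NSEC_PER_SEC	1000000000ull
#define FSTIME_SEC_MIN	(-(1LL << 38))
#define FSTIME_SEC_MAX	((1LL << 38) - 1)

static struct timespec64 to_timespec(fstime_t fstime)
{
	struct timespec64 res;
	uint64_t frac = (uint64_t)fstime & FRAC_MASK;

	/* Arithmetic shift: pre-1970 times get a negative tv_sec. */
	res.tv_sec = fstime >> FRAC_BITS;
	/* frac < 2^25, so frac * 1e9 < 2^55: no overflow. Truncates. */
	res.tv_nsec = (long)((frac * NSEC_PER_SEC) >> FRAC_BITS);
	return res;
}

static fstime_t to_fstime(struct timespec64 a)
{
	assert(a.tv_nsec >= 0 && (uint64_t)a.tv_nsec < NSEC_PER_SEC);
	assert(a.tv_sec >= FSTIME_SEC_MIN && a.tv_sec <= FSTIME_SEC_MAX);

	/* Round to the nearest tick; the result may be exactly 2^25. */
	uint64_t frac = (((uint64_t)a.tv_nsec << FRAC_BITS) + NSEC_PER_SEC / 2)
			/ NSEC_PER_SEC;

	/* A rounding carry at the very top of the range would overflow. */
	assert(a.tv_sec < FSTIME_SEC_MAX || frac < (1ull << FRAC_BITS));

	/*
	 * Multiply instead of shifting (shifting a negative value left is
	 * undefined in ISO C) and add instead of OR so a carry reaches the
	 * seconds.
	 */
	return a.tv_sec * ((fstime_t)1 << FRAC_BITS) + (fstime_t)frac;
}
```

Under these assumptions a timespec64 survives the trip through fstime_t to within 15ns, and the reverse trip (fstime_t to timespec64 and back) is exact, because truncating to whole nanoseconds loses well under half a tick. The 2^38 seconds on each side of 1970 come to about 8,710 years.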
Of course, I might have screwed up the above conversion functions, they are untested garbage, but they look close enough to being in the right ballpark.

Anyway, we really need to push back at any crazies who say "I want nanosecond resolution, because I'm special and my mother said so".

               Linus