On Wed, Aug 19, 2020 at 09:35:35AM +1000, Dave Chinner wrote: > On Mon, Aug 17, 2020 at 03:57:39PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > Redesign the ondisk timestamps to be a simple unsigned 64-bit counter of > > nanoseconds since 14 Dec 1901 (i.e. the minimum time in the 32-bit unix > > time epoch). This enables us to handle dates up to 2486, which solves > > the y2038 problem. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --- > ..... > > +/* Convert an ondisk timestamp into the 64-bit safe incore format. */ > > void > > xfs_inode_from_disk_timestamp( > > + struct xfs_dinode *dip, > > struct timespec64 *tv, > > const union xfs_timestamp *ts) > > { > > + if (dip->di_version >= 3 && > > + (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_BIGTIME))) { > > + uint64_t t = be64_to_cpu(ts->t_bigtime); > > + uint64_t s; > > + uint32_t n; > > + > > + s = div_u64_rem(t, NSEC_PER_SEC, &n); > > + tv->tv_sec = s - XFS_INO_BIGTIME_EPOCH; > > + tv->tv_nsec = n; > > + return; > > + } > > + > > tv->tv_sec = (int)be32_to_cpu(ts->t_sec); > > tv->tv_nsec = (int)be32_to_cpu(ts->t_nsec); > > } > > Can't say I'm sold on this union. It seems cleaner to me to just > make the timestamp an opaque 64 bit field on disk and convert it to > the in-memory representation directly in the to/from disk > operations. e.g.: > > void > xfs_inode_from_disk_timestamp( > struct xfs_dinode *dip, > struct timespec64 *tv, > __be64 ts) > { > > uint64_t t = be64_to_cpu(ts); > uint64_t s; > uint32_t n; > > if (xfs_dinode_is_bigtime(dip)) { > s = div_u64_rem(t, NSEC_PER_SEC, &n) - XFS_INO_BIGTIME_EPOCH; > } else { > s = (int)(t >> 32); > n = (int)(t & 0xffffffff); > } > tv->tv_sec = s; > tv->tv_nsec = n; > } I don't like this open-coded union approach at all because now I have to keep the t_sec and t_nsec bits separate in my head instead of letting the C compiler take care of that detail. The sample code above doesn't handle that correctly either: Start with an old kernel on a little endian system; each uppercase letter represents a byte (A is the LSB of t_sec, D is the MSB of t_sec, E is the LSB of t_nsec, and H is the MSB of t_nsec): sec nsec (incore) ABCD EFGH That gets written out as: sec nsec (ondisk) DCBA HGFE Now reboot with a new kernel that only knows 64bit timestamps on disk: 64bit (ondisk) DCBAHGFE Now it does the first be64_to_cpu conversion: 64bit (incore) EFGHABCD And then masks and shifts: sec nsec (incore) EFGH ABCD Oops, we just switched the values! The correct approach (I think) is to perform the shifting and masking on the raw __be64 value before converting them to incore format via be32_to_cpu, but now I have to work out all four cases by hand instead of letting the compiler do the legwork for me. I don't remember if it's correct to go around shifting and masking __be64 values. I guess the good news is that at least we have generic/402 to catch these kinds of persistence problems, but ugh. Anyway, what are you afraid of? The C compiler smoking crack and not actually overlapping the two union elements? We could control for that... > > @@ -220,9 +234,9 @@ xfs_inode_from_disk( > > * a time before epoch is converted to a time long after epoch > > * on 64 bit systems. > > */ > > - xfs_inode_from_disk_timestamp(&inode->i_atime, &from->di_atime); > > - xfs_inode_from_disk_timestamp(&inode->i_mtime, &from->di_mtime); > > - xfs_inode_from_disk_timestamp(&inode->i_ctime, &from->di_ctime); > > + xfs_inode_from_disk_timestamp(from, &inode->i_atime, &from->di_atime); > > + xfs_inode_from_disk_timestamp(from, &inode->i_mtime, &from->di_mtime); > > + xfs_inode_from_disk_timestamp(from, &inode->i_ctime, &from->di_ctime); > > > > to->di_size = be64_to_cpu(from->di_size); > > to->di_nblocks = be64_to_cpu(from->di_nblocks); > > @@ -235,9 +249,17 @@ xfs_inode_from_disk( > > if (xfs_sb_version_has_v3inode(&ip->i_mount->m_sb)) { > > inode_set_iversion_queried(inode, > > be64_to_cpu(from->di_changecount)); > > - xfs_inode_from_disk_timestamp(&to->di_crtime, &from->di_crtime); > > + xfs_inode_from_disk_timestamp(from, &to->di_crtime, > > + &from->di_crtime); > > to->di_flags2 = be64_to_cpu(from->di_flags2); > > to->di_cowextsize = be32_to_cpu(from->di_cowextsize); > > + /* > > + * Set the bigtime flag incore so that we automatically convert > > + * this inode's ondisk timestamps to bigtime format the next > > + * time we write the inode core to disk. > > + */ > > + if (xfs_sb_version_hasbigtime(&ip->i_mount->m_sb)) > > + to->di_flags2 |= XFS_DIFLAG2_BIGTIME; > > } > > We do not want on-disk flags to be changed outside transactions like > this. Indeed, this has implications for O_DSYNC operation, in that > we do not trigger inode sync operations if the inode is only > timestamp dirty. If we've changed this flag, then the inode is more > than "timestamp dirty" and O_DSYNC will need to flush the entire > inode.... :/ I forgot about XFS_ILOG_TIMESTAMP. > IOWs, I think we should only change this flag in a timestamp > transaction where the timestamps are actually being logged and hence > we can set inode dirty state appropriately so that everything will > get logged, changed and written back correctly.... Yeah, that's fair. I'll change xfs_trans_log_inode to set the bigtime flag if we're logging either the timestamps or the core. --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx