On Sat, Aug 3, 2019 at 6:03 PM Theodore Y. Ts'o <tytso@xxxxxxx> wrote: > > On Sat, Aug 03, 2019 at 11:30:22AM +0200, Arnd Bergmann wrote: > > > > I see in the ext4 code that we always try to expand i_extra_size > > to s_want_extra_isize in ext4_mark_inode_dirty(), and that > > s_want_extra_isize is always at least s_min_extra_isize, so > > we constantly try to expand the inode to fit. > > Yes, we *try*. But we may not succeed. There may actually be a > problem here if the cause is due to there simply is no space in the > external xattr block, so we might try and try every time we try to > modify that inode, and it would be a performance mess. If it's due to > there being no room in the current transaction, then it's highly > likely it will succeed the next time. > > > Did older versions of ext4 or ext3 ignore s_min_extra_isize > > when creating inodes despite > > EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE, > > or is there another possibility I'm missing? > > s_min_extra_isize could get changed in order to make room for some new > file system feature --- such as extended timestamps. Ok, that explains it. I assumed s_min_extra_isize was meant to not be modifiable, and did not find a way to change it using the kernel or tune2fs, but now I can see that debugfs can set it. > If you want to pretend that file systems never get upgraded, then life > is much simpler. The general approach is that for less-sophisticated > customers (e.g., most people running enterprise distros) file system > upgrades are not a thing. But for sophisticated users, we do try to > make thing work for people who are aware of the risks / caveats / > rough edges. Google won't have been able to upgrade thousands and > thousands of servers in data centers all over the world if we limited > ourselves to Red Hat's support restrictions. Backup / reformat / > restore really isn't a practical rollout strategy for many exabytes of > file systems. > > It sounds like your safety checks / warnings are mostly targeted at > low-information customers, no? Yes, that seems like a reasonable compromise: just warn based on s_min_extra_isize, and assume that anyone who used debugfs to set s_min_extra_isize to a higher value from an ext3 file system during the migration to ext4 was aware of the risks already. That leaves the question of what we should set the s_time_gran and s_time_max to on a superblock with s_min_extra_isize<16 and s_want_extra_isize>=16. If we base it on s_min_extra_isize, we never try to set a timestamp later than 2038 and so will never fail, but anyone with a grandfathered s_min_extra_isize from ext3 won't be able to set extended timestamps on any files any more. Based on s_want_extra_isize we would keep the current behavior, but could add a custom warning in the ext4 code about the small s_min_extra_isize indicating a theoretical problem. Arnd