On Thu, 2023-07-20 at 15:37 +0000, Chuck Lever III wrote:
>
> > On Jul 20, 2023, at 11:33 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Thu, 2023-07-20 at 15:15 +0000, Chuck Lever III wrote:
> > >
> > > > On Jul 20, 2023, at 10:59 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > >
> > > > At one time, nfsd would scrape inode information directly out of struct
> > > > inode in order to populate the change_info4. At that time, the BUG_ON in
> > > > set_change_info made some sense, since having it unset meant a coding
> > > > error.
> > > >
> > > > More recently, it calls vfs_getattr to get this information, which can
> > > > fail. If that fails, fh_pre_saved can end up not being set. While this
> > > > situation is unfortunate, we don't need to crash the box.
> > >
> > > I'm always happy to get rid of a BUG_ON(). But I'm not sure even
> > > a warning is necessary in this case. It's not likely that it's
> > > a software bug or something that the server administrator can
> > > do something about.
> > >
> > > Can you elaborate on why the vfs_getattr() might fail? Eg, how
> > > was it failing in 2223560 ?
> > >
> >
> > I'm fine with dropping the WARN_ON. You are correct that there is
> > probably little the admin can do about it.
> >
> > vfs_getattr can fail for all sorts of reasons. It really depends on the
> > underlying filesystem. In 2223560, I don't know for sure, but just prior
> > to the oops, there were these messages in the log:
> >
> > [51935.482019] XFS (vda3): Filesystem has been shut down due to log error (0x2).
> > [51935.482020] XFS (vda3): Please unmount the filesystem and rectify the problem(s).
> > [51935.482550] vda3: writeback error on inode 25320400, offset 2097152, sector 58684120
> >
> > My assumption was that the fs being shut down caused some VFS operations
> > to start returning errors (including getattr) and that is why
> > fh_pre_saved ultimately didn't get set.
>
> I'm wondering if the operation should just fail in this case
> rather than return a cobbled-up changeinfo4. Maybe for another
> day.
>

Actually, this doesn't look too hard to do. We should be able to just
unwind and return an error in all cases if collecting pre_op_attrs
fails.

The trickier bit is what to do if collecting post_op_attrs fails after
collecting pre-op attrs and the operation itself succeeded. What should
go into the after_change value? 0? Should we just copy the before_change
value?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
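
To make the two failure cases concrete, here is a rough userspace sketch
of the control flow. It is not nfsd code: struct change_info only mirrors
the fields of change_info4, and do_op_with_change_info(), getattr_fn,
op_fn, fake_getattr() and fake_op() are all hypothetical names made up
for illustration. For the post-op failure it picks just one of the
options raised above (copy the before value and clear the atomic flag);
whether that is the right answer is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the fields of the NFSv4 change_info4 type. */
struct change_info {
    bool     atomic;
    uint64_t before;
    uint64_t after;
};

/* Hypothetical attribute fetch: 0 on success, negative errno-style code on failure. */
typedef int (*getattr_fn)(uint64_t *change);
/* Hypothetical operation being wrapped (for example a create or remove). */
typedef int (*op_fn)(void);

int do_op_with_change_info(getattr_fn getattr, op_fn op, struct change_info *ci)
{
    int err;

    assert(getattr && op && ci);

    /* Pre-op attrs: nothing has been modified yet, so just return the error. */
    err = getattr(&ci->before);
    if (err)
        return err;

    err = op();
    if (err)
        return err;

    /* Post-op attrs: the operation already succeeded, so it cannot be unwound. */
    if (getattr(&ci->after)) {
        ci->after = ci->before;  /* assumption: one option, copy before_change */
        ci->atomic = false;      /* before/after were not captured as a valid pair */
        return 0;
    }

    ci->atomic = true;
    return 0;
}

/* Test doubles: fake_getattr fails on call number fake_fail_on (1-based, 0 = never). */
static int fake_calls, fake_fail_on;

static int fake_getattr(uint64_t *change)
{
    if (++fake_calls == fake_fail_on)
        return -5;  /* stand-in for -EIO */
    *change = 100 + (uint64_t)fake_calls;
    return 0;
}

static int fake_op(void)
{
    return 0;
}
```

With fake_fail_on set to 1 the call returns the error without running the
operation; with it set to 2 the operation succeeds and the caller gets
after == before with atomic cleared.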