On Tue, Apr 05, 2022 at 03:24:51PM +0530, Ritesh Harjani wrote:
> On 22/04/02 11:40AM, anserper@xxxxx wrote:
> > From: Andrew Perepechko <andrew.perepechko@xxxxxxx>
> >
> > When changing a large xattr value to a different large xattr value,
> > the old xattr inode is freed. Truncate during the final iput causes
> > current transaction restart. Eventually, parent inode bh is marked
> > dirty and kernel panic happens when jbd2 figures out that this bh
> > belongs to the committed transaction.
> >
> > A possible fix is to call this final iput in a separate thread.
> > This way, setxattr transactions will never be split into two.
> > Since the setxattr code adds xattr inodes with nlink=0 into the
> > orphan list, old xattr inodes will be properly cleaned up in
> > any case.
>
> Ok, I think there is a lot happening in above description. I think part of the
> problem I am unable to understand it easily is because I haven't spend much time
> with xattr code. But I think below 2 requests will be good to have -
>
> 1. Do we have the call stack for this problem handy. I think it will be good to
> mention it in the commit message itself. It is sometimes easy to look at the
> call stack if someone else encounters a similar problem. That also gives more
> idea about where the problem is occuring.
>
> 2. Do we have a easy reproducer for this problem? I think it will be a good
> addition to fstests given that this adds another context in calling iput on
> old_ea_inode.

Andrew, would it be possible for you to supply a call stack and a
reproducer?

It sounds like what's going on is this: if the file system has the
ea_inode feature enabled and a large xattr value is stored in its own
inode, then truncating that inode can end up spread across two
transactions.
But the problem is that when iput(ea_inode) is called from
ext4_xattr_set_entry(), a handle has already been passed into that
function, since the xattr operation is part of its own transaction, so
the truncate operation runs as part of a "nested handle". That's OK, so
long as the initial handle is started with sufficient credits for the
nested start_handle.

But when that handle is closed and then re-opened, there are two
problems. The first is that the xattr operation is no longer atomic (it
is spread across two transactions). The second is that if write access
to the inode table's bh was requested before the implied truncate from
iput(ea_inode), then when we call handle_dirty_metadata() on that bh, we
get a jbd2 assertion. (Which is good, because it catches and reports
the first problem.)

So moving the iput to a separate thread avoids this problem, since the
truncate can take place in its own handle.

The other solution would be to somehow pass the inode all the way up
through the call chain, and only call iput(ea_inode) after the handle is
stopped. But that would require quite a lot of code surgery, since
ext4_xattr_set_entry() is called in a number of places, and the iput()
would have to be plumbed up through two callers to where the handle is
actually stopped.

					- Ted
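
For readers following the thread, the failure mode and the deferred-iput
idea described above can be illustrated with a small userspace toy
model. This is only a sketch under stated assumptions: it is not kernel
code, it does not use the real jbd2 or ext4 APIs, and every name in it
(toy_journal, toy_handle, toy_truncate, toy_setxattr, TOY_CREDITS, and
so on) is hypothetical. It models only one thing: a truncate that runs
out of credits restarts the handle in a new transaction, so running it
inline splits the xattr update, while deferring it does not.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_CREDITS 4

struct toy_journal {
	int current_tid;		/* id of the running transaction */
};

struct toy_handle {
	struct toy_journal *journal;
	int tid;			/* transaction this handle belongs to */
	int credits;			/* credits left before a restart */
};

struct toy_inode {
	int blocks;			/* blocks a truncate has to free */
	struct toy_inode *next_deferred;
};

struct toy_deferred {
	struct toy_inode *head;		/* inodes waiting for their final iput */
};

static void toy_handle_start(struct toy_handle *h, struct toy_journal *j,
			     int credits)
{
	assert(credits > 0);
	h->journal = j;
	h->tid = j->current_tid;
	h->credits = credits;
}

/* Free one block per credit; when credits run out, restart the handle
 * in a new transaction (this is the "transaction restart"). */
static void toy_truncate(struct toy_handle *h, struct toy_inode *inode)
{
	while (inode->blocks > 0) {
		if (h->credits == 0) {
			h->journal->current_tid++;
			h->tid = h->journal->current_tid;
			h->credits = TOY_CREDITS;
		}
		h->credits--;
		inode->blocks--;
	}
}

static void toy_defer_iput(struct toy_deferred *d, struct toy_inode *inode)
{
	inode->next_deferred = d->head;
	d->head = inode;
}

/* Stand-in for the separate thread: each truncate gets its own handle.
 * Returns the number of inodes processed. */
static int toy_run_deferred(struct toy_deferred *d, struct toy_journal *j)
{
	int n = 0;

	while (d->head) {
		struct toy_inode *inode = d->head;
		struct toy_handle h;

		d->head = inode->next_deferred;
		toy_handle_start(&h, j, TOY_CREDITS);
		toy_truncate(&h, inode);
		n++;
	}
	return n;
}

/* Replace an xattr whose old value lives in old_ea. If d is NULL the
 * final iput (and its truncate) runs inline under the xattr handle;
 * otherwise it is deferred. Returns 1 if the handle is still in the
 * transaction it started in, which is when dirtying the parent's
 * buffer would be valid, and 0 if the update was split. */
static int toy_setxattr(struct toy_journal *j, struct toy_inode *old_ea,
			struct toy_deferred *d)
{
	struct toy_handle h;
	int start_tid;

	toy_handle_start(&h, j, TOY_CREDITS);
	start_tid = h.tid;
	h.credits--;			/* write access to the parent's bh */

	if (d)
		toy_defer_iput(d, old_ea);
	else
		toy_truncate(&h, old_ea);

	return h.tid == start_tid;
}
```

Calling toy_setxattr() with a NULL deferred list on an inode that needs
more blocks freed than the handle has credits returns 0, which
corresponds to the case where the real code would trip the jbd2
assertion. Passing a toy_deferred list and calling toy_run_deferred()
afterwards keeps the xattr update in one transaction and still frees
the old inode's blocks.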