Re: [RFC PATCH 19/24] erofs: introduce namei alternative to C

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Tue, 17 Sep 2024 15:14:58 +0800

On 2024/9/17 14:48, Yiyang Wu wrote:
On Mon, Sep 16, 2024 at 06:08:01PM GMT, Al Viro wrote:
On Mon, Sep 16, 2024 at 09:56:29PM +0800, Yiyang Wu wrote:
+/// Lookup function for dentry-inode lookup replacement.
+#[no_mangle]
+pub unsafe extern "C" fn erofs_lookup_rust(
+    k_inode: NonNull<inode>,
+    dentry: NonNull<dentry>,
+    _flags: c_uint,
+) -> *mut c_void {
+    // SAFETY: We are sure that the inode is a Kernel Inode since alloc_inode is called
+    let erofs_inode = unsafe { &*container_of!(k_inode.as_ptr(), KernelInode, k_inode) };

	Ummm...  A wrapper would be highly useful.  And the reason why
it's safe is different - your function is called only via ->i_op->lookup,
the is only one instance of inode_operations that has that ->lookup
method, and the only place where an inode gets ->i_op set to that
is erofs_fill_inode().  Which is always passed erofs_inode::vfs_inode.

So my original intention behind this is that all vfs_inodes come from
that erofs_iget function and it's always gets initialized in this case
And this just followes the same convention here. I can document this
more precisely.

I think Al just would like a wrapper here, like the current C EROFS_I().

+    // SAFETY: The super_block is initialized when the erofs_alloc_sbi_rust is called.
+    let sbi = erofs_sbi(unsafe { NonNull::new(k_inode.as_ref().i_sb).unwrap() });

	Again, that calls for a wrapper - this time not erofs-specific;
inode->i_sb is *always* non-NULL, is assign-once and always points
to live struct super_block instance at least until the call of
destroy_inode().

Will be modified correctly, I'm not a native speaker and I just can't
find a better way, I will take my note here.

Same here, like the current EROFS_I_SB().

+    // SAFETY: this is backed by qstr which is c representation of a valid slice.

	What is that sentence supposed to mean?  Nevermind "why is it correct"...

+    let name = unsafe {
+        core::str::from_utf8_unchecked(core::slice::from_raw_parts(
+            dentry.as_ref().d_name.name,
+            dentry.as_ref().d_name.__bindgen_anon_1.__bindgen_anon_1.len as usize,

...

	Current erofs_lookup() (and your version as well) *is* indeed
safe in that respect, but the proof (from filesystem POV) is that "it's
called only as ->lookup() instance, so dentry is initially unhashed
negative and will remain such until it's passed to d_splice_alias();
until that point it is guaranteed to have ->d_name and ->d_parent stable".

Agreed.

	Note that once you _have_ called d_splice_alias(), you can't
count upon the ->d_name stability - or, indeed, upon ->d_name.name you've
sampled still pointing to allocated memory.

	For directory-modifying methods it's "stable, since parent is held
exclusive".  Some internal function called from different environments?
Well...  Swear, look through the call graph and see what can be proven
for each.

Sorry for my ignorance.
I mean i just borrowed the code from the fs/erofs/namei.c and i directly
translated that into Rust code. That might be a problem that also
exists in original working C code.

As for EROFS (an immutable fs), I think after d_splice_alias(), d_name is
still stable (since we don't have rename semantics likewise for now).

But as the generic filesystem POV, d_name access is actually tricky under
RCU walk path indeed.

	Expressing that kind of fun in any kind of annotations (Rust type
system included) is not pleasant.  _Probably_ might be handled by a type
that would be a dentry pointer with annotation along the lines "->d_name
and ->d_parent of that one are stable".  Then e.g. ->lookup() would
take that thing as an argument and d_splice_alias() would consume it.
->mkdir() would get the same thing, etc.  I hadn't tried to get that
all way through (the amount of annotation churn in existing filesystems
would be high and hard to split into reviewable patch series), so there
might be dragons - and there definitely are places where the stability is
proven in different ways (e.g. if dentry->d_lock is held, we have the damn
thing stable; then there's a "take a safe snapshot of name" API; etc.).

That's kinda interesting, I originally thought that VFS will make sure
its d_name / d_parent is stable in the first place.
Again, I just don't have a full picture or understanding of VFS and my
code is just basic translation of original C code, Maybe we can address
this later.

d_alloc will allocate an unhashed dentry which is almost unrecognized
by VFS dcache (d_name is stable of course).

After d_splice_alias() and d_add(), rename() could change d_name.  So
either we take d_lock or with rcu_read_lock() to take a snapshot of
d_name in the RCU walk path.  That is my overall understanding.

But for EROFS, since we don't have rename, so it doesn't matter.

Thanks,
Gao Xiang