On 22/03/2023 23:41, Gregory Farnum wrote:
On Wed, Mar 22, 2023 at 8:27 AM Frank Schilder <frans@xxxxxx> wrote:
Hi Gregory,
thanks for your reply. First a quick update. Here is how I get ln to work
after it failed; there seems to be no timeout:
$ ln envs/satwindspy/include/ffi.h mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
ln: failed to create hard link 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h': Read-only file system
$ ls -l envs/satwindspy/include mambaforge/pkgs/libffi-3.3-h58526e2_2
envs/satwindspy/include:
total 7664
-rw-rw-r--. 1 rit rit 959 Mar 5 2021 ares_build.h
[...]
$ ln envs/satwindspy/include/ffi.h mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
After an ls -l on both directories, ln works.
To the question of how I can pull out a log from the NFS server: there is
nothing in /var/log/messages.
So you’re using the kernel server and re-exporting, right?
I’m not very familiar with its implementation; I wonder if it’s doing
something strange via the kernel vfs.
AFAIK this isn’t really supportable for general use because nfs won’t
respect the CephFS file consistency protocol. But maybe it’s trying a bit
and that’s causing trouble?
Yeah, I think you are right, Greg.
I checked the logs uploaded by Frank and found that the kclient just sent
one request like this:
++++++
2023-03-27T23:24:37.866+0200 7f0c1a0d1700 7 mds.0.server
dispatch_client_request client_request(client.186555:475421 link
#0x10000682337/liblz4.so.1.9.3 #0x1000066d6d8//
2023-03-27T23:24:37.864907+0200 caller_uid=1000,
caller_gid=1000{4,24,27,30,46,122,134,135,1000,}) v4
2023-03-27T23:24:37.866+0200 7f0c1a0d1700 7 mds.0.server
handle_client_link #0x10000682337/liblz4.so.1.9.3 to #0x1000066d6d8//
2023-03-27T23:24:37.866+0200 7f0c1a0d1700 10 mds.0.server
rdlock_two_paths_xlock_destdn request(client.186555:475421 nref=2
cr=0x5601bbc60500) #0x10000682337/liblz4.so.1.9.3 #0x1000066d6d8//
2023-03-27T23:24:37.866+0200 7f0c1a0d1700 7 mds.0.server
reply_client_request -30 ((30) Read-only file system)
client_request(client.186555:475421 link #0x10000682337/liblz4.so.1.9.3
#0x1000066d6d8// 2023-03-27T23:24:37.864907+0200 caller_uid=1000,
caller_gid=1000{4,24,27,30,46,122,134,135,1000,}) v4
------
The kclient set the src dentry to "#0x1000066d6d8//". The MDS parses the
trailing "//" as a snapdir, which is read-only; this is why the MDS
returns -EROFS.
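Just to make that explicit, here is a trivial userspace illustration of how
that dentry string decomposes (this only mimics the "#<ino>/<relative path>"
notation from the MDS log, not the real MDS parser):

#include <stdio.h>
#include <string.h>

int main(void)
{
        /* src dentry string exactly as it appears in the MDS log above */
        const char *dn = "#0x1000066d6d8//";
        const char *rel = strchr(dn + 1, '/');  /* first '/' after the "#<ino>" prefix */

        printf("base ino     : %.*s\n", (int)(rel - (dn + 1)), dn + 1);
        printf("relative path: \"%s\"\n", rel + 1);
        return 0;
}

The relative path left over is just "/", i.e. the final component is empty,
and that empty component is what the MDS ends up resolving as the read-only
snapdir.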
But from the MDS logs we can see that "0x1000066d6d8" is
"/data/nfs/envs/satwindspy/lib/liblz4.so.1.9.3":
++++++
2023-03-27T23:24:37.866+0200 7f0c1a0d1700 7 mds.0.locker issue_caps
allowed=pAsLsXsFscrl, xlocker allowed=pAsLsXsFscrl on [inode
0x1000066d6d8 [...7b,head] /data/nfs/envs/satwindspy/lib/liblz4.so.1.9.3
auth v7035 snaprealm=0x55fe3785e500 s=215880 nl=2 n(v0
rc2023-03-27T23:15:22.568391+0200 b215880 1=1+0) (iversion lock)
caps={186555=pAsXsFscr/-@3} | ptrwaiter=0 request=0 lock=0 caps=1
remoteparent=1 dirtyparent=0 dirty=0 authpin=0 0x5601b7174800]
------
Then from the kernel debug logs:
++++++
31358125 [16380611.812642] ceph: do_request mds0 session
00000000a66983cb state open
31358126 [16380611.812644] ceph: __prepare_send_request
000000001ebc34fd tid 475421 link (attempt 1)
31358127 [16380611.812647] ceph: dentry 000000006cbb0f2e
10000682337/liblz4.so.1.9.3
31358128 [16380611.812649] ceph: dentry 00000000126d4660 1000066d6d8//
------
We can see that the kclient set the src dentry to "1000066d6d8//". This is
incorrect; it should be "1000066d2e3/liblz4.so.1.9.3", where "1000066d2e3"
is the parent directory's inode and its path is
"/data/nfs/envs/satwindspy/lib/".
From the fs/ceph/dir.c code we can see that ceph_link() will parse the src
dentry; the path is built in build_dentry_path():
2735 static int build_dentry_path(struct dentry *dentry, struct inode *dir,
2736                              const char **ppath, int *ppathlen, u64 *pino,
2737                              bool *pfreepath, bool parent_locked)
2738 {
2739         char *path;
2740
2741         rcu_read_lock();
2742         if (!dir)
2743                 dir = d_inode_rcu(dentry->d_parent);
2744         if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) {
2745                 *pino = ceph_ino(dir);
2746                 rcu_read_unlock();
2747                 *ppath = dentry->d_name.name;
2748                 *ppathlen = dentry->d_name.len;
2749                 return 0;
2750         }
2751         rcu_read_unlock();
2752         path = ceph_mdsc_build_path(dentry, ppathlen, pino, 1);
2753         if (IS_ERR(path))
2754                 return PTR_ERR(path);
2755         *ppath = path;
2756         *pfreepath = true;
2757         return 0;
2758 }
At Line#2743, 'dir' was resolved to the ino# of "liblz4.so.1.9.3" itself,
"1000066d6d8", which is incorrect; it should be the parent dir's ino#
"1000066d2e3". And at Line#2747 the "ppath" is "/", which is also
incorrect; it should be "liblz4.so.1.9.3".
That means the NFS client passed an invalid or corrupted old_dentry to the
kernel ceph client. I have no idea how that could happen.
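To make the failure mode concrete, here is a rough userspace mock of the
fast path of build_dentry_path() quoted above (toy structs only, not kernel
code; the inode numbers and names are the ones from this thread). One way
to get exactly the observed values would be an old_dentry whose parent
pointer resolves back to the file itself and whose name is just "/" (that
is only an assumption for illustration; the logs do not prove it directly):

#include <stdio.h>

struct toy_dentry {
        const char *d_name;               /* stands in for dentry->d_name.name */
        unsigned long long d_ino;         /* ino of the inode behind this dentry */
        struct toy_dentry *d_parent;
};

/* Mimics Line#2743, #2745 and #2747 of the snippet above. */
static void build_path(const struct toy_dentry *dentry,
                       unsigned long long *pino, const char **ppath)
{
        const struct toy_dentry *dir = dentry->d_parent;  /* Line#2743 */

        *pino = dir->d_ino;                               /* Line#2745 */
        *ppath = dentry->d_name;                          /* Line#2747 */
}

int main(void)
{
        /* Normal case: old_dentry hangs off its real parent directory. */
        struct toy_dentry lib  = { "lib", 0x1000066d2e3ULL, &lib };
        struct toy_dentry good = { "liblz4.so.1.9.3", 0x1000066d6d8ULL, &lib };

        /* Broken case matching the logs: no real parent, name is only "/". */
        struct toy_dentry broken = { "/", 0x1000066d6d8ULL, &broken };

        unsigned long long ino;
        const char *path;

        build_path(&good, &ino, &path);
        printf("expected src dentry: #0x%llx/%s\n", ino, path);

        build_path(&broken, &ino, &path);
        printf("observed src dentry: #0x%llx/%s\n", ino, path);
        return 0;
}

The first line prints the expected "#0x1000066d2e3/liblz4.so.1.9.3", the
second prints the "#0x1000066d6d8//" we actually see in the MDS log.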
@Frank,
Could you check the NFS client logs?
Thanks,
- Xiubo
-Greg
I can't reproduce it with simple commands on the NFS client. It seems to
occur only when a large number of files/dirs are created. I can make the
archive available to you if this helps.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: Wednesday, March 22, 2023 4:14 PM
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Re: ln: failed to create hard link 'file name':
Read-only file system
Do you have logs of what the nfs server is doing?
Managed to reproduce it in terms of direct CephFS ops?
On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder <frans@xxxxxx> wrote:
I have to correct myself. It also fails on an export with "sync" mode.
Here is an strace on the client (strace ln envs/satwindspy/include/ffi.h
mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):
[...]
stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h",
0x7ffdc5c32820) = -1 ENOENT (No such file or directory)
lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664,
st_size=13934, ...}) = 0
linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD,
"mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS
(Read-only file system)
[...]
write(2, "ln: ", 4ln: ) = 4
write(2, "failed to create hard link 'mamb"..., 80failed to create hard
link 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
[...]
write(2, ": Read-only file system", 23: Read-only file system) = 23
write(2, "\n", 1
) = 1
lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
close(0) = 0
close(1) = 0
close(2) = 0
exit_group(1) = ?
+++ exited with 1 +++
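For anyone who wants to poke at this without the full tar archive, the
failing operation boils down to a single linkat(2) call; a minimal
reproducer (paths as in the strace above, adjust to your own mount) could
be:

#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* Same source/target as the failing ln call above. */
        const char *src = "envs/satwindspy/include/ffi.h";
        const char *dst = "mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h";

        if (linkat(AT_FDCWD, src, AT_FDCWD, dst, 0) == -1) {
                fprintf(stderr, "linkat: %s (errno %d)\n", strerror(errno), errno);
                return 1;
        }
        puts("link created");
        return 0;
}

Run it from the same working directory as the ln command; on the affected
export it presumably fails with EROFS just like the strace shows.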
Does anyone have advice?
Thanks!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Wednesday, March 22, 2023 2:44 PM
To: ceph-users@xxxxxxx
Subject: ln: failed to create hard link 'file name':
Read-only file system
Hi all,
on an NFS re-export of a ceph-fs (kernel client) I observe a very strange
error. I'm un-tarring a larger package (1.2G), and after some time I get
these errors:
ln: failed to create hard link 'file name': Read-only file system
The strange thing is that this seems to be only temporary. When I used "ln
src dst" for manual testing, the command failed as above. However, after
that I tried "ln -v src dst" and this command created the hard link with
exactly the same path arguments. During the period when the error occurs, I
can't see any FS in read-only mode, neither on the NFS client nor on the
NFS server. The funny thing is that file creation and writing still work;
it's only the hard-link creation that fails.
For details, the set-up is:
file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to other server
other server: mount /shares/path as NFS
More precisely, on the file-server:
fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder -no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP
On the host at DEST-IP:
fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0
Both the file server and the client server are virtual machines. The file
server is on CentOS 8 Stream (4.18.0-338.el8.x86_64) and the client machine
is on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).
When I change the NFS export from "async" to "sync", everything works.
However, that's a rather bad workaround and not a solution. Although this
looks like an NFS issue, I'm afraid it is a problem with hard links and
ceph-fs. It looks like a race with scheduling and executing operations on
the ceph-fs kernel mount.
Has anyone seen something like that?
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
--
Best Regards,
Xiubo Li (李秀波)
Email: xiubli@xxxxxxxxxx/xiubli@xxxxxxx
Slack: @Xiubo Li
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx