Re: ceph kernel client RIP when quota exceeded

Hi Andrej,

The upstream kernel has a commit that addresses this:

commit 0078ea3b0566e3da09ae8e1e4fbfd708702f2876
Author: Jeff Layton <jlayton@xxxxxxxxxx>
Date:   Tue Nov 9 09:54:49 2021 -0500

    ceph: don't check for quotas on MDS stray dirs

    玮文 胡 reported seeing the WARN_RATELIMIT pop when writing to an
    inode that had been transplanted into the stray dir. The client was
    trying to look up the quotarealm info from the parent and that tripped
    the warning.

    Change the ceph_vino_is_reserved helper to not throw a warning for
    MDS stray directories (0x100 - 0x1ff), only for reserved dirs that
    are not in that range.

    Also, fix ceph_has_realms_with_quotas to return false when encountering
    a reserved inode.

    URL: https://tracker.ceph.com/issues/53180
    Reported-by: Hu Weiwen <sehuww@xxxxxxxxxxxxxxxx>
    Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
    Reviewed-by: Luis Henriques <lhenriques@xxxxxxx>
    Reviewed-by: Xiubo Li <xiubli@xxxxxxxxxx>
    Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>

This is not a bug, only a warning; you can safely ignore it.
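
For illustration, the range check described in the commit message could be sketched roughly as below. This is a standalone user-space sketch, not the kernel code: `classify_ino`, `struct ino_check`, and the `RESERVED_INO_END` bound are made-up names and values, and treating inode numbers below 0x100 as not reserved is also an assumption. Only the stray range 0x100 - 0x1ff is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the commit message: MDS stray directories use 0x100 - 0x1ff. */
#define STRAY_INO_FIRST 0x100ULL
#define STRAY_INO_LAST  0x1ffULL

/* Assumption for illustration only: first inode number past the reserved range. */
#define RESERVED_INO_END (1ULL << 40)

/* Hypothetical result type: is the inode reserved, and should a warning fire? */
struct ino_check {
    bool reserved;
    bool warn;
};

/* Classify an inode number the way the commit message describes the new behaviour. */
static struct ino_check classify_ino(uint64_t ino)
{
    struct ino_check result = { false, false };

    /* Assumption: numbers below the stray range or past the bound are ordinary. */
    if (ino < STRAY_INO_FIRST || ino >= RESERVED_INO_END)
        return result;

    result.reserved = true;
    /* Stray dirs are reserved but quiet; other reserved inodes still warn. */
    result.warn = ino > STRAY_INO_LAST;
    return result;
}
```

With this sketch, `classify_ino(0x101)` (the inode number in the log below) comes back as reserved with no warning, which matches the behaviour the commit describes.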

Thanks.

On 8/16/22 7:39 PM, Andrej Filipcic wrote:

Hi,

We experienced massive node failures when a user who had exceeded their CephFS quota submitted many jobs to a Slurm cluster whose home directories are on CephFS. The nodes keep working for some time, but they eventually freeze because too many CPUs get stuck.

Is this a bug in the kernel Ceph client? The nodes run kernel 5.10.123, and the Ceph cluster is 16.2.9.

Best regards,
Andrej

2022-08-15T20:08:01+02:00 cn0539 kernel: ------------[ cut here ]------------
2022-08-15T20:08:01+02:00 cn0539 kernel: Attempt to access reserved inode number 0x101
2022-08-15T20:08:01+02:00 cn0539 kernel: WARNING: CPU: 172 PID: 4185848 at fs/ceph/super.h:547 __lookup_inode+0x161/0x180 [ceph]
2022-08-15T20:08:14+02:00 cn0539 kernel: Modules linked in: squashfs loop overlay fuse ceph libceph mgc(O) lustre(O) lmv(O) mdc(O) fid(O) lov(O) fld(O) osc(O) ko2iblnd(O) ptlrpc(O) obdclass(O) lnet(O) libcfs(O) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill ipmi_ssif nft_limit amd64_edac_mod edac_mce_amd amd_energy nft_ct kvm_amd nf_conntrack nf_defrag_ipv6 kvm nf_defrag_ipv4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl pcspkr nf_tables libcrc32c nfnetlink sp5100_tco ccp acpi_ipmi k10temp i2c_piix4 ipmi_si rdma_ucm(O) rdma_cm(O) iw_cm(O) acpi_cpufreq ib_ipoib(O) ib_cm(O) ib_umad(O) sunrpc vfat fat ext4 mbcache jbd2 mlx5_ib(O) ib_uverbs(O) ib_core(O) mlx5_core(O) mlxfw(O) pci_hyperv_intf crc32c_intel tls ahci nvme psample igb libahci mlxdevm(O) auxiliary(O) nvme_core i2c_algo_bit libata t10_pi dca mlx_compat(O) pinctrl_amd xpmem(O) ipmi_devintf ipmi_msghandler
2022-08-15T20:08:14+02:00 cn0539 kernel: CPU: 172 PID: 4185848 Comm: slurm_script Tainted: G        W  O      5.10.123-2.el8.x86_64 #1
2022-08-15T20:08:16+02:00 cn0539 kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./CER, BIOS BIOS_RME090.22.37.001 10/05/2021
2022-08-15T20:08:17+02:00 cn0539 kernel: RIP: 0010:__lookup_inode+0x161/0x180 [ceph]
2022-08-15T20:08:18+02:00 cn0539 kernel: Code: dd 48 85 db 0f 85 27 ff ff ff 45 85 e4 0f 89 5d ff ff ff 49 63 ec e9 16 ff ff ff 48 89 de 48 c7 c7 58 bb 40 c1 e8 1e 21 d8 d0 <0f> 0b e9 3f ff ff ff e8 53 3d 01 00 eb c6 be 03 00 00 00 e8 97 a2
2022-08-15T20:08:21+02:00 cn0539 kernel: RSP: 0018:ffffb6d8de33fc18 EFLAGS: 00010286
2022-08-15T20:08:22+02:00 cn0539 kernel: RAX: 0000000000000000 RBX: 0000000000000101 RCX: 0000000000000027
2022-08-15T20:08:23+02:00 cn0539 kernel: RDX: 0000000000000027 RSI: ffff95f2afd207e0 RDI: ffff95f2afd207e8
2022-08-15T20:08:24+02:00 cn0539 kernel: RBP: ffff965345e568a0 R08: 0000000000000000 R09: c0000000fffeffff
2022-08-15T20:08:25+02:00 cn0539 kernel: R10: 0000000000000001 R11: ffffb6d8de33fa20 R12: ffff959e55081aa8
2022-08-15T20:08:27+02:00 cn0539 kernel: R13: ffff965345e568a8 R14: ffff9593ea333e00 R15: ffff959e55081a80
2022-08-15T20:08:28+02:00 cn0539 kernel: FS:  00007fbf7c8ba740(0000) GS:ffff95f2afd00000(0000) knlGS:0000000000000000
2022-08-15T20:08:29+02:00 cn0539 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-08-15T20:08:30+02:00 cn0539 kernel: CR2: 0000564324b8a588 CR3: 0000004d51150000 CR4: 0000000000150ee0
2022-08-15T20:08:31+02:00 cn0539 kernel: Call Trace:
2022-08-15T20:08:31+02:00 cn0539 kernel: ? __do_request+0x3f0/0x450 [ceph]
2022-08-15T20:08:32+02:00 cn0539 kernel: ceph_lookup_inode+0xa/0x30 [ceph]
2022-08-15T20:08:34+02:00 cn0539 kernel: lookup_quotarealm_inode.isra.9+0x188/0x210 [ceph]
2022-08-15T20:08:34+02:00 cn0539 kernel: check_quota_exceeded+0x1bc/0x220 [ceph]
2022-08-15T20:08:34+02:00 cn0539 kernel: ceph_write_iter+0x1bf/0xc90 [ceph]
2022-08-15T20:08:35+02:00 cn0539 kernel: ? path_openat+0x666/0x1050
2022-08-15T20:08:36+02:00 cn0539 kernel: ? __touch_cap+0x1f/0xd0 [ceph]
2022-08-15T20:08:36+02:00 cn0539 kernel: ? ptep_set_access_flags+0x23/0x30
2022-08-15T20:08:37+02:00 cn0539 kernel: ? wp_page_reuse+0x5f/0x70
2022-08-15T20:08:38+02:00 cn0539 kernel: ? new_sync_write+0x11f/0x1b0
2022-08-15T20:08:38+02:00 cn0539 kernel: new_sync_write+0x11f/0x1b0
2022-08-15T20:08:39+02:00 cn0539 kernel: vfs_write+0x1bd/0x270
2022-08-15T20:08:40+02:00 cn0539 kernel: ksys_write+0x59/0xd0
2022-08-15T20:08:40+02:00 cn0539 kernel: do_syscall_64+0x33/0x40
2022-08-15T20:08:41+02:00 cn0539 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
2022-08-15T20:08:41+02:00 cn0539 kernel: RIP: 0033:0x7fbf7bfc65a8
2022-08-15T20:08:42+02:00 cn0539 kernel: Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 f5 3f 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
2022-08-15T20:08:45+02:00 cn0539 kernel: RSP: 002b:00007ffcc4ad6dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
2022-08-15T20:08:46+02:00 cn0539 kernel: RAX: ffffffffffffffda RBX: 0000000000000417 RCX: 00007fbf7bfc65a8
2022-08-15T20:08:47+02:00 cn0539 kernel: RDX: 0000000000000417 RSI: 0000564324baa470 RDI: 0000000000000004
2022-08-15T20:08:48+02:00 cn0539 kernel: RBP: 0000564324baa470 R08: 0000000000000008 R09: 00224b5341545f52
2022-08-15T20:08:49+02:00 cn0539 kernel: R10: 0000000000000025 R11: 0000000000000246 R12: 0000564324b9cf50
2022-08-15T20:08:51+02:00 cn0539 kernel: R13: 0000000000000000 R14: 0000564324ba6200 R15: 0000564324b9cf50
2022-08-15T20:08:52+02:00 cn0539 kernel: ---[ end trace a655820d09b78154 ]---
2022-08-15T20:09:58+02:00 cn0539 kernel: mlx5_core 0000:61:00.0: mlx5_cmd_out_err:800:(pid 4155261): MAD_IFC(0x50d) op_mod(0x0) failed, status bad packet (discarded)(0x30), syndrome (0xea9eb5), err(-22)
2022-08-15T20:09:58+02:00 cn0539 kernel: mlx5_core 0000:61:00.0: mlx5_cmd_out_err:800:(pid 4155261): MAD_IFC(0x50d) op_mod(0x0) failed, status bad packet (discarded)(0x30), syndrome (0xea9eb5), err(-22)
2022-08-15T20:10:12+02:00 cn0539 kernel: ------------[ cut here ]------------
2022-08-15T20:10:12+02:00 cn0539 kernel: Attempt to access reserved inode number 0x101
2022-08-15T20:10:12+02:00 cn0539 kernel: WARNING: CPU: 78 PID: 14675 at fs/ceph/super.h:547 __lookup_inode+0x161/0x180 [ceph]
2022-08-15T20:10:26+02:00 cn0539 kernel: Modules linked in: squashfs loop overlay fuse ceph libceph mgc(O) lustre(O) lmv(O) mdc(O) fid(O) lov(O) fld(O) osc(O) ko2iblnd(O) ptlrpc(O) obdclass(O) lnet(O) libcfs(O) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill ipmi_ssif nft_limit amd64_edac_mod edac_mce_amd amd_energy nft_ct kvm_amd nf_conntrack nf_defrag_ipv6 kvm nf_defrag_ipv4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl pcspkr nf_tables libcrc32c nfnetlink sp5100_tco ccp acpi_ipmi k10temp i2c_piix4 ipmi_si rdma_ucm(O) rdma_cm(O) iw_cm(O) acpi_cpufreq ib_ipoib(O) ib_cm(O) ib_umad(O) sunrpc vfat fat ext4 mbcache jbd2 mlx5_ib(O) ib_uverbs(O) ib_core(O) mlx5_core(O) mlxfw(O) pci_hyperv_intf crc32c_intel tls ahci nvme psample igb libahci mlxdevm(O) auxiliary(O) nvme_core i2c_algo_bit libata t10_pi dca mlx_compat(O) pinctrl_amd xpmem(O) ipmi_devintf ipmi_msghandler
2022-08-15T20:10:26+02:00 cn0539 kernel: CPU: 78 PID: 14675 Comm: slurm_script Tainted: G        W  O      5.10.123-2.el8.x86_64 #1
2022-08-15T20:10:27+02:00 cn0539 kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./CER, BIOS BIOS_RME090.22.37.001 10/05/2021
2022-08-15T20:10:29+02:00 cn0539 kernel: RIP: 0010:__lookup_inode+0x161/0x180 [ceph]
2022-08-15T20:10:30+02:00 cn0539 kernel: Code: dd 48 85 db 0f 85 27 ff ff ff 45 85 e4 0f 89 5d ff ff ff 49 63 ec e9 16 ff ff ff 48 89 de 48 c7 c7 58 bb 40 c1 e8 1e 21 d8 d0 <0f> 0b e9 3f ff ff ff e8 53 3d 01 00 eb c6 be 03 00 00 00 e8 97 a2
2022-08-15T20:10:33+02:00 cn0539 kernel: RSP: 0018:ffffb6d8d2ab7c18 EFLAGS: 00010286
2022-08-15T20:10:33+02:00 cn0539 kernel: RAX: 0000000000000000 RBX: 0000000000000101 RCX: 0000000000000027
2022-08-15T20:10:35+02:00 cn0539 kernel: RDX: 0000000000000027 RSI: ffff9632af9a07e0 RDI: ffff9632af9a07e8
2022-08-15T20:10:36+02:00 cn0539 kernel: RBP: ffff965345e568a0 R08: 0000000000000000 R09: c0000000fffeffff
2022-08-15T20:10:37+02:00 cn0539 kernel: R10: 0000000000000001 R11: ffffb6d8d2ab7a20 R12: ffff959e55081aa8
2022-08-15T20:10:38+02:00 cn0539 kernel: R13: ffff965345e568a8 R14: ffff9593f4994600 R15: ffff959e55081a80
2022-08-15T20:10:39+02:00 cn0539 kernel: FS:  00007f660e249740(0000) GS:ffff9632af980000(0000) knlGS:0000000000000000
2022-08-15T20:10:40+02:00 cn0539 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-08-15T20:10:41+02:00 cn0539 kernel: CR2: 000055d6b3db5588 CR3: 0000008a75ce8000 CR4: 0000000000150ee0
2022-08-15T20:10:42+02:00 cn0539 kernel: Call Trace:
2022-08-15T20:10:43+02:00 cn0539 kernel: ? __do_request+0x3f0/0x450 [ceph]
2022-08-15T20:10:43+02:00 cn0539 kernel: ceph_lookup_inode+0xa/0x30 [ceph]
2022-08-15T20:10:44+02:00 cn0539 kernel: lookup_quotarealm_inode.isra.9+0x188/0x210 [ceph]
2022-08-15T20:10:45+02:00 cn0539 kernel: check_quota_exceeded+0x1bc/0x220 [ceph]
2022-08-15T20:10:46+02:00 cn0539 kernel: ceph_write_iter+0x1bf/0xc90 [ceph]
2022-08-15T20:10:47+02:00 cn0539 kernel: ? path_openat+0x666/0x1050
2022-08-15T20:10:47+02:00 cn0539 kernel: ? __do_request+0x3f0/0x450 [ceph]
2022-08-15T20:10:48+02:00 cn0539 kernel: ? __ceph_put_cap_refs+0x30/0x380 [ceph]
2022-08-15T20:10:49+02:00 cn0539 kernel: ? ptep_set_access_flags+0x23/0x30
2022-08-15T20:10:49+02:00 cn0539 kernel: ? wp_page_reuse+0x5f/0x70
2022-08-15T20:10:50+02:00 cn0539 kernel: ? new_sync_write+0x11f/0x1b0
2022-08-15T20:10:51+02:00 cn0539 kernel: new_sync_write+0x11f/0x1b0
2022-08-15T20:10:51+02:00 cn0539 kernel: vfs_write+0x1bd/0x270
2022-08-15T20:10:52+02:00 cn0539 kernel: ksys_write+0x59/0xd0
2022-08-15T20:10:52+02:00 cn0539 kernel: do_syscall_64+0x33/0x40
2022-08-15T20:10:53+02:00 cn0539 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
2022-08-15T20:10:54+02:00 cn0539 kernel: RIP: 0033:0x7f660d9555a8
2022-08-15T20:10:54+02:00 cn0539 kernel: Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 f5 3f 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
2022-08-15T20:10:57+02:00 cn0539 kernel: RSP: 002b:00007ffe2286c368 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
2022-08-15T20:10:58+02:00 cn0539 kernel: RAX: ffffffffffffffda RBX: 0000000000000417 RCX: 00007f660d9555a8
2022-08-15T20:10:59+02:00 cn0539 kernel: RDX: 0000000000000417 RSI: 000055d6b3dd5470 RDI: 0000000000000004
2022-08-15T20:11:01+02:00 cn0539 kernel: RBP: 000055d6b3dd5470 R08: 0000000000000008 R09: 00224b5341545f52
2022-08-15T20:11:02+02:00 cn0539 kernel: R10: 0000000000000025 R11: 0000000000000246 R12: 000055d6b3dc7f50