Re: Cephfs error state with one bad file

Hi Sake,

On Tue, Jan 2, 2024 at 4:02 AM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
>
> Hi again, hopefully for the last time with problems.
>
> We had an MDS crash earlier, after which the MDS stayed in the failed state, and we used a command to reset the filesystem (this was wrong, I know now; thanks to Patrick Donnelly for pointing this out). I then ran a full scrub on the filesystem, which reported two damaged files. One of them was repaired, but the following file keeps giving errors and can't be removed.
> What can I do now? Some information is below.
>
> # ceph tell mds.atlassian-prod:0 damage ls
> [
>     {
>         "damage_type": "backtrace",
>         "id": 2244444901,
>         "ino": 1099534008829,
>         "path": "/app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01"
>     }
> ]
>
>
> Trying to repair the error (online research shows this should work for a backtrace damage type)
> ----------
> # ceph tell mds.atlassian-prod:0 scrub start /app1/shared/data/repositories/11271 recursive,repair,force
> {
>     "return_code": 0,
>     "scrub_tag": "d10ead42-5280-4224-971e-4f3022e79278",
>     "mode": "asynchronous"
> }
>
>
> Cluster logs after this
> ----------
> 1/2/24 9:37:05 AM
> [INF]
> scrub summary: idle
>
> 1/2/24 9:37:02 AM
> [INF]
> scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
>
> 1/2/24 9:37:01 AM
> [INF]
> scrub summary: active paths [/app1/shared/data/repositories/11271]
>
> 1/2/24 9:37:01 AM
> [INF]
> scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
>
> 1/2/24 9:37:01 AM
> [INF]
> scrub queued for path: /app1/shared/data/repositories/11271
>
>
> But the error doesn't disappear, and I still can't remove the file.
>
>
> On the client, trying to remove the file (we have a backup)
> ----------
> $ rm -f /mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01
> rm: cannot remove '/mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01': Input/output error

Did you try `damage rm <id>` after scrubbing?
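
For reference, here is a minimal sketch of what that sequence might look like with the values from your output above. The `damage_followup` helper is hypothetical and only prints the commands for review rather than running them. As far as I know, `damage rm` only drops the entry from the MDS damage table and does not repair anything itself, so it is meant to follow a scrub that actually fixed the backtrace.

```shell
#!/bin/sh
# Values copied from the thread above.
mds="mds.atlassian-prod:0"
scrub_path="/app1/shared/data/repositories/11271"
damage_id=2244444901

# Hypothetical helper: print (do not execute) the follow-up commands
# so they can be reviewed before being run against a live cluster.
damage_followup() {
    # $1 = MDS target, $2 = path to scrub, $3 = id from `damage ls`
    echo "ceph tell $1 scrub start $2 recursive,repair,force"
    echo "ceph tell $1 damage rm $3"
    echo "ceph tell $1 damage ls"
}

damage_followup "$mds" "$scrub_path" "$damage_id"
```

If the final `damage ls` comes back empty, retrying the `rm` on the client would be the next thing to check.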


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



