issues with ceph

I can always remount and see them.   

But I wanted to preserve the "broken" state and see if I could figure out why it was happening.   (strace isn't particularly revealing.)

Some other things I noted (rough sketches of what I mean follow the list):

- if I reboot the metadata server, nothing seems to "fail over" to the hot spare (everything locks up until it's back online).   I'm guessing you have to manually make the spare primary and then switch back?
- if I reboot the mon that a client is mounted against, that client's mount locks up (even though I list all four monitors in the fstab), but the other clients keep working.
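
For reference, here is roughly what I mean by "list 4 monitors" and "hot spare". The mon addresses are the ones from the mon dump quoted further down, but the mount point options, secretfile path, and MDS section name are just placeholders, so treat this as a sketch of the setup rather than the exact config:

# /etc/fstab -- kernel CephFS mount listing all four monitors
10.18.176.179:6789,10.18.176.180:6789,10.18.176.181:6789,10.18.176.182:6789:/  /mounts/ceph1  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  0

# /etc/ceph/ceph.conf -- second MDS declared as a standby for rank 0
[mds.b]
    mds standby replay = true
    mds standby for rank = 0

# shows which MDS is active and which is standby
ceph mds stat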

-----Original Message-----
From: Aronesty, Erik 
Sent: Friday, May 09, 2014 11:51 AM
To: 'Lincoln Bryant'
Cc: ceph-users
Subject: RE: issues with ceph

If I stat on that box, I get nothing:

q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga/raw$ cd BRCA
-bash: cd: BRCA: No such file or directory

perl -e 'print stat("BRCA")'
<no result>

If I access a mount on another machine, I can see the files:

q782657@usadc-nasea05:/mounts/ceph1/pubdata/tcga$ ls -l raw
total 0
drwxrwxr-x 1 q783775 pipeline 366462246414 May  8 12:00 BRCA
drwxrwxr-x 1 q783775 pipeline 161578200377 May  8 12:00 COAD
drwxrwxr-x 1 q783775 pipeline 367320207221 May  8 11:35 HNSC
drwxrwxr-x 1 q783775 pipeline 333587505256 May  8 13:27 LAML
drwxrwxr-x 1 q783775 pipeline 380346443564 May  8 13:27 LUSC
drwxrwxr-x 1 q783775 pipeline 357340261602 May  8 13:33 PAAD
drwxrwxr-x 1 q783775 pipeline 389882082560 May  8 13:33 PRAD
drwxrwxr-x 1 q783775 pipeline 634089122305 May  8 13:33 STAD
drwxrwxr-x 1 q783775 pipeline 430754940032 May  8 13:33 THCA

I will try updating the kernel and rerunning some tests. Thanks.


-----Original Message-----
From: Lincoln Bryant [mailto:lincolnb@xxxxxxxxxxxx] 
Sent: Friday, May 09, 2014 10:39 AM
To: Aronesty, Erik
Cc: ceph-users
Subject: Re: issues with ceph

Hi Erik,

What happens if you try to stat one of the "missing" files (assuming you know the name of the file before you remount raw)?
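
Something as simple as this is what I have in mind (the path is only an illustration -- substitute a file or directory you know you copied into raw):

stat /mounts/ceph1/pubdata/tcga/raw/<known-name>

If the stat succeeds while ls shows nothing, the listing (readdir) is more likely what's broken than the files themselves.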

I had a problem where files would disappear and reappear in CephFS, which I believe was fixed in kernel 3.12.

Cheers,
Lincoln

On May 9, 2014, at 9:30 AM, Aronesty, Erik wrote:

> So we were attempting to stress test a cephfs installation, and last night, after copying 500GB of files, we got this:
> 
> 570G in the "raw" directory
> 
> q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ ls -lh
> total 32M
> -rw-rw-r-- 1 q783775 pipeline  32M May  8 10:39 2014-02-25T12:00:01-0800_data_manifest.tsv
> -rw-rw-r-- 1 q783775 pipeline  144 May  8 10:42 cghub.key
> drwxrwxr-x 1 q783775 pipeline 234G May  8 11:31 fastqs
> drwxrwxr-x 1 q783775 pipeline 570G May  8 13:33 raw
> -rw-rw-r-- 1 q783775 pipeline   86 May  8 11:19 readme.txt
> 
> But when I ls into the "raw" folder, I get zero files:
> 
> q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ ls -lh raw
> total 0
> 
> If I mount that folder again... all the files "re-appear".
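> 
> (By "mount again" I just mean a plain remount of the same fstab entry, roughly:
> 
> sudo umount /mounts/ceph1 && sudo mount /mounts/ceph1
> 
> nothing fancier than that.)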
> 
> Is this a bug that's been solved in a newer release?
> 
> KERNEL:
> Linux usadc-nasea05 3.11.0-20-generic #34~precise1-Ubuntu SMP Thu Apr 3 17:25:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 
> CEPH:
> ii  ceph                              0.72.2-1precise                   distributed storage and file system
> 
> 
> ------ No errors that I could see on the client machine:
> 
> q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ dmesg | grep ceph
> [588560.047193] Key type ceph registered
> [588560.047334] libceph: loaded (mon/osd proto 15/24)
> [588560.102874] ceph: loaded (mds proto 32)
> [588560.117392] libceph: client6005 fsid f067539c-7426-47ee-afb0-7d2c6dfcbcd0
> [588560.126477] libceph: mon1 10.18.176.180:6789 session established
> 
> 
> ------ Ceph itself looks fine.
> 
> root@usadc-nasea05:~# ceph health
> HEALTH_OK
> 
> root@usadc-nasea05:~# ceph quorum_status
> {"election_epoch":668,"quorum":[0,1,2,3],"quorum_names":["usadc-nasea05","usadc-nasea06","usadc-nasea07","usadc-nasea08"],"quorum_leader_name":"usadc-nasea05","monmap":{"epoch":1,"fsid":"f067539c-7426-47ee-afb0-7d2c6dfcbcd0","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"usadc-nasea05","addr":"10.18.176.179:6789\/0"},{"rank":1,"name":"usadc-nasea06","addr":"10.18.176.180:6789\/0"},{"rank":2,"name":"usadc-nasea07","addr":"10.18.176.181:6789\/0"},{"rank":3,"name":"usadc-nasea08","addr":"10.18.176.182:6789\/0"}]}}
> 
> root@usadc-nasea05:~# ceph mon dump
> dumped monmap epoch 1
> epoch 1
> fsid f067539c-7426-47ee-afb0-7d2c6dfcbcd0
> last_changed 0.000000
> created 0.000000
> 0: 10.18.176.179:6789/0 mon.usadc-nasea05
> 1: 10.18.176.180:6789/0 mon.usadc-nasea06
> 2: 10.18.176.181:6789/0 mon.usadc-nasea07
> 3: 10.18.176.182:6789/0 mon.usadc-nasea08
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


