Hi,
I ran 'stop ceph-all' on my ceph-admin host and then did a
kernel upgrade from 3.9 to 3.11 on all three of my nodes
(Ubuntu 13.04, Ceph 0.68). The kernel upgrade required a reboot.
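For reference, the sequence on each node was roughly the following (a sketch; the kernel package name below is a placeholder, not the exact package I installed):

# stop all Ceph daemons on the node (upstart on Ubuntu 13.04)
stop ceph-all

# install the 3.11 kernel; the .deb name here is an example
dpkg -i linux-image-3.11.0-generic.deb

# reboot into the new kernel, then verify which kernel is running
reboot
uname -r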
Now, after rebooting, I get the following errors:
root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 133 pgs peering; 272 pgs stale; 265 pgs stuck unclean; 2 requests are blocked > 32 sec; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464358: 3 osds: 3 up, 3 in
    pgmap v1343477: 792 pgs, 9 pools, 15145 MB data, 4986 objects
          30927 MB used, 61372 GB / 61408 GB avail
               387 active+clean
               122 stale+active
               140 stale+active+clean
               133 peering
                10 stale+active+replay
root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 6 pgs down; 377 pgs peering; 296 pgs stuck unclean; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464400: 3 osds: 3 up, 3 in
    pgmap v1343586: 792 pgs, 9 pools, 15145 MB data, 4986 objects
          31046 MB used, 61372 GB / 61408 GB avail
               142 active
               270 active+clean
                 3 active+replay
               371 peering
                 6 down+peering
root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 257 pgs peering; 359 pgs stuck unclean; 1 requests are blocked > 32 sec; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464403: 3 osds: 3 up, 3 in
    pgmap v1343594: 792 pgs, 9 pools, 15145 MB data, 4986 objects
          31103 MB used, 61372 GB / 61408 GB avail
               373 active
               157 active+clean
                 5 active+replay
               257 peering
root@bd-a:~#
As you can see above, the reported errors keep changing, so perhaps some self-repair is running in the background. But it has been like this for 12 hours now.
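In case it helps, I can watch what is stuck with the usual status commands (output omitted here):

ceph health detail          # lists the stuck/peering PGs one by one
ceph pg dump_stuck unclean  # shows which OSDs each stuck PG maps to
ceph osd tree               # confirms all 3 OSDs are up and in
ceph -w                     # follows cluster state changes live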
What should I do?
Thank you,
Markus
On 09.09.2013 13:52, Yan, Zheng wrote:
The bug has been fixed in the 3.11 kernel by commit
ccca4e37b1 (libceph: fix truncate size calculation). We don't
backport cephfs bug fixes to old kernels. Please update the
kernel or use ceph-fuse.

Regards,
Yan, Zheng
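For reference, the ceph-fuse alternative mentioned above would be used roughly like this (the mount point /mnt/cephfs is an example):

# mount CephFS through FUSE instead of the kernel client
ceph-fuse -m xxx.xxx.xxx.20:6789 /mnt/cephfs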
--
Best regards,
Markus Goldberg
------------------------------------------------------------------------
Markus Goldberg | Universität Hildesheim
| Rechenzentrum
Tel +49 5121 883212 | Marienburger Platz 22, D-31141 Hildesheim, Germany
Fax +49 5121 883205 | email goldberg@xxxxxxxxxxxxxxxxx
------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com