Re: Input/output error mounting

Thanks for the response:

[root@ceph-control ~]# ceph health detail | grep 'ops are blocked'
100 ops are blocked > 134218 sec on osd.13
[root@ceph-control ~]# ceph osd blocked-by
osd num_blocked

A problem with osd.13?
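
If so, would restarting that OSD daemon (on whichever node hosts osd.13) be a reasonable next step?  Something like this, assuming a systemd-based install:

systemctl restart ceph-osd@13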

Dan

On 06/23/2017 02:03 PM, David Turner wrote:
# ceph health detail | grep 'ops are blocked'
# ceph osd blocked-by

My guess is that you have an OSD in a funky state that is blocking the requests and the peering.  Let me know what the output of those commands is.

Also, what are the replica sizes of your 2 pools?  The status shows that only 1 OSD was last acting for the 2 inactive PGs.  Not sure yet if that is anything of concern, but I didn't want to ignore it.
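
If it helps, something like this should list the size of every pool in one shot:

# ceph osd dump | grep ^pool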

On Fri, Jun 23, 2017 at 1:16 PM Daniel Davidson <danield@xxxxxxxxxxxxxxxx> wrote:
Two of our OSD systems hit 75% disk utilization, so I added another
system to try to bring that back down.  The filesystem was usable for a day
while the data was being migrated, but now it is not responding
when I try to mount it:

  mount -t ceph ceph-0,ceph-1,ceph-2,ceph-3:6789:/ /home -o name=admin,secretfile=/etc/ceph/admin.secret
mount error 5 = Input/output error

Here is our ceph health

[root@ceph-3 ~]# ceph -s
     cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
      health HEALTH_ERR
             2 pgs are stuck inactive for more than 300 seconds
             58 pgs backfill_wait
             20 pgs backfilling
             3 pgs degraded
             2 pgs stuck inactive
             76 pgs stuck unclean
             2 pgs undersized
             100 requests are blocked > 32 sec
             recovery 1197145/653713908 objects degraded (0.183%)
             recovery 47420551/653713908 objects misplaced (7.254%)
             mds0: Behind on trimming (180/30)
             mds0: Client biologin-0 failing to respond to capability release
             mds0: Many clients (20) failing to respond to cache pressure
      monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
             election epoch 542, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
       fsmap e17666: 1/1/1 up {0=ceph-0=up:active}, 3 up:standby
      osdmap e25535: 32 osds: 32 up, 32 in; 78 remapped pgs
             flags sortbitwise,require_jewel_osds
       pgmap v19199544: 1536 pgs, 2 pools, 786 TB data, 299 Mobjects
             1595 TB used, 1024 TB / 2619 TB avail
             1197145/653713908 objects degraded (0.183%)
             47420551/653713908 objects misplaced (7.254%)
                 1448 active+clean
                   58 active+remapped+wait_backfill
                   17 active+remapped+backfilling
                   10 active+clean+scrubbing+deep
                    2 undersized+degraded+remapped+backfilling+peered
                    1 active+degraded+remapped+backfilling
recovery io 906 MB/s, 331 objects/s

Checking in on the inactive PGs

[root@ceph-control ~]# ceph health detail |grep inactive
HEALTH_ERR 2 pgs are stuck inactive for more than 300 seconds; 58 pgs
backfill_wait; 20 pgs backfilling; 3 pgs degraded; 2 pgs stuck inactive;
78 pgs stuck unclean; 2 pgs undersized; 100 requests are blocked > 32
sec; 1 osds have slow requests; recovery 1197145/653713908 objects
degraded (0.183%); recovery 47390082/653713908 objects misplaced
(7.249%); mds0: Behind on trimming (180/30); mds0: Client biologin-0
failing to respond to capability release; mds0: Many clients (20)
failing to respond to cache pressure
pg 2.1b5 is stuck inactive for 77215.112164, current state undersized+degraded+remapped+backfilling+peered, last acting [13]
pg 2.145 is stuck inactive for 76910.328647, current state undersized+degraded+remapped+backfilling+peered, last acting [13]

If I query the PG, I don't get a response:

[root@ceph-control ~]# ceph pg 2.1b5 query
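
If it would be useful, I can also try dumping the in-flight ops from osd.13's admin socket (run on the node that hosts osd.13), something like:

ceph daemon osd.13 dump_ops_in_flight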

Any ideas on what to do?

Dan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


