Re: Node crash, filesystem not usable

In Luminous, is it now:
osd_recovery_threads = osd_disk_threads ?
osd_recovery_sleep = osd_recovery_sleep_hdd ?

Or is speeding up recovery handled a lot differently in Luminous?

[@~]# ceph daemon osd.0 config show | grep osd | grep thread
    "osd_command_thread_suicide_timeout": "900",
    "osd_command_thread_timeout": "600",
    "osd_disk_thread_ioprio_class": "",
    "osd_disk_thread_ioprio_priority": "-1",
    "osd_disk_threads": "1",
    "osd_op_num_threads_per_shard": "0",
    "osd_op_num_threads_per_shard_hdd": "1",
    "osd_op_num_threads_per_shard_ssd": "2",
    "osd_op_thread_suicide_timeout": "150",
    "osd_op_thread_timeout": "15",
    "osd_peering_wq_threads": "2",
    "osd_recovery_thread_suicide_timeout": "300",
    "osd_recovery_thread_timeout": "30",
    "osd_remove_thread_suicide_timeout": "36000",
    "osd_remove_thread_timeout": "3600",

-----Original Message-----
From: Webert de Souza Lima [mailto:webert.boss@xxxxxxxxx] 
Sent: Friday, 11 May 2018 20:34
To: ceph-users
Subject: Re:  Node crash, filesystem not usable

This message seems to be very concerning:
 >            mds0: Metadata damage detected


but for the rest, the cluster still seems to be recovering. You could
try to speed things up with ceph tell, like:

ceph tell osd.* injectargs --osd_max_backfills=10

ceph tell osd.* injectargs --osd_recovery_sleep=0.0

ceph tell osd.* injectargs --osd_recovery_threads=2
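
If you inject those, it may be worth confirming on one OSD that the new values actually took effect, for example (assuming osd.0 and a local admin socket):

ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_sleep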



Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil
IRC NICK - WebertRLZ


On Fri, May 11, 2018 at 3:06 PM Daniel Davidson 
<danield@xxxxxxxxxxxxxxxx> wrote:


	Below is the information you were asking for.  I think they are size=2, min_size=1.
	
	Dan
	
	# ceph status
	    cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
	     health HEALTH_ERR
	            140 pgs are stuck inactive for more than 300 seconds
	            64 pgs backfill_wait
	            76 pgs backfilling
	            140 pgs degraded
	            140 pgs stuck degraded
	            140 pgs stuck inactive
	            140 pgs stuck unclean
	            140 pgs stuck undersized
	            140 pgs undersized
	            210 requests are blocked > 32 sec
	            recovery 38725029/695508092 objects degraded (5.568%)
	            recovery 10844554/695508092 objects misplaced (1.559%)
	            mds0: Metadata damage detected
	            mds0: Behind on trimming (71/30)
	            noscrub,nodeep-scrub flag(s) set
	     monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
	            election epoch 824, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
	      fsmap e144928: 1/1/1 up {0=ceph-0=up:active}, 1 up:standby
	     osdmap e35814: 32 osds: 30 up, 30 in; 140 remapped pgs
	            flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
	      pgmap v43142427: 1536 pgs, 2 pools, 762 TB data, 331 Mobjects
	            1444 TB used, 1011 TB / 2455 TB avail
	            38725029/695508092 objects degraded (5.568%)
	            10844554/695508092 objects misplaced (1.559%)
	                1396 active+clean
	                  76 undersized+degraded+remapped+backfilling+peered
	                  64 undersized+degraded+remapped+wait_backfill+peered
	recovery io 1244 MB/s, 1612 keys/s, 705 objects/s
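	
	A side note on the "mds0: Metadata damage detected" line above: one possible next step, sketched on the assumption that rank 0 is the active MDS and that this 10.2.x build supports the damage commands, is to list the recorded damage entries:
	
	ceph health detail
	ceph tell mds.0 damage ls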
	
	ID  WEIGHT     TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
	 -1 2619.54541 root default                                       
	 -2  163.72159     host ceph-0                                    
	  0   81.86079         osd.0         up  1.00000          1.00000 
	  1   81.86079         osd.1         up  1.00000          1.00000 
	 -3  163.72159     host ceph-1                                    
	  2   81.86079         osd.2         up  1.00000          1.00000 
	  3   81.86079         osd.3         up  1.00000          1.00000 
	 -4  163.72159     host ceph-2                                    
	  8   81.86079         osd.8         up  1.00000          1.00000 
	  9   81.86079         osd.9         up  1.00000          1.00000 
	 -5  163.72159     host ceph-3                                    
	 10   81.86079         osd.10        up  1.00000          1.00000 
	 11   81.86079         osd.11        up  1.00000          1.00000 
	 -6  163.72159     host ceph-4                                    
	  4   81.86079         osd.4         up  1.00000          1.00000 
	  5   81.86079         osd.5         up  1.00000          1.00000 
	 -7  163.72159     host ceph-5                                    
	  6   81.86079         osd.6         up  1.00000          1.00000 
	  7   81.86079         osd.7         up  1.00000          1.00000 
	 -8  163.72159     host ceph-6                                    
	 12   81.86079         osd.12        up  0.79999          1.00000 
	 13   81.86079         osd.13        up  1.00000          1.00000 
	 -9  163.72159     host ceph-7                                    
	 14   81.86079         osd.14        up  1.00000          1.00000 
	 15   81.86079         osd.15        up  1.00000          1.00000 
	-10  163.72159     host ceph-8                                    
	 16   81.86079         osd.16        up  1.00000          1.00000 
	 17   81.86079         osd.17        up  1.00000          1.00000 
	-11  163.72159     host ceph-9                                    
	 18   81.86079         osd.18        up  1.00000          1.00000 
	 19   81.86079         osd.19        up  1.00000          1.00000 
	-12  163.72159     host ceph-10                                   
	 20   81.86079         osd.20        up  1.00000          1.00000 
	 21   81.86079         osd.21        up  1.00000          1.00000 
	-13  163.72159     host ceph-11                                   
	 22   81.86079         osd.22        up  1.00000          1.00000 
	 23   81.86079         osd.23        up  1.00000          1.00000 
	-14  163.72159     host ceph-12                                   
	 24   81.86079         osd.24        up  1.00000          1.00000 
	 25   81.86079         osd.25        up  1.00000          1.00000 
	-15  163.72159     host ceph-13                                   
	 26   81.86079         osd.26      down        0          1.00000 
	 27   81.86079         osd.27      down        0          1.00000 
	-16  163.72159     host ceph-14                                   
	 28   81.86079         osd.28        up  1.00000          1.00000 
	 29   81.86079         osd.29        up  1.00000          1.00000 
	-17  163.72159     host ceph-15                                   
	 30   81.86079         osd.30        up  1.00000          1.00000 
	 31   81.86079         osd.31        up  1.00000          1.00000 
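	
	To confirm the size/min_size values rather than guess them, the per-pool settings can be listed; a small sketch (the pool name "cephfs_data" below is only an example, use whatever pools the cluster actually has):
	
	ceph osd pool ls detail
	ceph osd pool get cephfs_data size
	ceph osd pool get cephfs_data min_size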
	
	
	
	On 05/11/2018 11:56 AM, David Turner wrote:
	

		Can you share some command output showing the state of your cluster?  Most notable is `ceph status`, but `ceph osd tree` would also be helpful. What are the sizes of the pools in your cluster?  Are they all size=3 min_size=2?

		On Fri, May 11, 2018 at 12:05 PM Daniel Davidson 
<danield@xxxxxxxxxxxxxxxx> wrote:
		

			Hello,
			
			Today we had a node crash, and looking at it, it seems there is a problem with the RAID controller, so it is not coming back up, maybe ever.  It corrupted the local filesystem for the ceph storage there.
			
			The remainder of our storage (10.2.10) cluster is running, and it looks to be repairing, and our min_size is set to 2.  Normally I would expect that the system would keep running normally from an end user perspective when this happens, but the system is down.  All mounts that were up when this started look to be stale, and new mounts give the following error:
			
			# mount -t ceph ceph-0:/ /test/ -o name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev,rbytes
			mount error 5 = Input/output error
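			
			For the mount error itself: with a damaged MDS and stuck-inactive PGs as shown above, an EIO on mount is not surprising. A rough way to gather more detail (assuming a kernel client and admin access to the cluster):
			
			dmesg | tail -n 20
			ceph health detail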
			
			Any suggestions?
			
			Dan
			
			

	
	

	


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


