Re: Scrub shutdown the OSD process / data loss

On Saturday, 20 April 2013 at 09:10 +0200, Olivier Bonvalet wrote:
> On Wednesday, 17 April 2013 at 20:52 +0200, Olivier Bonvalet wrote:
> > What I don't understand is why the OSD process crashes instead of
> > marking that PG "corrupted". Is that PG really corrupted, or is
> > this just an OSD bug?
> 
> Once again, a bit more information: while searching for information
> about one of these faulty PGs (3.d), I found this:
> 
>   -592> 2013-04-20 08:31:56.838280 7f0f41d1b700  0 log [ERR] : 3.d osd.25 inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 found  expected 12d7
>   -591> 2013-04-20 08:31:56.838284 7f0f41d1b700  0 log [ERR] : 3.d osd.4 inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 found  expected 12d7
>   -590> 2013-04-20 08:31:56.838290 7f0f41d1b700  0 log [ERR] : 3.d osd.4: soid a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 size 4194304 != known size 0
>   -589> 2013-04-20 08:31:56.838292 7f0f41d1b700  0 log [ERR] : 3.d osd.11 inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 found  expected 12d7
>   -588> 2013-04-20 08:31:56.838294 7f0f41d1b700  0 log [ERR] : 3.d osd.11: soid a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 size 4194304 != known size 0
>   -587> 2013-04-20 08:31:56.838395 7f0f41d1b700  0 log [ERR] : scrub 3.d a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 on disk size (0) does not match object info size (4194304)
> 
> 
> I preferred to verify, and found this:
> 
> # md5sum /var/lib/ceph/osd/ceph-*/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3
> 217ac2518dfe9e1502e5bfedb8be29b8  /var/lib/ceph/osd/ceph-4/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3 (4MB)
> 217ac2518dfe9e1502e5bfedb8be29b8  /var/lib/ceph/osd/ceph-11/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3 (4MB)
> d41d8cd98f00b204e9800998ecf8427e  /var/lib/ceph/osd/ceph-25/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3 (0B)
> 
> 
> So this object is identical on OSD 4 and 11, but is empty on OSD 25.
> Since 4 is the primary (master), this should not be a problem, so I
> tried a repair, without any success:
>     ceph pg repair 3.d
> 
> 
> Is there a way to force a rewrite of this replica?
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


I don't know for certain whether it's related, but I am seeing data loss
on multiple RBD images in my cluster (corrupted filesystems, corrupted
databases, and some empty files).

I suppose it's related.
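
In case it is useful to anyone checking their own PGs: the manual md5sum
comparison quoted above can be scripted. This is only a minimal Python
sketch, not a Ceph tool. The function names (md5_of, group_replicas,
odd_replicas) are hypothetical, and it assumes each replica is a plain
file readable on the OSD's filesystem, as in the paths above. It only
reports which copies differ from the majority. It does not decide which
copy is correct and it does not repair anything.

```python
import hashlib
import sys
from collections import defaultdict


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def group_replicas(paths):
    """Map each digest to the list of replica paths that have it."""
    groups = defaultdict(list)
    for p in paths:
        groups[md5_of(p)].append(str(p))
    return dict(groups)


def odd_replicas(paths):
    """Return the paths whose digest differs from the most common one.

    On a tie, the digest seen first counts as the majority, so treat
    the result as a hint only.
    """
    groups = group_replicas(paths)
    if len(groups) <= 1:
        return []
    majority = max(groups, key=lambda d: len(groups[d]))
    return sorted(p for d, ps in groups.items() if d != majority for p in ps)


if __name__ == "__main__":
    # Pass the replica files as arguments, e.g. via a shell glob.
    for path in odd_replicas(sys.argv[1:]):
        print("differs from majority:", path)
```

Run it with the same glob as the md5sum command above, e.g.
`python3 check_replicas.py /var/lib/ceph/osd/ceph-*/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3`
(the script name is arbitrary). With the three files listed above it
should single out the empty copy on ceph-25.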

