Re: Broken snapshots... CEPH 0.94.2


 



Our initial journal sizes were sufficient, but the flush time was 5 seconds, so we increased the journal size to fit a flush timeframe with min/max of 29/30 seconds.

By "flush time" I mean:
  filestore max sync interval = 30
  filestore min sync interval = 29
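For reference, the usual rule of thumb ties the journal size to these sync intervals: osd journal size (in MB) should be at least 2 x (expected throughput x filestore max sync interval). A minimal ceph.conf sketch, with the throughput figure as an illustrative placeholder rather than a value from this setup:

  [osd]
  filestore min sync interval = 29
  filestore max sync interval = 30
  # assuming ~100 MB/s sustained throughput per OSD:
  # 2 x 100 MB/s x 30 s = 6000 MB
  osd journal size = 6000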

2015-08-21 2:16 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
Also, what do you mean by "change journal size"?
-Sam

On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Not sure what you mean by:
>
> but they stopped working at the same moment the cache layer filled up with
> data and evict/flush started...
> -Sam
>
> On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> <igor.voloshanenko@xxxxxxxxx> wrote:
>> No, when we started draining the cache, the bad pgs were already in place...
>> We did a big rebalance (disk by disk - to change the journal size on both
>> hot/cold layers).. All was OK, but after 2 days scrub errors arrived and 2
>> pgs went inconsistent...
>>
>> In writeback mode - yes, snapshots looked like they worked fine, but they stopped
>> working at the same moment the cache layer filled up with data and evict/flush started...
>>
>>
>>
>> 2015-08-21 2:09 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>>
>>> So you started draining the cache pool before you saw either the
>>> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
>>> mode was working correctly?)
>>> -Sam
>>>
>>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>>> <igor.voloshanenko@xxxxxxxxx> wrote:
>>> > Good joke )))))))))
>>> >
>>> > 2015-08-21 2:06 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>> >>
>>> >> Certainly, don't reproduce this with a cluster you care about :).
>>> >> -Sam
>>> >>
>>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>> >> > What's supposed to happen is that the client transparently directs
>>> >> > all
>>> >> > requests to the cache pool rather than the cold pool when there is a
>>> >> > cache pool.  If the kernel is sending requests to the cold pool,
>>> >> > that's probably where the bug is.  Odd.  It could also be a bug
>>> >> > specific to 'forward' mode, either in the client or on the osd.  Why did
>>> >> > you have it in that mode?
>>> >> > -Sam
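For context, a rough sketch of how the tier mode is usually inspected and switched (the pool name below is a placeholder, not one taken from this thread):

  # show the cache_mode configured on each pool
  ceph osd dump | grep cache_mode
  # put the cache tier back into writeback mode
  ceph osd tier cache-mode <cache-pool> writeback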
>>> >> >
>>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>>> >> > <igor.voloshanenko@xxxxxxxxx> wrote:
>>> >> >> We used the 4.x branch, as we have "very good" Samsung 850 Pro drives in
>>> >> >> production, and they don't support ncq_trim...
>>> >> >>
>>> >> >> And 4.x is the first branch which includes exceptions for this in libata-core.c.
>>> >> >>
>>> >> >> Sure, we can backport this one line to the 3.x branch, but we prefer not to
>>> >> >> go deeper if a package for the new kernel exists.
>>> >> >>
>>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>>> >> >> <igor.voloshanenko@xxxxxxxxx>:
>>> >> >>>
>>> >> >>> root@test:~# uname -a
>>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>>> >> >>> 17:37:22
>>> >> >>> UTC
>>> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>>> >> >>>
>>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>> >> >>>>
>>> >> >>>> Also, can you include the kernel version?
>>> >> >>>> -Sam
>>> >> >>>>
>>> >> >>>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just <sjust@xxxxxxxxxx>
>>> >> >>>> wrote:
>>> >> >>>> > Snapshotting with cache/tiering *is* supposed to work.  Can you
>>> >> >>>> > open a
>>> >> >>>> > bug?
>>> >> >>>> > -Sam
>>> >> >>>> >
>>> >> >>>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>>> >> >>>> > <andrija.panic@xxxxxxxxx> wrote:
>>> >> >>>> >> This was related to the caching layer, which doesn't support
>>> >> >>>> >> snapshotting per the docs... for the sake of closing the thread.
>>> >> >>>> >>
>>> >> >>>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
>>> >> >>>> >> <igor.voloshanenko@xxxxxxxxx>
>>> >> >>>> >> wrote:
>>> >> >>>> >>>
>>> >> >>>> >>> Hi all, can you please help me with an unexplained situation...
>>> >> >>>> >>>
>>> >> >>>> >>> All snapshots inside Ceph are broken...
>>> >> >>>> >>>
>>> >> >>>> >>> So, as an example, we have a VM template as an RBD image inside Ceph.
>>> >> >>>> >>> We can map and mount it to check that all is OK with it:
>>> >> >>>> >>>
>>> >> >>>> >>> root@test:~# rbd map
>>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>>> >> >>>> >>> /dev/rbd0
>>> >> >>>> >>> root@test:~# parted /dev/rbd0 print
>>> >> >>>> >>> Model: Unknown (unknown)
>>> >> >>>> >>> Disk /dev/rbd0: 10.7GB
>>> >> >>>> >>> Sector size (logical/physical): 512B/512B
>>> >> >>>> >>> Partition Table: msdos
>>> >> >>>> >>>
>>> >> >>>> >>> Number  Start   End     Size    Type     File system  Flags
>>> >> >>>> >>>  1      1049kB  525MB   524MB   primary  ext4         boot
>>> >> >>>> >>>  2      525MB   10.7GB  10.2GB  primary               lvm
>>> >> >>>> >>>
>>> >> >>>> >>> Then I want to create a snap, so I do:
>>> >> >>>> >>> root@test:~# rbd snap create
>>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >> >>>> >>>
>>> >> >>>> >>> And now I want to map it:
>>> >> >>>> >>>
>>> >> >>>> >>> root@test:~# rbd map
>>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >> >>>> >>> /dev/rbd1
>>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
>>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>> >> >>>> >>> system).
>>> >> >>>> >>> /dev/rbd1 has been opened read-only.
>>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>>> >> >>>> >>> system).
>>> >> >>>> >>> /dev/rbd1 has been opened read-only.
>>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
>>> >> >>>> >>>
>>> >> >>>> >>> Even the md5 is different...
>>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd0
>>> >> >>>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd1
>>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>> OK, now I protect the snap and create a clone... but same thing...
>>> >> >>>> >>> the md5 for the clone is the same as for the snap:
>>> >> >>>> >>>
>>> >> >>>> >>> root@test:~# rbd unmap /dev/rbd1
>>> >> >>>> >>> root@test:~# rbd snap protect
>>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >> >>>> >>> root@test:~# rbd clone
>>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>>> >> >>>> >>> cold-storage/test-image
>>> >> >>>> >>> root@test:~# rbd map cold-storage/test-image
>>> >> >>>> >>> /dev/rbd1
>>> >> >>>> >>> root@test:~# md5sum /dev/rbd1
>>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>>> >> >>>> >>>
>>> >> >>>> >>> .... but it's broken...
>>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
>>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
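As a librbd-level cross-check that bypasses the kernel client entirely (not something run in this thread), the same comparison could be made with:

  rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 - | md5sum
  rbd export cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap - | md5sum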
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>> =========
>>> >> >>>> >>>
>>> >> >>>> >>> tech details:
>>> >> >>>> >>>
>>> >> >>>> >>> root@test:~# ceph -v
>>> >> >>>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>> >> >>>> >>>
>>> >> >>>> >>> We have 2 inconsistent pgs, but none of the images are placed on these
>>> >> >>>> >>> pgs...
>>> >> >>>> >>>
>>> >> >>>> >>> root@test:~# ceph health detail
>>> >> >>>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>>> >> >>>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>>> >> >>>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>>> >> >>>> >>> 18 scrub errors
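For reference, a minimal sketch of how such pgs are usually inspected and repaired (note that repair takes the primary's copy as authoritative, so it should only be run once you trust the primary):

  ceph pg 2.490 query     # inspect the pg state and acting set
  ceph pg repair 2.490    # ask the OSDs to repair the inconsistency
  ceph pg repair 2.c4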
>>> >> >>>> >>>
>>> >> >>>> >>> ============
>>> >> >>>> >>>
>>> >> >>>> >>> root@test:~# ceph osd map cold-storage
>>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70
>>> >> >>>> >>> (2.770)
>>> >> >>>> >>> -> up
>>> >> >>>> >>> ([37,15,14], p37) acting ([37,15,14], p37)
>>> >> >>>> >>> root@test:~# ceph osd map cold-storage
>>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3
>>> >> >>>> >>> (2.4a3)
>>> >> >>>> >>> -> up
>>> >> >>>> >>> ([12,23,17], p12) acting ([12,23,17], p12)
>>> >> >>>> >>> root@test:~# ceph osd map cold-storage
>>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
>>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg
>>> >> >>>> >>> 2.9519c2a9
>>> >> >>>> >>> (2.2a9)
>>> >> >>>> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>> Also, we use a cache layer, which at the moment is in forward
>>> >> >>>> >>> mode...
>>> >> >>>> >>>
>>> >> >>>> >>> Can you please help me with this... My brain has stopped understanding
>>> >> >>>> >>> what is going on...
>>> >> >>>> >>>
>>> >> >>>> >>> Thanks in advance!
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>>
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>> >> --
>>> >> >>>> >>
>>> >> >>>> >> Andrija Panić
>>> >> >>>> >>
>>> >> >>>> >>
>>> >> >>>
>>> >> >>>
>>> >> >>
>>> >
>>> >
>>
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
