Re: Broken snapshots... CEPH 0.94.2

We hadn't set values for max_bytes / max_objects, so all data was initially written only to the cache layer and never flushed to the cold layer.

Then we received a notification from monitoring that about 750 GB had accumulated in the hot pool. So I changed the max bytes value to roughly 0.9 of the disk size (see the sketch below), and then evicting/flushing started...
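
For reference, a minimal sketch of how that kind of cache-tier sizing is usually set per pool; the pool name "hot-storage" and the numbers are placeholders, not the actual values we used:

# cap the cache pool at ~0.9 of its usable capacity (value here is illustrative)
ceph osd pool set hot-storage target_max_bytes 966367641600
ceph osd pool set hot-storage target_max_objects 1000000
# flushing/evicting kicks in relative to those targets once these ratios are crossed
ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
ceph osd pool set hot-storage cache_target_full_ratio 0.8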

And that's when the issue with snapshots appeared.

2015-08-21 2:15 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
Not sure what you mean by:

but it stopped working at the same moment the cache layer filled up with
data and evict/flush started...
-Sam

On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
<igor.voloshanenko@xxxxxxxxx> wrote:
> No, when we started draining the cache, the bad PGs were already in place...
> We had a big rebalance (disk by disk, to change the journal size on both the
> hot and cold layers)... All was OK, but after 2 days the scrub errors appeared
> and 2 PGs became inconsistent...
>
> In writeback mode, yes, snapshots looked like they worked fine, but it stopped
> working at the same moment the cache layer filled up with data and evict/flush
> started...
>
>
>
> 2015-08-21 2:09 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>
>> So you started draining the cache pool before you saw either the
>> inconsistent pgs or the anomalous snap behavior?  (That is, writeback
>> mode was working correctly?)
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
>> <igor.voloshanenko@xxxxxxxxx> wrote:
>> > Good joke )))))))))
>> >
>> > 2015-08-21 2:06 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >>
>> >> Certainly, don't reproduce this with a cluster you care about :).
>> >> -Sam
>> >>
>> >> On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >> > What's supposed to happen is that the client transparently directs
>> >> > all
>> >> > requests to the cache pool rather than the cold pool when there is a
>> >> > cache pool.  If the kernel is sending requests to the cold pool,
>> >> > that's probably where the bug is.  Odd.  It could also be a bug
>> >> > specific to 'forward' mode, either in the client or on the osd.  Why did
>> >> > you have it in that mode?
>> >> > -Sam
>> >> >
>> >> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
>> >> > <igor.voloshanenko@xxxxxxxxx> wrote:
>> >> >> We use the 4.x branch because we have the "very good" Samsung 850 Pro in
>> >> >> production, and they don't support ncq_trim...
>> >> >>
>> >> >> And 4.x is the first branch that includes an exception for this in
>> >> >> libata-core.c.
>> >> >>
>> >> >> Sure, we could backport this one line to the 3.x branch, but we prefer
>> >> >> not to go deeper if a package for a newer kernel exists.
>> >> >>
>> >> >> 2015-08-21 1:56 GMT+03:00 Voloshanenko Igor
>> >> >> <igor.voloshanenko@xxxxxxxxx>:
>> >> >>>
>> >> >>> root@test:~# uname -a
>> >> >>> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17
>> >> >>> 17:37:22
>> >> >>> UTC
>> >> >>> 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >>>
>> >> >>> 2015-08-21 1:54 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>> >> >>>>
>> >> >>>> Also, can you include the kernel version?
>> >> >>>> -Sam
>> >> >>>>
>> >> >>>> On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >> >>>> wrote:
>> >> >>>> > Snapshotting with cache/tiering *is* supposed to work.  Can you
>> >> >>>> > open a
>> >> >>>> > bug?
>> >> >>>> > -Sam
>> >> >>>> >
>> >> >>>> > On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic
>> >> >>>> > <andrija.panic@xxxxxxxxx> wrote:
>> >> >>>> >> This was related to the caching layer, which doesn't support
>> >> >>>> >> snapshotting per the docs... for the sake of closing the thread.
>> >> >>>> >>
>> >> >>>> >> On 17 August 2015 at 21:15, Voloshanenko Igor
>> >> >>>> >> <igor.voloshanenko@xxxxxxxxx>
>> >> >>>> >> wrote:
>> >> >>>> >>>
>> >> >>>> >>> Hi all, can you please help me with an unexplained situation...
>> >> >>>> >>>
>> >> >>>> >>> All snapshots inside Ceph are broken...
>> >> >>>> >>>
>> >> >>>> >>> So, as an example, we have a VM template as an RBD image inside Ceph.
>> >> >>>> >>> We can map it and mount it to check that everything is OK with it:
>> >> >>>> >>>
>> >> >>>> >>> root@test:~# rbd map
>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
>> >> >>>> >>> /dev/rbd0
>> >> >>>> >>> root@test:~# parted /dev/rbd0 print
>> >> >>>> >>> Model: Unknown (unknown)
>> >> >>>> >>> Disk /dev/rbd0: 10.7GB
>> >> >>>> >>> Sector size (logical/physical): 512B/512B
>> >> >>>> >>> Partition Table: msdos
>> >> >>>> >>>
>> >> >>>> >>> Number  Start   End     Size    Type     File system  Flags
>> >> >>>> >>>  1      1049kB  525MB   524MB   primary  ext4         boot
>> >> >>>> >>>  2      525MB   10.7GB  10.2GB  primary               lvm
>> >> >>>> >>>
>> >> >>>> >>> Then I want to create a snap, so I do:
>> >> >>>> >>> root@test:~# rbd snap create
>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >> >>>> >>>
>> >> >>>> >>> And now I want to map it:
>> >> >>>> >>>
>> >> >>>> >>> root@test:~# rbd map
>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >> >>>> >>> /dev/rbd1
>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>> >> >>>> >>> system).
>> >> >>>> >>> /dev/rbd1 has been opened read-only.
>> >> >>>> >>> Warning: Unable to open /dev/rbd1 read-write (Read-only file
>> >> >>>> >>> system).
>> >> >>>> >>> /dev/rbd1 has been opened read-only.
>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
>> >> >>>> >>>
>> >> >>>> >>> Even the md5 sums are different...
>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd0
>> >> >>>> >>> 9a47797a07fee3a3d71316e22891d752  /dev/rbd0
>> >> >>>> >>> root@ix-s2:~# md5sum /dev/rbd1
>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>> OK, now I protect the snap and create a clone... but same thing...
>> >> >>>> >>> the md5 for the clone is the same as for the snap:
>> >> >>>> >>>
>> >> >>>> >>> root@test:~# rbd unmap /dev/rbd1
>> >> >>>> >>> root@test:~# rbd snap protect
>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >> >>>> >>> root@test:~# rbd clone
>> >> >>>> >>> cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
>> >> >>>> >>> cold-storage/test-image
>> >> >>>> >>> root@test:~# rbd map cold-storage/test-image
>> >> >>>> >>> /dev/rbd1
>> >> >>>> >>> root@test:~# md5sum /dev/rbd1
>> >> >>>> >>> e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1
>> >> >>>> >>>
>> >> >>>> >>> .... but it's broken...
>> >> >>>> >>> root@test:~# parted /dev/rbd1 print
>> >> >>>> >>> Error: /dev/rbd1: unrecognised disk label
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>> =========
>> >> >>>> >>>
>> >> >>>> >>> tech details:
>> >> >>>> >>>
>> >> >>>> >>> root@test:~# ceph -v
>> >> >>>> >>> ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>> >> >>>> >>>
>> >> >>>> >>> We have 2 inconsistent PGs, but none of these images are placed on
>> >> >>>> >>> those PGs...
>> >> >>>> >>>
>> >> >>>> >>> root@test:~# ceph health detail
>> >> >>>> >>> HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>> >> >>>> >>> pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>> >> >>>> >>> pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>> >> >>>> >>> 18 scrub errors
>> >> >>>> >>>
>> >> >>>> >>> ============
>> >> >>>> >>>
>> >> >>>> >>> root@test:~# ceph osd map cold-storage
>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5
>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5' -> pg 2.74458f70
>> >> >>>> >>> (2.770)
>> >> >>>> >>> -> up
>> >> >>>> >>> ([37,15,14], p37) acting ([37,15,14], p37)
>> >> >>>> >>> root@test:~# ceph osd map cold-storage
>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' -> pg 2.793cd4a3
>> >> >>>> >>> (2.4a3)
>> >> >>>> >>> -> up
>> >> >>>> >>> ([12,23,17], p12) acting ([12,23,17], p12)
>> >> >>>> >>> root@test:~# ceph osd map cold-storage
>> >> >>>> >>> 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
>> >> >>>> >>> osdmap e16770 pool 'cold-storage' (2) object
>> >> >>>> >>> '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' -> pg
>> >> >>>> >>> 2.9519c2a9
>> >> >>>> >>> (2.2a9)
>> >> >>>> >>> -> up ([12,44,23], p12) acting ([12,44,23], p12)
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>> Also, we use a cache layer, which at the moment is in forward
>> >> >>>> >>> mode...
>> >> >>>> >>>
>> >> >>>> >>> Can you please help me with this... my brain has stopped
>> >> >>>> >>> understanding what is going on...
>> >> >>>> >>>
>> >> >>>> >>> Thanks in advance!
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >> --
>> >> >>>> >>
>> >> >>>> >> Andrija Panić
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>
>> >> >>>
>> >> >>
>> >
>> >
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
