Re: Data moved pools but didn't move osds & backfilling+remapped loop

Marco Stuurman <marcostuurman1994@xxxxxxxxx> · Thu, 9 May 2019 12:49:45 +0200

Hi Gregory,
Sorry, I was sure I mentioned it. We installed as Luminous, upgraded to Mimic, and this happened on Nautilus (14.2.0).

The data kept moving until pool1 (the fasthdd pool) was "empty". The PGs do not migrate: the count goes up to 377 active+clean and then the following log appears:

in ceph -w:
2019-05-07 14:50:57.200689 mon.CEPH-MON-1 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 5 pgs peering (PG_AVAILABILITY)
2019-05-07 14:50:58.217714 osd.6 [INF] 48.7f continuing backfill to osd.11 from (543'1437,568'4437] 48:fe754016:::rbd_data.430896b8b4567.000000000004f1eb:head to 568'4437
2019-05-07 14:51:02.692241 mon.CEPH-MON-1 [WRN] Health check update: Reduced data availability: 7 pgs peering (PG_AVAILABILITY)
2019-05-07 14:51:07.130563 mon.CEPH-MON-1 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 2 pgs peering)

in osd logs:
2019-05-07 15:07:59.148 7f07be074700  1 osd.10 pg_epoch: 1379 pg[48.183( v 567'3429 (542'400,567'3429] local-lis/les=533/534 n=998 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'3429 lcod 567'3428 mlcod 0'0 remapped mbc={}] start_peering_interval up [10,6,8] -> [10,6,8], acting [10,6,8] -> [10,6,11], acting_primary 10 -> 10, up_primary 10 -> 10, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
2019-05-07 15:07:59.148 7f07be875700  1 osd.10 pg_epoch: 1379 pg[48.83( v 567'4683 (545'1600,567'4683] local-lis/les=533/534 n=1001 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'4683 lcod 562'4682 mlcod 0'0 remapped mbc={}] start_peering_interval up [10,6,8] -> [10,6,8], acting [10,6,8] -> [10,6,11], acting_primary 10 -> 10, up_primary 10 -> 10, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
2019-05-07 15:07:59.148 7f07be074700  1 osd.10 pg_epoch: 1379 pg[48.183( v 567'3429 (542'400,567'3429] local-lis/les=533/534 n=998 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'3429 lcod 567'3428 mlcod 0'0 remapped mbc={}] state<Start>: transitioning to Primary
2019-05-07 15:07:59.148 7f07be875700  1 osd.10 pg_epoch: 1379 pg[48.83( v 567'4683 (545'1600,567'4683] local-lis/les=533/534 n=1001 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'4683 lcod 562'4682 mlcod 0'0 remapped mbc={}] state<Start>: transitioning to Primary

Note that these two events happen at the same time; I took the log excerpts from separate occasions on which it fell back to 361 PGs active+clean.
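
For what it's worth, the up/acting changes in those OSD log lines can be pulled out with a small script. This is only a sketch in Python: the regular expression and the names parse_peering and to_osds are my own, written against the line format shown above, and are not part of any Ceph tooling.

```python
import re

# Assumed pattern, derived only from the start_peering_interval lines quoted
# above: captures the PG id and the old/new "up" and "acting" OSD sets.
PEERING_RE = re.compile(
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*?"
    r"start_peering_interval up \[(?P<up_old>[\d,]*)\] -> \[(?P<up_new>[\d,]*)\], "
    r"acting \[(?P<act_old>[\d,]*)\] -> \[(?P<act_new>[\d,]*)\]"
)


def to_osds(text):
    """Turn a string such as '10,6,8' into a list of OSD ids."""
    return [int(x) for x in text.split(",") if x]


def parse_peering(line):
    """Return the PG id and (old, new) up/acting sets, or None if no match."""
    m = PEERING_RE.search(line)
    if m is None:
        return None
    return {
        "pgid": m.group("pgid"),
        "up": (to_osds(m.group("up_old")), to_osds(m.group("up_new"))),
        "acting": (to_osds(m.group("act_old")), to_osds(m.group("act_new"))),
    }


# The first osd.10 line from the log excerpt above, split for readability.
sample = (
    "2019-05-07 15:07:59.148 7f07be074700  1 osd.10 pg_epoch: 1379 "
    "pg[48.183( v 567'3429 (542'400,567'3429] local-lis/les=533/534 n=998 "
    "ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) "
    "[10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'3429 "
    "lcod 567'3428 mlcod 0'0 remapped mbc={}] start_peering_interval "
    "up [10,6,8] -> [10,6,8], acting [10,6,8] -> [10,6,11], "
    "acting_primary 10 -> 10, up_primary 10 -> 10, role 0 -> 0, "
    "features acting 4611087854031667199 upacting 4611087854031667199"
)

result = parse_peering(sample)
if result is not None:
    up_old, up_new = result["up"]
    act_old, act_new = result["acting"]
    print(result["pgid"], "up changed:", up_old != up_new,
          "acting changed:", act_old != act_new)
```

For pg 48.183 this reports an unchanged up set [10,6,8] and an acting set that changes from [10,6,8] to [10,6,11], which is what the remapped state in the log line reflects.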

Kind regards,

Marco Stuurman

On Wed, 8 May 2019 at 22:50, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
On Wed, May 8, 2019 at 2:37 AM Marco Stuurman <marcostuurman1994@xxxxxxxxx> wrote:
>
> Hi,
>

> I've got an issue with the data in our pool. An RBD image containing 4 TB+ of data has moved over to a different pool after a CRUSH rule set change, which should not be possible. Besides that, it loops over and over, starting to remap and backfill (it goes up to 377 PGs active+clean, then suddenly drops to 361, without crashes according to ceph -w and ceph crash ls).

>

> First about the pools:

>

> [root@CEPH-MGMT-1 ~t]# ceph df
> RAW STORAGE:
>     CLASS        SIZE       AVAIL      USED        RAW USED     %RAW USED
>     cheaphdd     16 TiB     10 TiB     5.9 TiB      5.9 TiB         36.08
>     fasthdd      33 TiB     18 TiB      16 TiB       16 TiB         47.07
>     TOTAL        50 TiB     28 TiB      22 TiB       22 TiB         43.44
>
> POOLS:
>     POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
>     pool1     37     780 B       1.33M       780 B           0     3.4 TiB
>     pool2     48     2.0 TiB     510.57k     5.9 TiB     42.64     2.6 TiB

>

> All data is now in pool2, even though the RBD image was created in pool1 (pool2 is new).

>

> The steps that led Ceph to do this were:
>
> - Add OSDs with a different device class (class cheaphdd)
> - Create a CRUSH rule set for cheaphdd only, called cheapdisks
> - Create pool2 with the new CRUSH rule set
> - Remove the device class from the previously existing devices (remove class hdd)
> - Add class fasthdd to those devices
> - Create a new CRUSH rule set, fastdisks
> - Change the CRUSH rule set for pool1 to fastdisks

>

> After this, all the data starts moving from pool1 to pool2. However, the RBD image still works and the disks of pool1 are still filled with data.

>

> I've tried to reproduce this issue using virtual machines but I couldn't make it happen again.

>

> Some extra information:
> ceph osd crush tree --show-shadow ==> https://fe.ax/639aa.H34539.txt
> ceph pg ls-by-pool pool1 ==> https://fe.ax/dcacd.H44900.txt (I know the PG count is too low)
> ceph pg ls-by-pool pool2 ==> https://fe.ax/95a2c.H51533.txt
> ceph -s ==> https://fe.ax/aab41.H69711.txt

>

>

> Can someone shed some light on why the data looks like it has moved to another pool and/or explain why the data in pool2 keeps remapping/backfilling in a loop?

What version of Ceph are you running? Are the PGs active+clean
changing in any other way?

My guess is this is just the reporting getting messed up because none
of the cheaphdd disks are supposed to be reachable by pool1 now, and
so their disk usage is being assigned to pool2. In which case it will
clear up once all the data movement is done.

Can you confirm if it's getting better as PGs actually migrate?

>
> Thanks!
>
> Kind regards,
>
> Marco Stuurman
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
