Hi Gregory,
Sorry, I was sure I had mentioned it. We originally installed Luminous, upgraded to Mimic, and this happened on Nautilus (14.2.0).
The data kept moving until pool1 (on the fasthdd disks) was "empty". The PGs do not finish migrating: the count goes up to 377 active+clean and then the following log appears.
In ceph -w:
2019-05-07 14:50:57.200689 mon.CEPH-MON-1 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 5 pgs peering (PG_AVAILABILITY)
2019-05-07 14:50:58.217714 osd.6 [INF] 48.7f continuing backfill to osd.11 from (543'1437,568'4437] 48:fe754016:::rbd_data.430896b8b4567.000000000004f1eb:head to 568'4437
2019-05-07 14:51:02.692241 mon.CEPH-MON-1 [WRN] Health check update: Reduced data availability: 7 pgs peering (PG_AVAILABILITY)
2019-05-07 14:51:07.130563 mon.CEPH-MON-1 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 2 pgs peering)
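To see whether the active+clean count is actually converging or keeps falling back (377, then 361 again), one option is to sample it over time. Below is a minimal Python sketch, assuming `ceph status --format json` exposes `pgmap.pgs_by_state` as a list of `{"state_name": ..., "count": ...}` entries (worth checking against your release); the function names are hypothetical.

```python
import json
import subprocess
import time
from typing import List


def active_clean_count(status: dict) -> int:
    """Count PGs whose state is exactly 'active+clean'.

    Assumes the layout of `ceph status --format json`:
    status["pgmap"]["pgs_by_state"] == [{"state_name": ..., "count": ...}, ...]
    """
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        if entry.get("state_name") == "active+clean":
            return int(entry.get("count", 0))
    return 0


def sample_counts(samples: int = 6, interval_s: float = 10.0) -> List[int]:
    """Poll the cluster a few times; needs the ceph CLI and an admin keyring."""
    counts = []
    for i in range(samples):
        out = subprocess.run(
            ["ceph", "status", "--format", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        counts.append(active_clean_count(json.loads(out)))
        if i < samples - 1:
            time.sleep(interval_s)
    return counts


def fell_back(counts: List[int]) -> bool:
    """True if the active+clean count ever dropped between consecutive samples."""
    return any(later < earlier for earlier, later in zip(counts, counts[1:]))
```

Running `print(sample_counts())` on an admin node over a longer window, together with `fell_back`, would show whether the drop back to 361 keeps recurring or whether the count eventually stays up.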
In the OSD logs:
2019-05-07 15:07:59.148 7f07be074700 1 osd.10 pg_epoch: 1379 pg[48.183( v 567'3429 (542'400,567'3429] local-lis/les=533/534 n=998 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'3429 lcod 567'3428 mlcod 0'0 remapped mbc={}] start_peering_interval up [10,6,8] -> [10,6,8], acting [10,6,8] -> [10,6,11], acting_primary 10 -> 10, up_primary 10 -> 10, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
2019-05-07 15:07:59.148 7f07be875700 1 osd.10 pg_epoch: 1379 pg[48.83( v 567'4683 (545'1600,567'4683] local-lis/les=533/534 n=1001 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'4683 lcod 562'4682 mlcod 0'0 remapped mbc={}] start_peering_interval up [10,6,8] -> [10,6,8], acting [10,6,8] -> [10,6,11], acting_primary 10 -> 10, up_primary 10 -> 10, role 0 -> 0, features acting 4611087854031667199 upacting 4611087854031667199
2019-05-07 15:07:59.148 7f07be074700 1 osd.10 pg_epoch: 1379 pg[48.183( v 567'3429 (542'400,567'3429] local-lis/les=533/534 n=998 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'3429 lcod 567'3428 mlcod 0'0 remapped mbc={}] state<Start>: transitioning to Primary
2019-05-07 15:07:59.148 7f07be875700 1 osd.10 pg_epoch: 1379 pg[48.83( v 567'4683 (545'1600,567'4683] local-lis/les=533/534 n=1001 ec=533/504 lis/c 533/533 les/c/f 534/534/0 1376/1379/1376) [10,6,8]/[10,6,11] r=0 lpr=1379 pi=[533,1379)/2 crt=567'4683 lcod 562'4682 mlcod 0'0 remapped mbc={}] state<Start>: transitioning to Primary
Note that those two happen at the same time; I took the log excerpts from separate occasions when it fell back to 361 PGs active+clean.
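In these OSD log lines the up set stays [10,6,8] while the acting set changes from [10,6,8] to [10,6,11], which is why the same lines carry the `remapped` flag. To check how often PGs flip like this across a longer log, a small Python helper can extract the transitions. This is a sketch: `parse_peering` and `Transition` are hypothetical names, and the regular expression is written only against the `start_peering_interval` format quoted above, so it may need adjusting for other releases.

```python
import re
from typing import List, NamedTuple, Optional

# Matches the "start_peering_interval" message as it appears in the
# OSD log excerpts above; other releases may format it differently.
PEERING_RE = re.compile(
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*?"
    r"start_peering_interval "
    r"up \[(?P<up_from>[\d,]*)\] -> \[(?P<up_to>[\d,]*)\], "
    r"acting \[(?P<act_from>[\d,]*)\] -> \[(?P<act_to>[\d,]*)\]"
)


class Transition(NamedTuple):
    pgid: str
    up: List[int]
    acting_from: List[int]
    acting_to: List[int]

    @property
    def remapped(self) -> bool:
        # A PG is remapped when its acting set differs from its up set.
        return self.acting_to != self.up


def _osds(field: str) -> List[int]:
    """Turn '10,6,8' into [10, 6, 8]."""
    return [int(x) for x in field.split(",") if x]


def parse_peering(line: str) -> Optional[Transition]:
    """Return the up/acting transition in a log line, or None if absent."""
    m = PEERING_RE.search(line)
    if m is None:
        return None
    return Transition(
        pgid=m["pgid"],
        up=_osds(m["up_to"]),
        acting_from=_osds(m["act_from"]),
        acting_to=_osds(m["act_to"]),
    )
```

Running `parse_peering` over every line of an OSD log (by default `/var/log/ceph/ceph-osd.10.log` for osd.10) and counting the results per `pgid` would show whether the same PGs keep re-peering each time the count falls back.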
Kind regards,
Marco Stuurman
On Wed, 8 May 2019 at 22:50, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
On Wed, May 8, 2019 at 2:37 AM Marco Stuurman
<marcostuurman1994@xxxxxxxxx> wrote:
>
> Hi,
>
> I've got an issue with the data in our pool. An RBD image containing 4 TB+ of data has moved over to a different pool after a CRUSH rule change, which should not be possible. Besides that, it keeps looping through remapping and backfilling (it goes up to 377 PGs active+clean, then suddenly drops to 361, with no crashes according to ceph -w and ceph crash ls).
>
> First about the pools:
>
> [root@CEPH-MGMT-1 ~]# ceph df
> RAW STORAGE:
>     CLASS        SIZE      AVAIL     USED        RAW USED    %RAW USED
>     cheaphdd     16 TiB    10 TiB    5.9 TiB     5.9 TiB     36.08
>     fasthdd      33 TiB    18 TiB    16 TiB      16 TiB      47.07
>     TOTAL        50 TiB    28 TiB    22 TiB      22 TiB      43.44
>
> POOLS:
>     POOL     ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
>     pool1    37    780 B      1.33M      780 B      0        3.4 TiB
>     pool2    48    2.0 TiB    510.57k    5.9 TiB    42.64    2.6 TiB
>
> All data is now in pool2, even though the RBD image was created in pool1 (pool2 is new).
>
> The steps that led Ceph to do this were:
>
> - Add OSDs with a different device class (class cheaphdd)
> - Create a CRUSH rule for cheaphdd only, called cheapdisks
> - Create pool2 with the new CRUSH rule
> - Remove the device class from the previously existing devices (remove class hdd)
> - Add class fasthdd to those devices
> - Create a new CRUSH rule, fastdisks
> - Change the CRUSH rule for pool1 to fastdisks
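For anyone trying to reproduce this, the steps above map roughly onto the Ceph CLI calls below. This is a dry-run sketch in Python, not the commands actually used: the OSD ids, device path, CRUSH root (`default`), failure domain (`host`) and PG count are hypothetical placeholders, and the thread does not say how the OSDs were added (`ceph-volume` is assumed). By default it only prints the commands.

```python
import subprocess
from typing import List

# Hypothetical values: the real OSD ids, device paths, CRUSH root,
# failure domain and PG count are not given in the thread.
NEW_OSD_DEVICES = ["/dev/sdx"]        # disks that become class cheaphdd
EXISTING_OSDS = ["osd.0", "osd.1"]    # disks that move from hdd to fasthdd
ROOT, FAILURE_DOMAIN, PG_NUM = "default", "host", "128"


def reproduction_steps() -> List[List[str]]:
    steps = []
    # 1. Add OSDs with device class cheaphdd (assumes ceph-volume was used).
    for dev in NEW_OSD_DEVICES:
        steps.append(["ceph-volume", "lvm", "create",
                      "--crush-device-class", "cheaphdd", "--data", dev])
    # 2. CRUSH rule restricted to class cheaphdd.
    steps.append(["ceph", "osd", "crush", "rule", "create-replicated",
                  "cheapdisks", ROOT, FAILURE_DOMAIN, "cheaphdd"])
    # 3. pool2 using the new rule.
    steps.append(["ceph", "osd", "pool", "create", "pool2", PG_NUM, PG_NUM,
                  "replicated", "cheapdisks"])
    # 4. Remove the old class (hdd) from the existing OSDs ...
    steps.append(["ceph", "osd", "crush", "rm-device-class", *EXISTING_OSDS])
    # 5. ... then assign fasthdd to them.
    steps.append(["ceph", "osd", "crush", "set-device-class", "fasthdd",
                  *EXISTING_OSDS])
    # 6. CRUSH rule restricted to class fasthdd.
    steps.append(["ceph", "osd", "crush", "rule", "create-replicated",
                  "fastdisks", ROOT, FAILURE_DOMAIN, "fasthdd"])
    # 7. Point pool1 at the new rule.
    steps.append(["ceph", "osd", "pool", "set", "pool1", "crush_rule",
                  "fastdisks"])
    return steps


def run(apply: bool = False) -> None:
    """Print each command; execute it only when apply=True."""
    for cmd in reproduction_steps():
        print("+", " ".join(cmd))
        if apply:
            subprocess.run(cmd, check=True)
```

Steps 4 and 5 stay in that order because Ceph refuses to set a new device class on an OSD that already has one until the old class has been removed.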
>
> After this, the data starts moving from pool1 to pool2; however, the RBD image still works and the disks of pool1 are still filled with data.
>
> I've tried to reproduce this issue using virtual machines but I couldn't make it happen again.
>
> Some extra information:
> ceph osd crush tree --show-shadow ==> https://fe.ax/639aa.H34539.txt
> ceph pg ls-by-pool pool1 ==> https://fe.ax/dcacd.H44900.txt (I know the PG count is too low)
> ceph pg ls-by-pool pool2 ==> https://fe.ax/95a2c.H51533.txt
> ceph -s ==> https://fe.ax/aab41.H69711.txt
>
>
> Can someone shed some light on why the data looks like it has moved to another pool, and/or explain why the data in pool2 keeps remapping/backfilling in a loop?
What version of Ceph are you running? Are the PGs active+clean
changing in any other way?
My guess is this is just the reporting getting messed up because none
of the cheaphdd disks are supposed to be reachable by pool1 now, and
so their disk usage is being assigned to pool2. In which case it will
clear up once all the data movement is done.
Can you confirm if it's getting better as PGs actually migrate?
>
> Thanks!
>
>
> Kind regards,
>
> Marco Stuurman
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com