On Wed, May 8, 2019 at 2:37 AM Marco Stuurman <marcostuurman1994@xxxxxxxxx> wrote:
>
> Hi,
>
> I've got an issue with the data in our pool. An RBD image containing 4+ TB of data has moved over to a different pool after a crush rule change, which should not be possible. Besides that, it keeps starting remapping and backfilling over and over (goes up to 377 PGs active+clean, then suddenly drops to 361, without crashes according to ceph -w & ceph crash ls).
>
> First about the pools:
>
> [root@CEPH-MGMT-1 ~]# ceph df
> RAW STORAGE:
>     CLASS        SIZE       AVAIL      USED        RAW USED    %RAW USED
>     cheaphdd     16 TiB     10 TiB     5.9 TiB     5.9 TiB     36.08
>     fasthdd      33 TiB     18 TiB     16 TiB      16 TiB      47.07
>     TOTAL        50 TiB     28 TiB     22 TiB      22 TiB      43.44
>
> POOLS:
>     POOL      ID     STORED      OBJECTS      USED        %USED     MAX AVAIL
>     pool1     37     780 B       1.33M        780 B       0         3.4 TiB
>     pool2     48     2.0 TiB     510.57k      5.9 TiB     42.64     2.6 TiB
>
> All data is now in pool2, while the RBD image was created in pool1 (pool2 is new).
>
> The steps I took to make Ceph do this are:
>
> - Add OSDs with a different device class (class cheaphdd)
> - Create a crush rule for cheaphdd only, called cheapdisks
> - Create pool2 with the new crush rule
> - Remove the device class from the previously existing devices (remove class hdd)
> - Add class fasthdd to those devices
> - Create a new crush rule fastdisks
> - Change the crush rule for pool1 to fastdisks
>
> After this, Ceph starts moving everything from pool1 to pool2; however, the RBD image still works and the disks of pool1 are still filled with data.
>
> I've tried to reproduce this issue using virtual machines, but I couldn't make it happen again.
>
> Some extra information:
> ceph osd crush tree --show-shadow ==> https://fe.ax/639aa.H34539.txt
> ceph pg ls-by-pool pool1 ==> https://fe.ax/dcacd.H44900.txt (I know the PG count is too low)
> ceph pg ls-by-pool pool2 ==> https://fe.ax/95a2c.H51533.txt
> ceph -s ==> https://fe.ax/aab41.H69711.txt
>
> Can someone shine a light on why the data looks like it has moved to another pool and/or explain why the data in pool2 keeps remapping/backfilling in a loop?

What version of Ceph are you running? Are the PGs active+clean changing in any other way?

My guess is that this is just the reporting getting messed up: none of the cheaphdd disks are supposed to be reachable by pool1 now, so their disk usage is being assigned to pool2. In that case it will clear up once all the data movement is done. Can you confirm whether it's getting better as PGs actually migrate?

> Thanks!
>
>
> Kind regards,
>
> Marco Stuurman
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
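
[Editor's note] For reference, the steps Marco lists map roughly onto the standard device-class workflow below. This is only a sketch: the OSD IDs, PG counts, and the "default" root / "host" failure domain are placeholder assumptions, not values from the thread; only the class, rule, and pool names come from the message.

    # Tag the newly added OSDs with a custom device class (OSD IDs are placeholders)
    ceph osd crush rm-device-class osd.6 osd.7 osd.8        # drop the auto-assigned "hdd" class
    ceph osd crush set-device-class cheaphdd osd.6 osd.7 osd.8

    # CRUSH rule restricted to the cheaphdd class, and a new pool that uses it
    ceph osd crush rule create-replicated cheapdisks default host cheaphdd
    ceph osd pool create pool2 128 128 replicated cheapdisks

    # Re-class the pre-existing OSDs from hdd to fasthdd
    ceph osd crush rm-device-class osd.0 osd.1 osd.2
    ceph osd crush set-device-class fasthdd osd.0 osd.1 osd.2

    # New rule restricted to fasthdd, then point the existing pool at it
    ceph osd crush rule create-replicated fastdisks default host fasthdd
    ceph osd pool set pool1 crush_rule fastdisks

The last step is what triggers the large remap; if Greg's reading is right, the data never actually changes pools, and only the per-pool usage accounting is skewed while PGs are in flight.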
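
[Editor's note] To answer the question at the end of the reply (whether things improve as PGs actually migrate), the usual read-only status commands are enough; nothing below is specific to this cluster:

    # One-line PG state summary; the active+clean count should trend upwards
    ceph pg stat

    # PGs that are currently remapped (waiting for or undergoing backfill)
    ceph pg ls remapped

    # Per-pool client and recovery I/O rates
    ceph osd pool stats

    # Per-pool usage; the pool1/pool2 STORED/USED split should settle once backfill finishes
    ceph df detail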