Re: CephFS space usage

Thorne Lawler <thorne@xxxxxxxxxxx> · Thu, 21 Mar 2024 05:13:55 +1100

Alexander,

Thanks for explaining this. As I suspected, this is a highly abstract 
pursuit of what caused the problem, and while I'm sure this makes sense 
for Ceph developers, it isn't going to happen in this case.

I don't care how it got this way. The tools used to create this pool 
will never be used in our environment again after I recover this disk 
space. The entire reason I need to recover the missing space is so I 
can move enough filesystems around to remove the current structure and 
the tools that made it.

I only need to get that disk space back. Any analysis I do will be 
solely directed towards achieving that.

Thanks.

On 21/03/2024 3:10 am, Alexander E. Patrakov wrote:
Hi Thorne,

The idea is quite simple. By retesting the leak with a separate pool
used by nobody except you, you can, if the leak exists and is
reproducible (which is not a given), pinpoint it definitively and rule
out the alternative hypothesis that "somebody wrote some data in
parallel". Then, even if the leak is small but reproducible, one can
argue that many such events accumulated into the 10 TB of garbage in
the original pool.

On Wed, Mar 20, 2024 at 7:29 PM Thorne Lawler <thorne@xxxxxxxxxxx> wrote:

    Alexander,

    I'm happy to create a new pool if it will help, but I don't
    presently see how creating a new pool will help us to identify the
    source of the 10TB discrepancy in this original cephfs pool.

    Please help me to understand what you are hoping to find...?

    On 20/03/2024 6:35 pm, Alexander E. Patrakov wrote:
    Thorne,

    That's why I asked you to create a separate pool. All writes go
    to the original pool, and it is possible to see object counts
    per-pool.
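
For readers following along, here is a minimal sketch of reading per-pool object counts out of `ceph df detail --format json-pretty`. It assumes the JSON has a top-level `pools` list whose entries carry `name` and a `stats` object with `objects` and `bytes_used`; check the field names against your own release. The pool names and the sample document are illustrative, not real cluster output.

```python
import json


def pool_usage(df_json: str, pool_name: str) -> dict:
    """Return the object count and bytes_used of one pool.

    Assumes the layout {"pools": [{"name": ..., "stats": {...}}]}.
    """
    report = json.loads(df_json)
    for pool in report.get("pools", []):
        if pool.get("name") == pool_name:
            stats = pool.get("stats", {})
            return {
                "objects": stats.get("objects"),
                "bytes_used": stats.get("bytes_used"),
            }
    raise KeyError(f"pool {pool_name!r} not found")


# Hand-written sample in the assumed shape; pool names are hypothetical.
SAMPLE = json.dumps({
    "pools": [
        {"name": "cephfs_data", "id": 1,
         "stats": {"objects": 3885835, "bytes_used": 45349150134272}},
        {"name": "cephfs_test", "id": 2,
         "stats": {"objects": 0, "bytes_used": 0}},
    ]
})

if __name__ == "__main__":
    # An idle test pool should stay at zero unless the test itself writes to it.
    print(pool_usage(SAMPLE, "cephfs_test"))
```

In practice the JSON text would come from the command's output rather than from `SAMPLE`.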

    On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler
    <thorne@xxxxxxxxxxx> wrote:

        Alexander,

        Thank you, but as I said to Igor: The 5.5TB of files on this
        filesystem are virtual machine disks. They are under
        constant, heavy write load. There is no way to turn this off.

        On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:
        Hello Thorne,

        Here is one more suggestion on how to debug this. Right now, there is
        uncertainty on whether there is really a disk space leak or if
        something simply wrote new data during the test.

        If you have at least three OSDs you can reassign, please set their
        CRUSH device class to something different from before, e.g. "test".
        Then, create a new pool that targets this device class and add it to
        CephFS. Then, create an empty directory on CephFS and assign this pool
        to it using setfattr. Finally, try reproducing the issue using only
        files in this directory. This way, you will be sure that nobody else
        is writing any data to the new pool.
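
The steps above can be sketched as a dry run that only prints the commands, so they can be reviewed before anything touches the cluster. The OSD ids, pool name, rule name, PG count, filesystem name, and mount point are all hypothetical placeholders, and the CRUSH root `default` and failure domain `host` are assumptions about the cluster layout.

```python
import shlex


def isolated_pool_commands(osd_ids, fs_name, mount_dir,
                           device_class="test", rule="test_rule",
                           pool="cephfs_test", pg_num=32):
    """Build (but do not run) the commands for an isolated CephFS test pool."""
    osds = [f"osd.{i}" for i in osd_ids]
    test_dir = f"{mount_dir}/leaktest"
    return [
        # An existing device class has to be removed before a new one is set.
        ["ceph", "osd", "crush", "rm-device-class", *osds],
        ["ceph", "osd", "crush", "set-device-class", device_class, *osds],
        # Replicated rule restricted to the new device class.
        ["ceph", "osd", "crush", "rule", "create-replicated",
         rule, "default", "host", device_class],
        ["ceph", "osd", "pool", "create", pool,
         str(pg_num), str(pg_num), "replicated", rule],
        ["ceph", "fs", "add_data_pool", fs_name, pool],
        # New files under this directory are placed in the test pool.
        ["mkdir", "-p", test_dir],
        ["setfattr", "-n", "ceph.dir.layout.pool", "-v", pool, test_dir],
    ]


if __name__ == "__main__":
    for cmd in isolated_pool_commands([10, 11, 12], "cephfs", "/mnt/cephfs"):
        print(shlex.join(cmd))
```

Note that changing the device class of OSDs that currently hold data may trigger data movement, so this only makes sense for OSDs you can genuinely spare.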

        On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
        Hi Thorne,

        Given the number of files on the CephFS volume, I presume you don't
        have a severe write load against it. Is that correct?

        If so, we can assume that the numbers you're sharing mostly refer to
        your experiment. At peak I can see a bytes_used increase of
        629,461,893,120 bytes (45978612027392 - 45349150134272). With a
        replication factor of 3 this roughly matches your written data
        (200GB, I presume?).

        More interesting is that after the file's removal we can see a delta
        of 419,450,880 bytes (= 45349569585152 - 45349150134272). I can see
        two options (apart from someone else writing additional data to
        CephFS during the experiment) to explain this:

        1. The file removal hadn't completed at the last probe, half an hour
        after the file was removed. Did you see a stale object counter when
        making that probe?

        2. Some space is leaking. If that's the case, this could be the reason
        for your issue if huge(?) files on CephFS are created/removed
        periodically. So if we're certain that the leak really occurred (and
        option 1 above isn't the case), it makes sense to run more experiments
        writing/removing a bunch of huge files on the volume to confirm the
        space leakage.
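
The arithmetic above can be rechecked with a short script. This is only a sketch: the three bytes_used values are the ones quoted in this thread, and the replication factor of 3 is the assumption stated above, not something read from the cluster.

```python
# bytes_used probes quoted in this thread (raw bytes, replicas included).
BEFORE = 45349150134272         # before creating the file
PEAK = 45978612027392           # immediately after deleting the file
AFTER_30_MIN = 45349569585152   # half an hour after deleting the file

REPLICAS = 3  # assumed replication factor


def logical_bytes(raw_bytes: int, replicas: int = REPLICAS) -> float:
    """Convert raw pool usage to the amount of user data it represents."""
    return raw_bytes / replicas


peak_raw = PEAK - BEFORE              # 629461893120
residual_raw = AFTER_30_MIN - BEFORE  # 419450880

if __name__ == "__main__":
    print(f"peak increase:  {peak_raw:,} raw bytes, "
          f"{logical_bytes(peak_raw) / 1e9:.1f} GB of user data")
    print(f"residual delta: {residual_raw:,} raw bytes, "
          f"{residual_raw / 2**20:.2f} MiB raw")
```

The residual is about three orders of magnitude smaller than the peak, which is why it cannot be separated from background writes on a busy production pool.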

        On 3/18/2024 3:12 AM, Thorne Lawler wrote:
        Thanks Igor,

        I have tried that, and the number of objects and bytes_used took a
        long time to drop, but they seem to have dropped back to almost the
        original level:

           * Before creating the file:
               o 3885835 objects
               o 45349150134272 bytes_used
           * After creating the file:
               o 3931663 objects
               o 45924147249152 bytes_used
           * Immediately after deleting the file:
               o 3935995 objects
               o 45978612027392 bytes_used
           * Half an hour after deleting the file:
               o 3886013 objects
               o 45349569585152 bytes_used

        Unfortunately, this is all production infrastructure, so there is
        always other activity taking place.

        What tools are there to visually inspect the object map and see how it
        relates to the filesystem?

        I'm not sure there is anything like that at the CephFS level, but you
        can use the rados tool to list objects in the CephFS data pool and try
        to build a mapping between them and the CephFS file list. It could be
        a bit tricky, though.
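
One way to start such a mapping, as a rough sketch: CephFS data objects are named after the file's inode number in hexadecimal, followed by a dot and an 8-digit hexadecimal block index (for example `10000000001.00000002`). The helpers below only do the name conversion; the object names would come from something like `rados -p <data pool> ls`, and inode numbers from `ls -i` or `stat` on the mounted filesystem. The function names are made up for illustration.

```python
from collections import Counter


def object_prefix_for_inode(inode: int) -> str:
    """Hex prefix shared by all data objects of the file with this inode."""
    return format(inode, "x")


def parse_object_name(name: str) -> tuple:
    """Split '<inode hex>.<block index hex>' into (inode, block_index)."""
    ino_hex, _, index_hex = name.partition(".")
    return int(ino_hex, 16), int(index_hex, 16)


def objects_per_inode(object_names) -> Counter:
    """Count how many data objects belong to each inode."""
    return Counter(parse_object_name(n)[0] for n in object_names)


if __name__ == "__main__":
    listing = [
        "10000000001.00000000",
        "10000000001.00000001",
        "10000000002.00000000",
    ]
    for inode, count in sorted(objects_per_inode(listing).items()):
        print(f"inode {inode} ({object_prefix_for_inode(inode)}): {count} objects")
```

Inodes that own objects in the pool but match no file visible in the filesystem would be the candidates to look at, keeping in mind that snapshots and not-yet-purged deleted files legitimately hold objects too.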
        On 15/03/2024 7:18 pm, Igor Fedotov wrote:
        ceph df detail --format json-pretty
        --

        Regards,

        Thorne Lawler - Senior System Administrator
        *DDNS* | ABN 76 088 607 265
        First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
        P +61 499 449 170

        _DDNS

        /_*Please note:* The information contained in this email message and
        any attached files may be confidential information, and may also be
        the subject of legal professional privilege. _If you are not the
        intended recipient any use, disclosure or copying of this email is
        unauthorised. _If you received this email in error, please notify
        Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this
        matter and delete all copies of this transmission together with any
        attachments. /

        --
        Igor Fedotov
        Ceph Lead Developer

        Looking for help with your Ceph cluster? Contact us at https://croit.io

        croit GmbH, Freseniusstr. 31h, 81247 Munich
        CEO: Martin Verges - VAT-ID: DE310638492
        Com. register: Amtsgericht Munich HRB 231263
        Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
        _______________________________________________
        ceph-users mailing list -- ceph-users@xxxxxxx
        To unsubscribe send an email to ceph-users-leave@xxxxxxx

    -- 
    Alexander E. Patrakov
