Hi everyone!
Just thought I would let everyone know: The issue appears to have been
the Ceph NFS service associated with the filesystem.
I removed all the files, waited a while, disconnected all the clients,
waited a while, then deleted the NFS shares - the disk space and objects
abruptly began freeing up.
I'm sorry that I can't contribute any more useful diagnostic
information, but maybe this is the extra bit of data that crystallizes
someone's theory about the issue.
On 21/03/2024 10:33 am, Anthony D'Atri wrote:
Grep through the ls output for ‘rados bench’ leftovers; it’s easy to leave them behind.
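A rough sketch of that check, assuming the listing from "rados ls" has been saved as one object name per line. rados bench normally names its objects with a "benchmark_data" prefix; the function name and the sample object names here are made up for illustration.

```python
def find_bench_leftovers(listing_lines, prefix="benchmark_data"):
    """Return object names that look like rados bench leftovers."""
    cleaned = (line.strip() for line in listing_lines)
    return [name for name in cleaned if name.startswith(prefix)]


# Hypothetical listing: two CephFS data objects and two bench objects.
sample_listing = [
    "10000000abc.00000000",
    "benchmark_data_node1_12345_object0",
    "benchmark_data_node1_12345_object1",
    "10000000abc.00000001",
]

leftovers = find_bench_leftovers(sample_listing)
print(len(leftovers), "possible rados bench leftovers")
```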
On Mar 20, 2024, at 5:28 PM, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
Hi Thorne,
Unfortunately, I'm unaware of any tools high-level enough to easily map files to rados objects without a deep understanding of how this works. You might want to try the "rados ls" command to get the list of all the objects in the CephFS data pool, then learn how that mapping is performed and parse your listing.
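For what it's worth, a minimal sketch of that parsing step might look like the following. It relies on CephFS data objects being named as the file's inode number in hex, a dot, then the object index in hex (for example "10000000abc.00000000"), and it assumes the inode number reported by stat on a mounted CephFS matches that hex value. All function names and sample values are hypothetical, and an inode with no visible path is only a candidate: it may belong to a snapshot or to a file that is deleted but still held open.

```python
import os
from collections import Counter


def count_objects_per_inode(object_names):
    """Count data-pool objects per CephFS inode number."""
    counts = Counter()
    for name in object_names:
        head, sep, tail = name.strip().partition(".")
        if not sep:
            continue  # no dot, so not a file data object
        try:
            inode = int(head, 16)
            int(tail, 16)  # the object index must be hex as well
        except ValueError:
            continue
        counts[inode] += 1
    return counts


def inodes_on_mount(mount_point):
    """Walk a mounted CephFS tree and map inode number to path."""
    paths = {}
    for root, dirs, files in os.walk(mount_point):
        for entry in dirs + files:
            path = os.path.join(root, entry)
            try:
                paths[os.lstat(path).st_ino] = path
            except OSError:
                pass  # entry vanished or is unreadable; skip it
    return paths


def orphan_candidates(counts, known_inodes):
    """Inodes that own objects but have no path in known_inodes."""
    return {ino: n for ino, n in counts.items() if ino not in known_inodes}


# Hypothetical listing and inode map, standing in for real output.
sample_listing = [
    "10000000abc.00000000",
    "10000000abc.00000001",
    "10000000def.00000000",
    "benchmark_data_node1_12345_object0",
]
counts = count_objects_per_inode(sample_listing)
known = {0x10000000ABC: "/mnt/cephfs/vm1.img"}
candidates = orphan_candidates(counts, known)
for ino, n in candidates.items():
    print(f"inode {ino:x}: {n} object(s) with no matching path")
```

On a real cluster the `known` dictionary would come from `inodes_on_mount()` pointed at the mount, which can take a long time on a large tree.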
Thanks,
Igor
On 3/20/2024 1:30 AM, Thorne Lawler wrote:
Igor,
Those files are VM disk images, and they're under constant heavy use, so yes - there /is/ constant severe write load against this disk.
Apart from writing more test files into the filesystems, there must be Ceph diagnostic tools to describe what those objects are being used for, surely?
We're talking about an extra 10TB of space. How hard can it be to determine which file those objects are associated with?
On 19/03/2024 8:39 pm, Igor Fedotov wrote:
Hi Thorne,
given the number of files on the CephFS volume, I presume you don't have severe write load against it. Is that correct?
If so, we can assume that the numbers you're sharing mostly reflect your experiment. At peak I can see a bytes_used increase of 629,461,893,120 bytes (45978612027392 - 45349150134272). With a replication factor of 3, this roughly matches your written data (200GB, I presume?).
More interesting is that after the file's removal we can see a delta of 419,450,880 bytes (= 45349569585152 - 45349150134272). I can see two options (apart from someone else writing additional data to CephFS during the experiment) to explain this:
1. File removal wasn't complete at the last probe, half an hour after the file's removal. Did you see the stale object counter when making that probe?
2. Some space is leaking. If that's the case, this could be the cause of your issue if huge(?) files on CephFS are created and removed periodically. So if we're certain that the leak really occurred (and option 1 above isn't the case), it makes sense to run more experiments, writing and removing a bunch of huge files on the volume, to confirm the space leakage.
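The arithmetic above can be checked with a few lines of Python. The three bytes_used values are the ones reported in the thread; the replication factor of 3 is the assumption stated above, and the variable names are only illustrative.

```python
# bytes_used values as reported in the thread.
before_create = 45349150134272
after_delete = 45978612027392      # immediately after deleting the file
half_hour_later = 45349569585152   # half an hour after deleting the file

replicas = 3  # replication factor assumed above

peak_delta = after_delete - before_create
residual = half_hour_later - before_create

print(f"peak raw increase:  {peak_delta:,} bytes")
print(f"  per replica:      {peak_delta // replicas:,} bytes")
print(f"residual raw delta: {residual:,} bytes")
print(f"  per replica:      {residual // replicas:,} bytes")
```

Per replica, the peak works out to about 210 GB of logical data and the residual to about 140 MB.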
On 3/18/2024 3:12 AM, Thorne Lawler wrote:
Thanks Igor,
I have tried that, and the number of objects and bytes_used took a long time to drop, but they seem to have dropped back to almost the original level:
* Before creating the file:
    o 3885835 objects
    o 45349150134272 bytes_used
* After creating the file:
    o 3931663 objects
    o 45924147249152 bytes_used
* Immediately after deleting the file:
    o 3935995 objects
    o 45978612027392 bytes_used
* Half an hour after deleting the file:
    o 3886013 objects
    o 45349569585152 bytes_used
Unfortunately, this is all production infrastructure, so there is always other activity taking place.
What tools are there to visually inspect the object map and see how it relates to the filesystem?
Not sure if there is anything like that at the CephFS level, but you can use the rados tool to view objects in the CephFS data pool and try to build some mapping between them and the CephFS file list. Could be a bit tricky though.
On 15/03/2024 7:18 pm, Igor Fedotov wrote:
ceph df detail --format json-pretty
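To compare probes over time, the JSON from that command can be reduced to the two figures discussed in this thread. This is only a sketch: it assumes the output has a top-level "pools" list whose entries carry a "name" and a "stats" object with "objects" and "bytes_used" fields, which should be checked against your own release. The pool name "cephfs_data" is hypothetical, and the sample simply reuses the "before" figures from this thread as placeholders.

```python
import json


def pool_usage(df_json_text, pool_name):
    """Return (objects, bytes_used) for one pool, or None if absent."""
    report = json.loads(df_json_text)
    for pool in report.get("pools", []):
        if pool.get("name") == pool_name:
            stats = pool.get("stats", {})
            return stats.get("objects"), stats.get("bytes_used")
    return None


# Trimmed stand-in for real command output (assumed layout, see above).
sample_output = json.dumps({
    "pools": [
        {
            "name": "cephfs_data",
            "id": 2,
            "stats": {"objects": 3885835, "bytes_used": 45349150134272},
        }
    ]
})

usage = pool_usage(sample_output, "cephfs_data")
print(usage)
```

In practice the text would come from saving the command's output to a file, or from running it through `subprocess.run`, rather than from a hand-built sample.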
--
Regards,
Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170
_DDNS
/_*Please note:* The information contained in this email message and any attached files may be confidential information, and may also be the subject of legal professional privilege. _If you are not the intended recipient any use, disclosure or copying of this email is unauthorised. _If you received this email in error, please notify Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this matter and delete all copies of this transmission together with any attachments. /
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx