Hi,

just for the record:

A reboot of the OSD node solved the issue, the WAL is now fully purged and
the extra 790 MB are gone.

Sorry for the noise.

Dietmar

On 01/27/2018 11:08 AM, Dietmar Rieder wrote:
> Hi,
>
> replying to my own message.
>
> After I restarted the OSD it seems some of the WAL partition got purged.
> However, there are still ~790 MB used. As far as I understand, it should get
> completely emptied; at least this is what happens when I restart another
> OSD, where its associated WAL gets completely flushed.
> Is it somehow possible to reinitialize the WAL for the OSD in question?
>
> Thanks
> Dietmar
>
>
> On 01/26/2018 05:11 PM, Dietmar Rieder wrote:
>> Hi all,
>>
>> I have a question regarding the BlueStore WAL/DB:
>>
>> We are running a cluster of 10 OSD nodes + 3 MON/MDS nodes (Luminous 12.2.2).
>> Each OSD node has 22x HDD (8 TB) OSDs, 2x SSD (1.6 TB) OSDs and 2x NVMe
>> (800 GB) for the BlueStore WAL and DB.
>>
>> We have separate WAL and DB partitions:
>> WAL partitions are 1 GB
>> DB partitions are 64 GB
>>
>> The cluster is providing CephFS from one HDD (EC 6+3) and one SSD
>> (3x replicated) pool.
>> Since the cluster is "new" we do not have much data stored on it yet,
>> ~30 TB (HDD EC) and ~140 GB (SSD rep).
>>
>> I just noticed that the WAL usage for the SSD OSDs is all more or less
>> equal, ~518 MB. The WAL usage for the HDD OSDs is also quite balanced
>> at 284-306 MB, however there is one OSD whose WAL usage is ~1 GB:
>>
>> "bluefs": {
>>     "gift_bytes": 0,
>>     "reclaim_bytes": 0,
>>     "db_total_bytes": 68719468544,
>>     "db_used_bytes": 1114636288,
>>     "wal_total_bytes": 1073737728,
>>     "wal_used_bytes": 1072693248,
>>     "slow_total_bytes": 320057901056,
>>     "slow_used_bytes": 0,
>>     "num_files": 16,
>>     "log_bytes": 862326784,
>>     "log_compactions": 0,
>>     "logged_bytes": 850575360,
>>     "files_written_wal": 2,
>>     "files_written_sst": 9,
>>     "bytes_written_wal": 744469265,
>>     "bytes_written_sst": 568855830
>> },
>>
>> and I got the following log entries:
>>
>> 2018-01-26 16:31:05.484284 7f65ea28a700  1 bluefs _allocate failed to
>> allocate 0x400000 on bdev 0, free 0xff000; fallback to bdev 1
>>
>> Is there any reason for this difference, ~300 MB vs. ~1 GB?
>> My understanding is that 1 GB of WAL should be enough, and that old logs
>> should be purged to free space. (Can this be triggered manually?)
>>
>> Could this be related to the fact that the HDD OSD in question failed a
>> few weeks ago and we replaced it with a new HDD?
>>
>> Do we have to expect problems or performance degradation from the
>> fallback to bdev 1?
>>
>> Thanks for any clarifying comment
>> Dietmar
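
For anyone wanting to check the same counters on their own cluster: the
"bluefs" block quoted above is part of the output of
"ceph daemon osd.<id> perf dump" on the OSD node. Below is a minimal sketch
(not from the original thread) of how the WAL/DB utilisation could be
summarised per OSD; it assumes the script runs on the OSD node with access
to the admin sockets, and the osd_ids list is a placeholder to be adjusted
to the OSDs hosted locally.

    #!/usr/bin/env python
    # Sketch: summarise BlueFS WAL/DB usage for locally hosted OSDs by
    # reading "ceph daemon osd.<id> perf dump" via the admin socket.
    import json
    import subprocess

    osd_ids = [0, 1, 2]  # placeholder: set to the OSD ids on this node

    for osd in osd_ids:
        out = subprocess.check_output(
            ["ceph", "daemon", "osd.{}".format(osd), "perf", "dump"])
        bluefs = json.loads(out.decode())["bluefs"]
        wal_pct = 100.0 * bluefs["wal_used_bytes"] / bluefs["wal_total_bytes"]
        db_pct = 100.0 * bluefs["db_used_bytes"] / bluefs["db_total_bytes"]
        print("osd.{}: wal {:.0f}% used, db {:.0f}% used, slow_used_bytes={}".format(
            osd, wal_pct, db_pct, bluefs["slow_used_bytes"]))

A WAL that stays near 100% used (as in the quoted output) or a non-zero
slow_used_bytes would be the kind of condition this check is meant to flag.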