On Tue, Nov 18, 2014 at 10:04 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:
> That would probably have helped. The XFS deadlocks would only occur when
> there was relatively little free memory. Kernel 3.18 is supposed to have a
> fix for that, but I haven't tried it yet.
>
> Looking at my actual usage, I don't even need 64k inodes. 64k inodes should
> make things a bit faster when you have a large number of files in a
> directory, but Ceph will automatically split directories with too many files
> into multiple sub-directories, so it's kinda pointless.
>
> I may try the experiment again, but probably not. It took several weeks to
> reformat all of the OSDs. Even on a single node, it takes 4-5 days to
> drain, format, and backfill. That was months ago, and I'm still dealing
> with the side effects. I'm not eager to try again.
>
>
> On Mon, Nov 17, 2014 at 2:04 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>>
>> On Tue, Nov 18, 2014 at 12:54 AM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx>
>> wrote:
>> > I did have a problem in my secondary cluster that sounds similar to
>> > yours. I was using XFS, and traced my problem back to 64 kB inodes
>> > (osd mkfs options xfs = -i size=64k). This showed up as a lot of "XFS:
>> > possible memory allocation deadlock in kmem_alloc" messages in the
>> > kernel logs. I was able to keep things limping along by flushing the
>> > cache frequently, but I eventually re-formatted every OSD to get rid
>> > of the 64k inodes.
>> >
>> > After I finished the reformat, I had problems because of deep-scrubbing.
>> > While reformatting, I had disabled deep-scrubbing. Once I re-enabled it,
>> > Ceph wanted to deep-scrub the whole cluster, and sometimes 90% of my
>> > OSDs would be doing a deep-scrub. I'm manually deep-scrubbing now,
>> > trying to spread out the schedule a bit. Once this finishes in a few
>> > days, I should be able to re-enable deep-scrubbing and keep my
>> > HEALTH_OK.
>> >
>> >
>>
>> Would you mind trying 64k again after checking the suggestions, either my
>> hints or the hints from the URLs mentioned in that thread:
>> http://marc.info/?l=linux-mm&m=141607712831090&w=2? As for me, I am no
>> longer observing the lock loop after setting min_free_kbytes to half a
>> gigabyte per OSD. Even if your locks have a different cause, it may be
>> worth trying anyway.
>

Thanks, I completely understand. But if you have a low enough OSD/node ratio,
it may be possible to test for the problem at the scale of a single node. By
the way, I do not see a real reason to use a lower allocsize on a cluster that
is not designed for object storage.
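
For reference, a minimal sketch of the min_free_kbytes tuning described above.
The half-gigabyte-per-OSD figure comes from the message; the node size (8 OSDs
here) and the sysctl.d path are assumptions for illustration:

    # Reserve roughly 512 MB of free memory per OSD on this node.
    # Example assumes 8 OSDs per node: 8 * 512 MB = 4 GB = 4194304 kB.
    sysctl -w vm.min_free_kbytes=$((8 * 512 * 1024))

    # Persist across reboots (one common location):
    echo "vm.min_free_kbytes = $((8 * 512 * 1024))" > /etc/sysctl.d/90-ceph-min-free.conf

The intent is simply to keep more free memory available on the node so that the
larger allocations associated with 64k inodes are less likely to stall.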
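
And a rough sketch of the deep-scrub handling described earlier in the thread:
disable deep-scrubbing cluster-wide during the reformat, then walk PGs in small
batches to spread the schedule before re-enabling it. The flag and per-PG
commands are standard Ceph CLI; the pg-dump parsing and batch size are
assumptions, so adjust for your release:

    # Stop new automatic deep-scrubs while OSDs are being drained/reformatted.
    ceph osd set nodeep-scrub

    # Manually deep-scrub a small batch of PGs at a time to spread the load.
    ceph pg dump pgs_brief 2>/dev/null | awk '/^[0-9]+\./ {print $1}' | head -20 | \
        while read pg; do ceph pg deep-scrub "$pg"; done

    # Once the backlog is spread out, let automatic deep-scrubbing resume.
    ceph osd unset nodeep-scrub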