Re: INFARNALIS with 64K Kernel PAGES


 



Sorry, I missed that you are upgrading from Hammer. I think this is probably a bug introduced after Hammer. Here is why I think it is happening:

 

In hammer:

-------------

 

https://github.com/ceph/ceph/blob/hammer/src/os/FileJournal.cc#L158

 

In Master/Infernalis/Jewel:

---------------------------------

 

https://github.com/ceph/ceph/blob/infernalis/src/os/FileJournal.cc#L151

 

In the newer code, the journal block size is hard-coded to 4096.
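
To make the suspected mismatch concrete, here is a minimal Python sketch (not the actual Ceph C++ code). It assumes that Hammer derived the journal block size from the kernel page size, which is what the 65536 stored in the journal header in the log below suggests, and that later releases use a fixed 4096 floor. All function names are hypothetical.

```python
import errno

# Assumption: the fixed value the post-Hammer code uses as its floor.
HARD_CODED_MIN_BLOCK_SIZE = 4096


def hammer_style_block_size(device_block_size, kernel_page_size):
    """Assumed Hammer behaviour: block size is at least the kernel page size."""
    return max(device_block_size, kernel_page_size)


def post_hammer_style_block_size(device_block_size):
    """Assumed post-Hammer behaviour: block size is at least a fixed 4096."""
    return max(device_block_size, HARD_CODED_MIN_BLOCK_SIZE)


def check_journal_header(header_block_size, current_block_size):
    """Return 0 if the sizes match, -EINVAL otherwise (error 22 in the log)."""
    if header_block_size != current_block_size:
        return -errno.EINVAL
    return 0


# Journal written under Hammer on a 64K-page kernel, device block size 4096.
on_disk = hammer_style_block_size(4096, 65536)
# Same journal opened after the upgrade.
current = post_hammer_style_block_size(4096)
result = check_journal_header(on_disk, current)
```

Under these assumptions a journal created on a 4K-page system passes the check after the upgrade, while one created on a 64K-page system does not, which would match what Pankaj is seeing.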

 

I am not sure why this was changed. Sam/Sage?

 

Thanks & Regards

Somnath

 

From: Garg, Pankaj [mailto:Pankaj.Garg@xxxxxxxxxxxxxxxxxx]
Sent: Tuesday, March 01, 2016 9:34 PM
To: Somnath Roy; ceph-users@xxxxxxxxxxxxxx
Subject: RE: INFARNALIS with 64K Kernel PAGES

 

The OSDs were created with a 64K page size, and mkfs was done with the same size.

After the upgrade, I have not changed anything on the machine (except applying the ownership fix so that the files belong to user ceph:ceph).

 

From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
Sent: Tuesday, March 01, 2016 9:32 PM
To: Garg, Pankaj; ceph-users@xxxxxxxxxxxxxx
Subject: RE: INFARNALIS with 64K Kernel PAGES

 

Did you recreate the OSDs on this setup, meaning did you run mkfs with a 64K page size?

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Garg, Pankaj
Sent: Tuesday, March 01, 2016 9:07 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: INFARNALIS with 64K Kernel PAGES

 

Hi,

Is there a known issue with using 64K Kernel PAGE_SIZE?

I am using ARM64 systems, and I upgraded from 0.94.4 to 9.2.1 today. The system that was on a 4K page size came up OK, and its OSDs are all online.

On the systems with a 64K page size, all the OSDs crash with the following stack:

 

Begin dump of recent events ---

   -54> 2016-03-01 20:52:56.489752 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command perfcounters_dump hook 0xaaaaff63c030

   -53> 2016-03-01 20:52:56.489798 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command 1 hook 0xaaaaff63c030

   -52> 2016-03-01 20:52:56.489809 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command perf dump hook 0xaaaaff63c030

   -51> 2016-03-01 20:52:56.489819 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command perfcounters_schema hook 0xaaaaff63c030

   -50> 2016-03-01 20:52:56.489829 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command 2 hook 0xaaaaff63c030

   -49> 2016-03-01 20:52:56.489839 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command perf schema hook 0xaaaaff63c030

   -48> 2016-03-01 20:52:56.489849 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command perf reset hook 0xaaaaff63c030

   -47> 2016-03-01 20:52:56.489858 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command config show hook 0xaaaaff63c030

   -46> 2016-03-01 20:52:56.489868 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command config set hook 0xaaaaff63c030

   -45> 2016-03-01 20:52:56.489877 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command config get hook 0xaaaaff63c030

   -44> 2016-03-01 20:52:56.489886 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command config diff hook 0xaaaaff63c030

   -43> 2016-03-01 20:52:56.489896 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command log flush hook 0xaaaaff63c030

   -42> 2016-03-01 20:52:56.489905 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command log dump hook 0xaaaaff63c030

   -41> 2016-03-01 20:52:56.489914 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command log reopen hook 0xaaaaff63c030

   -40> 2016-03-01 20:52:56.497924 ffff97e38f10  0 set uid:gid to 64045:64045

   -39> 2016-03-01 20:52:56.498074 ffff97e38f10  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 17095

   -38> 2016-03-01 20:52:56.499547 ffff97e38f10  1 -- 10.18.240.124:0/0 learned my addr 10.18.240.124:0/0

   -37> 2016-03-01 20:52:56.499572 ffff97e38f10  1 accepter.accepter.bind my_inst.addr is 10.18.240.124:6802/17095 need_addr=0

   -36> 2016-03-01 20:52:56.499620 ffff97e38f10  1 -- 192.168.240.124:0/0 learned my addr 192.168.240.124:0/0

   -35> 2016-03-01 20:52:56.499638 ffff97e38f10  1 accepter.accepter.bind my_inst.addr is 192.168.240.124:6802/17095 need_addr=0

   -34> 2016-03-01 20:52:56.499673 ffff97e38f10  1 -- 192.168.240.124:0/0 learned my addr 192.168.240.124:0/0

   -33> 2016-03-01 20:52:56.499690 ffff97e38f10  1 accepter.accepter.bind my_inst.addr is 192.168.240.124:6803/17095 need_addr=0

   -32> 2016-03-01 20:52:56.499724 ffff97e38f10  1 -- 10.18.240.124:0/0 learned my addr 10.18.240.124:0/0

   -31> 2016-03-01 20:52:56.499741 ffff97e38f10  1 accepter.accepter.bind my_inst.addr is 10.18.240.124:6803/17095 need_addr=0

   -30> 2016-03-01 20:52:56.503307 ffff97e38f10  5 asok(0xaaaaff6c0000) init /var/run/ceph/ceph-osd.100.asok

   -29> 2016-03-01 20:52:56.503329 ffff97e38f10  5 asok(0xaaaaff6c0000) bind_and_listen /var/run/ceph/ceph-osd.100.asok

   -28> 2016-03-01 20:52:56.503460 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command 0 hook 0xaaaaff6380c0

   -27> 2016-03-01 20:52:56.503479 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command version hook 0xaaaaff6380c0

   -26> 2016-03-01 20:52:56.503490 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command git_version hook 0xaaaaff6380c0

   -25> 2016-03-01 20:52:56.503500 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command help hook 0xaaaaff63c1e0

   -24> 2016-03-01 20:52:56.503510 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command get_command_descriptions hook 0xaaaaff63c1f0

   -23> 2016-03-01 20:52:56.503566 ffff9643f030  5 asok(0xaaaaff6c0000) entry start

   -22> 2016-03-01 20:52:56.503635 ffff97e38f10 10 monclient(hunting): build_initial_monmap

   -21> 2016-03-01 20:52:56.520227 ffff97e38f10  5 adding auth protocol: cephx

   -20> 2016-03-01 20:52:56.520244 ffff97e38f10  5 adding auth protocol: cephx

   -19> 2016-03-01 20:52:56.520427 ffff97e38f10  5 asok(0xaaaaff6c0000) register_command objecter_requests hook 0xaaaaff63c2b0

   -18> 2016-03-01 20:52:56.520538 ffff97e38f10  1 -- 10.18.240.124:6802/17095 messenger.start

   -17> 2016-03-01 20:52:56.520601 ffff97e38f10  1 -- :/0 messenger.start

   -16> 2016-03-01 20:52:56.520655 ffff97e38f10  1 -- 10.18.240.124:6803/17095 messenger.start

   -15> 2016-03-01 20:52:56.520712 ffff97e38f10  1 -- 192.168.240.124:6803/17095 messenger.start

   -14> 2016-03-01 20:52:56.520768 ffff97e38f10  1 -- 192.168.240.124:6802/17095 messenger.start

   -13> 2016-03-01 20:52:56.520824 ffff97e38f10  1 -- :/0 messenger.start

   -12> 2016-03-01 20:52:56.520973 ffff97e38f10  2 osd.100 0 mounting /var/lib/ceph/osd/ceph-100 /var/lib/ceph/osd/ceph-100/journal

   -11> 2016-03-01 20:52:56.521095 ffff97e38f10  0 filestore(/var/lib/ceph/osd/ceph-100) backend xfs (magic 0x58465342)

   -10> 2016-03-01 20:52:56.521640 ffff97e38f10  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-100) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option

    -9> 2016-03-01 20:52:56.521659 ffff97e38f10  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-100) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option

    -8> 2016-03-01 20:52:56.521696 ffff97e38f10  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-100) detect_features: splice is supported

    -7> 2016-03-01 20:52:56.542459 ffff97e38f10  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-100) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)

    -6> 2016-03-01 20:52:56.542582 ffff97e38f10  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-100) detect_features: extsize is supported and your kernel >= 3.5

    -5> 2016-03-01 20:52:56.602688 ffff97e38f10  0 filestore(/var/lib/ceph/osd/ceph-100) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled

    -4> 2016-03-01 20:52:56.649576 ffff97e38f10  2 journal open /var/lib/ceph/osd/ceph-100/journal fsid 20ef1e17-8c52-42fe-ae82-ddb094220f48 fs_op_seq 300803

    -3> 2016-03-01 20:52:56.649687 ffff97e38f10  1 journal _open /var/lib/ceph/osd/ceph-100/journal fd 19: 6291456000 bytes, block size 4096 bytes, directio = 1, aio = 1

    -2> 2016-03-01 20:52:56.650394 ffff97e38f10  2 journal open journal block size 65536 != current 4096

    -1> 2016-03-01 20:52:56.650412 ffff97e38f10  3 journal journal_replay open failed with (22) Invalid argument

     0> 2016-03-01 20:52:56.654595 ffff97e38f10 -1 os/FileJournal.h: In function 'virtual FileJournal::~FileJournal()' thread ffff97e38f10 time 2016-03-01 20:52:56.650433

os/FileJournal.h: 406: FAILED assert(fd == -1)
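
The key line in the dump above is entry -2, where the block size recorded in the journal (65536) differs from the one the upgraded OSD is using (4096). A small Python helper for pulling those two numbers out of an OSD log line, and for reading the running kernel's page size to compare against, might look like the sketch below. The function names are illustrative and are not part of Ceph.

```python
import os
import re

# Matches the mismatch message as it appears in the dump above.
MISMATCH_RE = re.compile(r"journal block size (\d+) != current (\d+)")


def parse_block_size_mismatch(line):
    """Return (journal_block_size, current_block_size), or None if no match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kernel_page_size():
    """Return the page size of the running kernel in bytes (Unix only)."""
    return os.sysconf("SC_PAGE_SIZE")
```

If the first number returned by `parse_block_size_mismatch` equals `kernel_page_size()` while the second is 4096, the journal was most likely created with a page-size-derived block size.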

 

thanks

Pankaj

 

 

