Re: Cephfs and ERESTARTSYS on writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 24, 2015 at 3:54 PM, Vedran Furač <vedran.furac@xxxxxxxxx> wrote:
> On 07/24/2015 09:54 AM, Ilya Dryomov wrote:
>>
>> I don't know - looks like nginx isn't setting SA_RESTART, so it should
>> be repeating the write()/writev() itself.  That said, if it happens
>> only on cephfs, we need to track it down.
>
> Correct, this is first time I see such an error. I've never seen
> something like this when writing on working local disks (unless it run
> out space or inodes). It's indicative that it almost always happens as
> soon as first 4MBs are written and that's the Ceph object size.
>
>> Try enabling nginx debug logging?  I seem to have a vague recollection
>> that it records return values.  That should give us something to start
>> with.
>
> Unfortunately it doesn't look very useful:
>
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 pipe length: 1758913
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 input buf #40
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 free: 000000000343E750
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 pipe write chain
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 write: 48,
> 000000000304FD60, 1758913, 54583296
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 pipe write downstream: 0
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 pipe write downstream
> flush out
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http output filter "/139000
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http copy filter: "/139000
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 image filter
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 xslt filter body
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http postpone filter
> "/139000 0000000001A60450
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 write old buf t:0 f:1
> 0000000000000000, pos 0000000000000000, size: 0 file: 24338425, size: 856071
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 write old buf t:0 f:1
> 0000000000000000, pos 0000000000000000, size: 0 file: 25194496, size:
> 2101248
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 write new buf t:0 f:1
> 0000000000000000, pos 0000000000000000, size: 0 file: 27295744, size:
> 29046465
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http write filter: l:0
> f:0 s:32003784
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http write filter limit 0
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http write filter
> 0000000001A61948
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http copy filter: -2 "/139
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 pipe write downstream done
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 event timer del: 47:
> 1437742479805
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 event timer: 46, old:
> 1437742179605, new: 1437742179866
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http file cache update
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http file cache rename:
> "/home/ssd/proxy_temp/1/0/0000014501" to
> "/home/ceph/1d/8b/5/ed04823384c648e574f2f078ec258b1d"
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 malloc: 0000000001F97CA0:78
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 malloc:
> 0000000002AB08F0:65536
> 2015/07/24 12:44:39 [alert] 26722#26722: *29428 write()
> "/home/ceph/1d/8b/5/ed04823384c648e574f2f078ec258b1d.0000014503" failed
> (4: Interrupted system call) while reading upstream
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http upstream exit:
> 0000000000000000
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 finalize http upstream
> request: 0
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 finalize http proxy request
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 free rr peer 1 0
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 close http upstream
> connection: 47
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 free: 0000000001F816A0,
> unused: 48
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 reusable connection: 0
> 2015/07/24 12:44:39 [debug] 26722#26722: *29428 http upstream temp fd: 48

A quick grep reveals this in src/core/ngx_file.c:

 799     len = cf->buf_size ? cf->buf_size : 65536;

 840         n = ngx_write_fd(nfd, buf, len);
 841
 842         if (n == -1) {
 843             ngx_log_error(NGX_LOG_ALERT, cf->log, ngx_errno,
 844                           ngx_write_fd_n " \"%s\" failed", to);
 845             goto failed;
 846         }

 868 failed:
 869
 870     if (nfd != NGX_INVALID_FILE) {
 871         if (ngx_close_file(nfd) == NGX_FILE_ERROR) {
 872             ngx_log_error(NGX_LOG_ALERT, cf->log, ngx_errno,
 873                           ngx_close_file_n " \"%s\" failed", to);
 874         }
 875     }
 876
 877     if (fd != NGX_INVALID_FILE) {
 878         if (ngx_close_file(fd) == NGX_FILE_ERROR) {
 879             ngx_log_error(NGX_LOG_ALERT, cf->log, ngx_errno,
 880                           ngx_close_file_n " \"%s\" failed", from);
 881         }
 882     }

which matches strace:

25382 18:11:24.674786 read(60, "*\213\331"..., 65536) = 65536 <0.000020>
25382 18:11:24.674835 write(61, "*\213\331"..., 65536) = 65536 <0.000233>
25382 18:11:24.675093 read(60, "\315\317\""..., 65536) = 65536 <0.000018>
25382 18:11:24.675129 write(61, "\315\317\""..., 65536) = ?
ERESTARTSYS (To be restarted) <0.012122>
25382 18:11:24.687290 --- SIGALRM (Alarm clock) @ 0 (0) ---
25382 18:11:24.687442 rt_sigreturn(0xe) = -1 EINTR (Interrupted system
call) <0.000006>
25382 18:11:24.687473 gettid()          = 25382 <0.000006>
25382 18:11:24.687502 write(21, "201"..., 641) = 641 <0.000014>
25382 18:11:24.687538 close(61)         = 0 <0.000009>
25382 18:11:24.687561 close(60)         = 0 <0.000006>
25382 18:11:24.687583 unlink("/home/ssd/proxy_temp/6/9/0000191396") =
0 <0.000020>
25382 18:11:24.687625 close(55)         = 0 <0.000042>

ngx_write_fd() is just a write(), which, when interrupted by SIGALRM,
fails with EINTR because SA_RESTART is not set.  We can try digging
further, but I think nginx should retry in this case.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux