Proces Linux został zabity, mimo że dostępna jest wystarczająca ilość pamięci

9

Badam, dlaczego dwa z naszych procesów zostały zabite przez zabójcę Linux OOM - nawet jeśli wydaje się, że w tym czasie było wystarczająco dużo pamięci RAM i dużo SWAP.

Kiedy interpretuję to jako tę odpowiedź, pierwsze żądanie pamięci poprosiło o 2 ^ 2 = 4 strony (16KB) pamięci (flaga zamówienia) i chciałem, aby pochodziło ze strefy „Normalnej”.

Jan 27 04:26:14 kernel: [639964.652706] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0

A jeśli poprawnie przeanalizuję dane wyjściowe, będzie więcej niż wystarczająco miejsca:

Node 0 Normal free:178144kB min:55068kB low:68832kB high:82600kB 

Drugi raz ma tę samą prośbę kilka minut później - i wydaje się, że jest wystarczająco dużo wolnego miejsca.

Dlaczego uruchomiono wówczas zabójcę OOM? Czy źle analizuję informacje?

  • System to Ubuntu 14.04 z jądrem 4.4.0-59 x64
  • vm.overcommit_memoryJest ustawiona na „0” ( heurystyczny ), który może nie być optymalna.

Instancja pierwsza:

Jan 27 04:26:14 kernel: [639964.652706] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 27 04:26:14 kernel: [639964.652711] java cpuset=/ mems_allowed=0
Jan 27 04:26:14 kernel: [639964.652716] CPU: 5 PID: 2152 Comm: java Not tainted 4.4.0-59-generic #80~14.04.1-Ubuntu
Jan 27 04:26:14 kernel: [639964.652717] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
Jan 27 04:26:14 kernel: [639964.652719]  0000000000000000 ffff88041a963b38 ffffffff813dbd6c ffff88041a963cf0
Jan 27 04:26:14 kernel: [639964.652721]  0000000000000000 ffff88041a963bc8 ffffffff811fafc6 0000000000000000
Jan 27 04:26:14 kernel: [639964.652722]  0000000000000000 0000000000000000 ffff88042a6d1b88 0000000000000015
Jan 27 04:26:14 kernel: [639964.652724] Call Trace:
Jan 27 04:26:14 kernel: [639964.652731]  [<ffffffff813dbd6c>] dump_stack+0x63/0x87
Jan 27 04:26:14 kernel: [639964.652736]  [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
Jan 27 04:26:14 kernel: [639964.652741]  [<ffffffff813766f1>] ? apparmor_capable+0xd1/0x180
Jan 27 04:26:14 kernel: [639964.652746]  [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
Jan 27 04:26:14 kernel: [639964.652747]  [<ffffffff8118916b>] out_of_memory+0x40b/0x460
Jan 27 04:26:14 kernel: [639964.652749]  [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
Jan 27 04:26:14 kernel: [639964.652752]  [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
Jan 27 04:26:14 kernel: [639964.652754]  [<ffffffff8118e32d>] alloc_kmem_pages_node+0x4d/0xd0
Jan 27 04:26:14 kernel: [639964.652758]  [<ffffffff8107c125>] copy_process+0x185/0x1ce0
Jan 27 04:26:14 kernel: [639964.652763]  [<ffffffff810fd0b4>] ? do_futex+0xf4/0x520
Jan 27 04:26:14 kernel: [639964.652766]  [<ffffffff810a71c9>] ? resched_curr+0xa9/0xd0
Jan 27 04:26:14 kernel: [639964.652768]  [<ffffffff8107de1a>] _do_fork+0x8a/0x310
Jan 27 04:26:14 kernel: [639964.652769]  [<ffffffff8107e149>] SyS_clone+0x19/0x20
Jan 27 04:26:14 kernel: [639964.652775]  [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 27 04:26:14 kernel: [639964.652776] Mem-Info:
Jan 27 04:26:14 kernel: [639964.652780] active_anon:1596719 inactive_anon:281182 isolated_anon:0
Jan 27 04:26:14 kernel: [639964.652780]  active_file:953586 inactive_file:952370 isolated_file:0
Jan 27 04:26:14 kernel: [639964.652780]  unevictable:0 dirty:7358 writeback:0 unstable:0
Jan 27 04:26:14 kernel: [639964.652780]  slab_reclaimable:217903 slab_unreclaimable:12162
Jan 27 04:26:14 kernel: [639964.652780]  mapped:40068 shmem:34861 pagetables:8261 bounce:0
Jan 27 04:26:14 kernel: [639964.652780]  free:71705 free_pcp:0 free_cma:0
Jan 27 04:26:14 kernel: [639964.652783] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 27 04:26:14 kernel: [639964.652787] lowmem_reserve[]: 0 2951 16005 16005 16005
Jan 27 04:26:14 kernel: [639964.652789] Node 0 DMA32 free:92784kB min:12448kB low:15560kB high:18672kB active_anon:1094416kB inactive_anon:368444kB active_file:579188kB inactive_file:561504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:3048784kB mlocked:0kB dirty:1188kB writeback:0kB mapped:32604kB shmem:27372kB slab_reclaimable:336288kB slab_unreclaimable:7196kB kernel_stack:1520kB pagetables:3964kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:26:14 kernel: [639964.652793] lowmem_reserve[]: 0 0 13054 13054 13054
Jan 27 04:26:14 kernel: [639964.652795] Node 0 Normal free:178144kB min:55068kB low:68832kB high:82600kB active_anon:5292460kB inactive_anon:756284kB active_file:3235156kB inactive_file:3247976kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367448kB mlocked:0kB dirty:28244kB writeback:0kB mapped:127668kB shmem:112072kB slab_reclaimable:535324kB slab_unreclaimable:41436kB kernel_stack:3968kB pagetables:29080kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
Jan 27 04:26:14 kernel: [639964.652798] lowmem_reserve[]: 0 0 0 0 0
Jan 27 04:26:14 kernel: [639964.652800] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan 27 04:26:14 kernel: [639964.652807] Node 0 DMA32: 18127*4kB (UME) 2601*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 93316kB
Jan 27 04:26:14 kernel: [639964.652814] Node 0 Normal: 32943*4kB (UMEH) 5702*8kB (UMEH) 19*16kB (H) 13*32kB (H) 9*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 178940kB
Jan 27 04:26:14 kernel: [639964.652820] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 27 04:26:14 kernel: [639964.652821] 1949078 total pagecache pages
Jan 27 04:26:14 kernel: [639964.652822] 8225 pages in swap cache
Jan 27 04:26:14 kernel: [639964.652824] Swap cache stats: add 1131771, delete 1123546, find 7366438/7540102
Jan 27 04:26:14 kernel: [639964.652824] Free swap  = 4080988kB
Jan 27 04:26:14 kernel: [639964.652825] Total swap = 4194300kB
Jan 27 04:26:14 kernel: [639964.652826] 4194174 pages RAM
Jan 27 04:26:14 kernel: [639964.652826] 0 pages HighMem/MovableOnly
Jan 27 04:26:14 kernel: [639964.652827] 86139 pages reserved
Jan 27 04:26:14 kernel: [639964.652828] 0 pages cma reserved
Jan 27 04:26:14 kernel: [639964.652828] 0 pages hwpoisoned
Jan 27 04:26:14 kernel: [639964.652829] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 27 04:26:14 kernel: [639964.652834] [  424]     0   424     4909      388      14       3       68             0 upstart-udev-br
Jan 27 04:26:14 kernel: [639964.652836] [  439]     0   439    13075      456      29       3      322         -1000 systemd-udevd
Jan 27 04:26:14 kernel: [639964.652839] [  724]     0   724     3816      226      13       3       53             0 upstart-socket-
Jan 27 04:26:14 kernel: [639964.652840] [  813]     0   813     5856      449      16       3       57             0 rpcbind
Jan 27 04:26:14 kernel: [639964.652842] [  865]   108   865     5386      456      16       3      113             0 rpc.statd
Jan 27 04:26:14 kernel: [639964.652844] [ 1034]     0  1034     3820      281      12       3       35             0 upstart-file-br
Jan 27 04:26:14 kernel: [639964.652846] [ 1041]   102  1041     9817      366      23       3       50             0 dbus-daemon
Jan 27 04:26:14 kernel: [639964.652847] [ 1045]   101  1045    65018     1203      31       3      384             0 rsyslogd
Jan 27 04:26:14 kernel: [639964.652849] [ 1056]     0  1056    10870      525      26       4       49             0 systemd-logind
Jan 27 04:26:14 kernel: [639964.652851] [ 1063]     0  1063     5870        0      16       3       53             0 rpc.idmapd
Jan 27 04:26:14 kernel: [639964.652852] [ 1153]     0  1153     2558      371       9       3      517             0 dhclient
Jan 27 04:26:14 kernel: [639964.652854] [ 1374]     0  1374     3955      401      13       3       40             0 getty
Jan 27 04:26:14 kernel: [639964.652855] [ 1377]     0  1377     3955      406      13       3       38             0 getty
Jan 27 04:26:14 kernel: [639964.652857] [ 1383]     0  1383     3955      406      13       3       39             0 getty
Jan 27 04:26:14 kernel: [639964.652858] [ 1384]     0  1384     3955      418      13       3       37             0 getty
Jan 27 04:26:14 kernel: [639964.652859] [ 1386]     0  1386     3955      418      12       3       38             0 getty
Jan 27 04:26:14 kernel: [639964.652861] [ 1403]     0  1403    15346      735      34       3      142         -1000 sshd
Jan 27 04:26:14 kernel: [639964.652863] [ 1436]     0  1436     4825      408      13       3       28             0 irqbalance
Jan 27 04:26:14 kernel: [639964.652864] [ 1440]     0  1440     1093      379       8       3       35             0 acpid
Jan 27 04:26:14 kernel: [639964.652866] [ 1442]     0  1442     4785      176      14       3       38             0 atd
Jan 27 04:26:14 kernel: [639964.652867] [ 1443]     0  1443     5914      466      17       3       43             0 cron
Jan 27 04:26:14 kernel: [639964.652869] [ 1464]   105  1464    61957     3600      59       3      273          -900 postgres
Jan 27 04:26:14 kernel: [639964.652870] [ 1561]   107  1561     7864      657      21       3      113             0 ntpd
Jan 27 04:26:14 kernel: [639964.652872] [ 1762]   105  1762    62036    35419     117       3      264             0 postgres
Jan 27 04:26:14 kernel: [639964.652873] [ 1763]   105  1763    61957    35051     117       3      266             0 postgres
Jan 27 04:26:14 kernel: [639964.652875] [ 1764]   105  1764    61957     1773      52       3      306             0 postgres
Jan 27 04:26:14 kernel: [639964.652877] [ 1765]   105  1765    62166     4999     116       3      374             0 postgres
Jan 27 04:26:14 kernel: [639964.652878] [ 1766]   105  1766    25910      617      48       3      274             0 postgres
Jan 27 04:26:14 kernel: [639964.652880] [ 1834]  1004  1834  2002886   692615    1549      10    12707             0 java
Jan 27 04:26:14 kernel: [639964.652881] [ 1921]   106  1921     5835      452      16       3      112             0 nrpe
Jan 27 04:26:14 kernel: [639964.652883] [ 1943]     0  1943   175986      420      41       4       50             0 nscd
Jan 27 04:26:14 kernel: [639964.652884] [ 1978]   109  1978   111112      309      48       4      213             0 nslcd
Jan 27 04:26:14 kernel: [639964.652886] [ 2007]     8  2007     3172      326      11       3       52             0 nullmailer-send
Jan 27 04:26:14 kernel: [639964.652887] [ 2092]     0  2092    34005     1947      70       3     3067             0 /usr/bin/monito
Jan 27 04:26:14 kernel: [639964.652889] [ 2110]     0  2110     1901      367       9       3       25             0 getty
Jan 27 04:26:14 kernel: [639964.652891] [ 2146] 65534  2146    34005     1101      67       3     3810             0 monitorix-httpd
Jan 27 04:26:14 kernel: [639964.652893] [24525]   105 24525  1826264  1151331    3568      10      299             0 postgres
Jan 27 04:26:14 kernel: [639964.652895] [20380]   105 20380    62511    36514     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652897] [21273]   105 21273    62532    36508     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652898] [22133]   105 22133    62610    36827     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652900] [22135]   105 22135    62541    34994     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652901] [22428]     0 22428    15436      739      35       4       11             0 cron
Jan 27 04:26:14 kernel: [639964.652903] [22429]     0 22429    15489      749      35       4       12             0 cron
Jan 27 04:26:14 kernel: [639964.652904] [22442]     0 22442     1112      198       8       3        0             0 sh
Jan 27 04:26:14 kernel: [639964.652906] [22443]  1004 22443     1112      191       8       3        0             0 sh
Jan 27 04:26:14 kernel: [639964.652908] [22444]  1004 22444     3102      748      11       3        0             0 syncDaily.sh
Jan 27 04:26:14 kernel: [639964.652909] [22445]     0 22445     1112      420       8       3        0             0 cron-apt
Jan 27 04:26:14 kernel: [639964.652911] [22465]  1004 22465    55074    10532     113       3        0             0 rsync
Jan 27 04:26:14 kernel: [639964.652912] [22466]     0 22466     1087      171       8       3        0             0 sleep
Jan 27 04:26:14 kernel: [639964.652914] [22467]  1004 22467    29820     9151      62       3        0             0 rsync
Jan 27 04:26:14 kernel: [639964.652915] [22468]  1004 22468    61770     7168     125       3        0             0 rsync
Jan 27 04:26:14 kernel: [639964.652917] [22990]   105 22990    62490    35099     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652919] [23138]   105 23138    62491    35578     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652920] [23139]   105 23139    62690    36657     121       3      236             0 postgres
Jan 27 04:26:14 kernel: [639964.652922] [23140]   105 23140    62455    32973     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652923] [23631]   105 23631    62518    34978     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652925] [23635]   105 23635    62506    35193     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652927] [23636]   105 23636    62455    30085     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652928] [23637]   105 23637    62470    33106     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652930] [23639]   105 23639    62511    34295     120       3      237             0 postgres
Jan 27 04:26:14 kernel: [639964.652940] Out of memory: Kill process 24525 (postgres) score 224 or sacrifice child
Jan 27 04:26:14 kernel: [639964.652975] Killed process 24525 (postgres) total-vm:7305056kB, anon-rss:4460244kB, file-rss:145080kB

Instancja druga

Jan 27 04:34:36 kernel: [640466.131656] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 27 04:34:36 kernel: [640466.131660] java cpuset=/ mems_allowed=0
Jan 27 04:34:36 kernel: [640466.131665] CPU: 7 PID: 2152 Comm: java Not tainted 4.4.0-59-generic #80~14.04.1-Ubuntu
Jan 27 04:34:36 kernel: [640466.131666] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
Jan 27 04:34:36 kernel: [640466.131668]  0000000000000000 ffff88041a963b38 ffffffff813dbd6c ffff88041a963cf0
Jan 27 04:34:36 kernel: [640466.131670]  0000000000000000 ffff88041a963bc8 ffffffff811fafc6 0000000000000000
Jan 27 04:34:36 kernel: [640466.131671]  0000000000000000 0000000000000000 ffff88042a6d1b88 0000000000000015
Jan 27 04:34:36 kernel: [640466.131673] Call Trace:
Jan 27 04:34:36 kernel: [640466.131698]  [<ffffffff813dbd6c>] dump_stack+0x63/0x87
Jan 27 04:34:36 kernel: [640466.131712]  [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
Jan 27 04:34:36 kernel: [640466.131721]  [<ffffffff813766f1>] ? apparmor_capable+0xd1/0x180
Jan 27 04:34:36 kernel: [640466.131728]  [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
Jan 27 04:34:36 kernel: [640466.131730]  [<ffffffff8118916b>] out_of_memory+0x40b/0x460
Jan 27 04:34:36 kernel: [640466.131732]  [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
Jan 27 04:34:36 kernel: [640466.131734]  [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
Jan 27 04:34:36 kernel: [640466.131736]  [<ffffffff8118e32d>] alloc_kmem_pages_node+0x4d/0xd0
Jan 27 04:34:36 kernel: [640466.131745]  [<ffffffff8107c125>] copy_process+0x185/0x1ce0
Jan 27 04:34:36 kernel: [640466.131755]  [<ffffffff810fd0b4>] ? do_futex+0xf4/0x520
Jan 27 04:34:36 kernel: [640466.131761]  [<ffffffff810a71c9>] ? resched_curr+0xa9/0xd0
Jan 27 04:34:36 kernel: [640466.131763]  [<ffffffff8107de1a>] _do_fork+0x8a/0x310
Jan 27 04:34:36 kernel: [640466.131765]  [<ffffffff8107e149>] SyS_clone+0x19/0x20
Jan 27 04:34:36 kernel: [640466.131779]  [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 27 04:34:36 kernel: [640466.131781] Mem-Info:
Jan 27 04:34:36 kernel: [640466.131784] active_anon:463046 inactive_anon:339934 isolated_anon:0
Jan 27 04:34:36 kernel: [640466.131784]  active_file:1074992 inactive_file:1398029 isolated_file:0
Jan 27 04:34:36 kernel: [640466.131784]  unevictable:0 dirty:1307 writeback:0 unstable:0
Jan 27 04:34:36 kernel: [640466.131784]  slab_reclaimable:626085 slab_unreclaimable:26239
Jan 27 04:34:36 kernel: [640466.131784]  mapped:40618 shmem:35429 pagetables:4038 bounce:0
Jan 27 04:34:36 kernel: [640466.131784]  free:161367 free_pcp:0 free_cma:0
Jan 27 04:34:36 kernel: [640466.131788] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 27 04:34:36 kernel: [640466.131792] lowmem_reserve[]: 0 2951 16005 16005 16005
Jan 27 04:34:36 kernel: [640466.131794] Node 0 DMA32 free:112056kB min:12448kB low:15560kB high:18672kB active_anon:465908kB inactive_anon:478436kB active_file:620808kB inactive_file:963844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:3048784kB mlocked:0kB dirty:844kB writeback:0kB mapped:11132kB shmem:5644kB slab_reclaimable:390764kB slab_unreclaimable:8488kB kernel_stack:1408kB pagetables:2304kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:34:36 kernel: [640466.131798] lowmem_reserve[]: 0 0 13054 13054 13054
Jan 27 04:34:36 kernel: [640466.131800] Node 0 Normal free:517520kB min:55068kB low:68832kB high:82600kB active_anon:1386276kB inactive_anon:881300kB active_file:3679160kB inactive_file:4628272kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367448kB mlocked:0kB dirty:4384kB writeback:0kB mapped:151340kB shmem:136072kB slab_reclaimable:2113576kB slab_unreclaimable:96452kB kernel_stack:3904kB pagetables:13848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:34:36 kernel: [640466.131803] lowmem_reserve[]: 0 0 0 0 0
Jan 27 04:34:36 kernel: [640466.131805] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan 27 04:34:36 kernel: [640466.131812] Node 0 DMA32: 20157*4kB (UME) 4165*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 113948kB
Jan 27 04:34:36 kernel: [640466.131817] Node 0 Normal: 119665*4kB (UMEH) 4706*8kB (UMEH) 12*16kB (H) 13*32kB (H) 10*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 518068kB
Jan 27 04:34:36 kernel: [640466.131824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 27 04:34:36 kernel: [640466.131825] 2516698 total pagecache pages
Jan 27 04:34:36 kernel: [640466.131826] 8199 pages in swap cache
Jan 27 04:34:36 kernel: [640466.131828] Swap cache stats: add 1131970, delete 1123771, find 7374629/7548428
Jan 27 04:34:36 kernel: [640466.131828] Free swap  = 4085700kB
Jan 27 04:34:36 kernel: [640466.131829] Total swap = 4194300kB
Jan 27 04:34:36 kernel: [640466.131830] 4194174 pages RAM
Jan 27 04:34:36 kernel: [640466.131830] 0 pages HighMem/MovableOnly
Jan 27 04:34:36 kernel: [640466.131831] 86139 pages reserved
Jan 27 04:34:36 kernel: [640466.131832] 0 pages cma reserved
Jan 27 04:34:36 kernel: [640466.131832] 0 pages hwpoisoned
Jan 27 04:34:36 kernel: [640466.131833] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 27 04:34:36 kernel: [640466.131838] [  424]     0   424     4909      388      14       3       68             0 upstart-udev-br
Jan 27 04:34:36 kernel: [640466.131841] [  439]     0   439    13075      456      29       3      322         -1000 systemd-udevd
Jan 27 04:34:36 kernel: [640466.131843] [  724]     0   724     3816      226      13       3       53             0 upstart-socket-
Jan 27 04:34:36 kernel: [640466.131845] [  813]     0   813     5856      449      16       3       57             0 rpcbind
Jan 27 04:34:36 kernel: [640466.131846] [  865]   108   865     5386      456      16       3      113             0 rpc.statd
Jan 27 04:34:36 kernel: [640466.131848] [ 1034]     0  1034     3820      281      12       3       35             0 upstart-file-br
Jan 27 04:34:36 kernel: [640466.131850] [ 1041]   102  1041     9817      366      23       3       50             0 dbus-daemon
Jan 27 04:34:36 kernel: [640466.131852] [ 1045]   101  1045    65018     1255      31       3      362             0 rsyslogd
Jan 27 04:34:36 kernel: [640466.131854] [ 1056]     0  1056    10870      525      26       4       49             0 systemd-logind
Jan 27 04:34:36 kernel: [640466.131855] [ 1063]     0  1063     5870        0      16       3       53             0 rpc.idmapd
Jan 27 04:34:36 kernel: [640466.131857] [ 1153]     0  1153     2558      371       9       3      517             0 dhclient
Jan 27 04:34:36 kernel: [640466.131858] [ 1374]     0  1374     3955      401      13       3       40             0 getty
Jan 27 04:34:36 kernel: [640466.131860] [ 1377]     0  1377     3955      406      13       3       38             0 getty
Jan 27 04:34:36 kernel: [640466.131861] [ 1383]     0  1383     3955      406      13       3       39             0 getty
Jan 27 04:34:36 kernel: [640466.131863] [ 1384]     0  1384     3955      418      13       3       37             0 getty
Jan 27 04:34:36 kernel: [640466.131864] [ 1386]     0  1386     3955      418      12       3       38             0 getty
Jan 27 04:34:36 kernel: [640466.131866] [ 1403]     0  1403    15346      735      34       3      142         -1000 sshd
Jan 27 04:34:36 kernel: [640466.131868] [ 1436]     0  1436     4825      408      13       3       28             0 irqbalance
Jan 27 04:34:36 kernel: [640466.131869] [ 1440]     0  1440     1093      379       8       3       35             0 acpid
Jan 27 04:34:36 kernel: [640466.131871] [ 1442]     0  1442     4785      176      14       3       38             0 atd
Jan 27 04:34:36 kernel: [640466.131872] [ 1443]     0  1443     5914      466      17       3       43             0 cron
Jan 27 04:34:36 kernel: [640466.131874] [ 1464]   105  1464    61957     4409      59       3      254          -900 postgres
Jan 27 04:34:36 kernel: [640466.131876] [ 1561]   107  1561     7864      657      21       3      113             0 ntpd
Jan 27 04:34:36 kernel: [640466.131877] [ 1834]  1004  1834  2002886   692883    1549      10    12598             0 java
Jan 27 04:34:36 kernel: [640466.131879] [ 1921]   106  1921     5835      452      16       3      112             0 nrpe
Jan 27 04:34:36 kernel: [640466.131880] [ 1943]     0  1943   175986      420      41       4       50             0 nscd
Jan 27 04:34:36 kernel: [640466.131882] [ 1978]   109  1978   111112      309      48       4      213             0 nslcd
Jan 27 04:34:36 kernel: [640466.131883] [ 2007]     8  2007     3172      326      11       3       52             0 nullmailer-send
Jan 27 04:34:36 kernel: [640466.131885] [ 2092]     0  2092    34005     1947      70       3     3067             0 /usr/bin/monito
Jan 27 04:34:36 kernel: [640466.131887] [ 2110]     0  2110     1901      367       9       3       25             0 getty
Jan 27 04:34:36 kernel: [640466.131888] [ 2146] 65534  2146    34005     1101      67       3     3810             0 monitorix-httpd
Jan 27 04:34:36 kernel: [640466.131891] [22428]     0 22428    15436      739      35       4       11             0 cron
Jan 27 04:34:36 kernel: [640466.131892] [22429]     0 22429    15489      749      35       4       12             0 cron
Jan 27 04:34:36 kernel: [640466.131894] [22442]     0 22442     1112      198       8       3        0             0 sh
Jan 27 04:34:36 kernel: [640466.131895] [22443]  1004 22443     1112      191       8       3        0             0 sh
Jan 27 04:34:36 kernel: [640466.131897] [22444]  1004 22444     3102      748      11       3        0             0 syncDaily.sh
Jan 27 04:34:36 kernel: [640466.131899] [22445]     0 22445     1112      420       8       3        0             0 cron-apt
Jan 27 04:34:36 kernel: [640466.131900] [22465]  1004 22465    54754    27012     113       3        0             0 rsync
Jan 27 04:34:36 kernel: [640466.131902] [22466]     0 22466     1087      171       8       3        0             0 sleep
Jan 27 04:34:36 kernel: [640466.131903] [22467]  1004 22467    34234    26953      72       3        0             0 rsync
Jan 27 04:34:36 kernel: [640466.131905] [22468]  1004 22468    62154    15613     126       3        0             0 rsync
Jan 27 04:34:36 kernel: [640466.131907] [24170]   105 24170    61990    34251     117       3      237             0 postgres
Jan 27 04:34:36 kernel: [640466.131908] [24171]   105 24171    61957    33191     115       3      238             0 postgres
Jan 27 04:34:36 kernel: [640466.131910] [24172]   105 24172    61957     1190      52       3      238             0 postgres
Jan 27 04:34:36 kernel: [640466.131911] [24173]   105 24173    62166     1333      54       3      239             0 postgres
Jan 27 04:34:36 kernel: [640466.131913] [24174]   105 24174    25876      642      48       3      242             0 postgres
Jan 27 04:34:36 kernel: [640466.131915] [24175]   105 24175    62464    35199     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131916] [24203]   105 24203    62467    22296     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131918] [24266]   105 24266    62475    36452     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131920] [24317]   105 24317    62424    17702     119       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131921] [24318]   105 24318    62449    24858     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131923] [24320]   105 24320    62485    24779     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131925] [24321]   105 24321    62449    27595     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131926] [24452]   105 24452    62484    16118     120       3      232             0 postgres
Jan 27 04:34:36 kernel: [640466.131928] Out of memory: Kill process 1834 (java) score 137 or sacrifice child
Jan 27 04:34:36 kernel: [640466.132070] Killed process 1834 (java) total-vm:8011544kB, anon-rss:2763340kB, file-rss:8192kB
Patrick
źródło
Zobacz doskonałą odpowiedź Matthew Ife na dupe serverfault.com/a/762327/9517
user9517
@istheEnglishway Czy to naprawdę dupe? Nasze zapytanie dotyczy „strefy normalnej” (gfp kończy się na zero), a dane wyjściowe zawierają taką strefę. Podczas gdy w dupe, do którego się odwołujesz, oba odnoszą się do innej strefy pamięci (HighMem), która nie jest obecna w danych wyjściowych ...
Patrick
1
Warto zauważyć, że nawet jeśli jest wystarczająca ilość wolnej pamięci, nie ma gwarancji, że będzie 16KB ciągłych. Nie jestem pewien, czy dotyczy to tutaj, czy nie.
kasperd
Prawidłowo, ale jeśli poprawnie przeanalizuję dane wyjściowe, powinno być dostępnych kilka bloków 16 KB: „19 * 16 kB (H)” dla pierwszego wystąpienia. Mogłem znaleźć odpowiedni błąd jądra, opublikuję go jako (możliwą) odpowiedź.
Patrick

Odpowiedzi:

5

Jednym z możliwych powodów wydaje się być błąd jądra Ubuntu. Występuje problem ze starterem, który opisuje sytuacje OOM dla tego jądra: 1655842

Po powrocie do starszego jądra (4.4.0-57) problem nie pojawił się ponownie w ciągu dwóch tygodni, kiedy monitorowaliśmy system.

Alternatywne wyjaśnienia / odpowiedzi są nadal bardzo mile widziane!

Patrick
źródło
Wszelkie pomysły, czy najnowsza wersja 4.4.0naprawia ten błąd?
Elad Nava,
1
O ile mi wiadomo, tak. Od tego czasu zaktualizowaliśmy do nowszych jąder 4.4.0 i ten problem już nie występuje.
Patrick