2026/4/1 18:34:36
Analysis of a Database OOM Incident Caused by a Hidden HugePages Configuration
I. The Incident
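Early on a Sunday morning we received an urgent SMS alert: the database instance had restarted unexpectedly. We first logged in to the database server and checked the error log: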
```
2025-12-21T06:54:57.259156+08:00 77 [Note] [MY-010914] [Server] Aborted connection 77 to db: unconnected user: root host: 172.17.139.203 (Got an error reading communication packets).
2025-12-21T06:55:33.224314Z mysqld_safe Number of processes running now: 0
2025-12-21T06:55:33.248143Z mysqld_safe mysqld restarted
2025-12-21T06:55:34.053462+08:00 0 [Warning] [MY-011069] [Server] The syntax --replica-parallel-type is deprecated and will be removed in a future release.
2025-12-21T06:55:34.053569+08:00 0 [Warning] [MY-011068] [Server] The syntax --ssl=off is deprecated and will be removed in a future release. Please use --tls-version instead.
```

From this log we initially suspected an OOM kill: mysqld_safe reports that no mysqld process is running any more and restarts it. Checking the system log /var/log/messages directly confirmed the OOM messages:

```
[root@gdb-adm ~]# grep -inr oom /var/log/messages
5:Dec 21 06:55:33 gdb kernel: [419827.630493] crontab-1 invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
11:Dec 21 06:55:33 gdb kernel: [419827.630530]  oom_kill_process+0x24f/0x270
12:Dec 21 06:55:33 gdb kernel: [419827.630532]  ? oom_badness+0x25/0x140
68:Dec 21 06:55:33 gdb kernel: [419827.630752] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
148:Dec 21 06:55:33 gdb kernel: [419827.631062] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-2036.slice/session-6188.scope,task=mysqld,pid=2567710,uid=2032
```

II. Problem Analysis

1. Memory settings check

The server has 376G of physical memory and innodb_buffer_pool_size is set to 200G, about 53% of RAM. This is as expected.

```
free -h
               total        used        free      shared  buff/cache   available
Mem:           376Gi       267Gi        26Gi       5.0Mi        82Gi        53Gi
```

2. jemalloc check

A common cause of OOM in GreatSQL or open-source MySQL is the default glibc memory allocator. Freed memory is not fully returned to the operating system, so usage keeps growing as if memory were leaking. We checked which allocator mysqld uses with `lsof -p PID | grep jem`:

```
[root@gdb ~]# lsof -p 25424 | grep jem
mysqld  25424 mysql  mem  REG  8,2  2136088  2355262 /data/svr/greatsql/lib/mysql/libjemalloc.so.1
```

The output shows that mysqld has libjemalloc loaded. The allocator is configured correctly, so this cause can essentially be ruled out.

3. Detailed OOM log analysis

1) Full OOM log

```
Dec 21 06:55:33 gdb kernel: [419827.630493] crontab-1 invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 21 06:55:33 gdb kernel: [419827.630499] CPU: 14 PID: 9458 Comm: crontab-1 Kdump: loaded Not tainted 4.19.90-2107.6.0.0227.28.oe1.bclinux.x86_64 #1
Dec 21 06:55:33 gdb kernel: [419827.630500] Hardware name: FiberHome FitServer/FiberHome Boards, BIOS 3.4.V7 02/01/2023
Dec 21 06:55:33 gdb kernel: [419827.630507] Call Trace:
Dec 21 06:55:33 gdb kernel: [419827.630519]  dump_stack+0x66/0x8b
Dec 21 06:55:33 gdb kernel: [419827.630527]  dump_header+0x4a/0x1fc
Dec 21 06:55:33 gdb kernel: [419827.630530]  oom_kill_process+0x24f/0x270
Dec 21 06:55:33 gdb kernel: [419827.630532]  ? oom_badness+0x25/0x140
Dec 21 06:55:33 gdb kernel: [419827.630533]  out_of_memory+0x11f/0x540
Dec 21 06:55:33 gdb kernel: [419827.630536]  __alloc_pages_slowpath+0x9f5/0xde0
Dec 21 06:55:33 gdb kernel: [419827.630543]  __alloc_pages_nodemask+0x2a8/0x2d0
Dec 21 06:55:33 gdb kernel: [419827.630549]  filemap_fault+0x35e/0x8a0
Dec 21 06:55:33 gdb kernel: [419827.630555]  ? alloc_set_pte+0x244/0x450
Dec 21 06:55:33 gdb kernel: [419827.630558]  ? filemap_map_pages+0x28f/0x480
Dec 21 06:55:33 gdb kernel: [419827.630584]  ext4_filemap_fault+0x2c/0x40 [ext4]
Dec 21 06:55:33 gdb kernel: [419827.630588]  __do_fault+0x33/0x110
Dec 21 06:55:33 gdb kernel: [419827.630592]  do_fault+0x12e/0x490
Dec 21 06:55:33 gdb kernel: [419827.630595]  ? __handle_mm_fault+0x2a/0x690
Dec 21 06:55:33 gdb kernel: [419827.630597]  __handle_mm_fault+0x613/0x690
Dec 21 06:55:33 gdb kernel: [419827.630601]  handle_mm_fault+0xc4/0x200
Dec 21 06:55:33 gdb kernel: [419827.630604]  __do_page_fault+0x2ba/0x4d0
Dec 21 06:55:33 gdb kernel: [419827.630609]  ? __audit_syscall_exit+0x238/0x2c0
Dec 21 06:55:33 gdb kernel: [419827.630611]  do_page_fault+0x31/0x130
Dec 21 06:55:33 gdb kernel: [419827.630616]  ? page_fault+0x8/0x30
Dec 21 06:55:33 gdb kernel: [419827.630620]  page_fault+0x1e/0x30
Dec 21 06:55:33 gdb kernel: [419827.630623] Mem-Info:
Dec 21 06:55:33 gdb kernel: [419827.630635] active_anon:50985791 inactive_anon:354 isolated_anon:0#012 active_file:677 inactive_file:0 isolated_file:0#012 unevictable:0 dirty:105 writeback:123 unstable:0#012 slab_reclaimable:20583 slab_unreclaimable:49628#012 mapped:319 shmem:1323 pagetables:106803 bounce:0#012 free:5313776 free_pcp:5715 free_cma:0
Dec 21 06:55:33 gdb kernel: [419827.630638] Node 0 active_anon:100766572kB inactive_anon:556kB active_file:1384kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:76kB dirty:32kB writeback:0kB shmem:2276kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Dec 21 06:55:33 gdb kernel: [419827.630645] Node 1 active_anon:103176592kB inactive_anon:860kB active_file:1324kB inactive_file:80kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1200kB dirty:388kB writeback:492kB shmem:3016kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Dec 21 06:55:33 gdb kernel: [419827.630650] Node 0 DMA free:15892kB min:824kB low:1028kB high:1232kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 21 06:55:33 gdb kernel: [419827.630654] lowmem_reserve[]: 0 1347 191666 191666 191666
Dec 21 06:55:33 gdb kernel: [419827.630661] Node 0 DMA32 free:833940kB min:72972kB low:91212kB high:109452kB active_anon:559420kB inactive_anon:8kB active_file:68kB inactive_file:0kB unevictable:0kB writepending:32kB present:1733384kB managed:1405672kB mlocked:0kB kernel_stack:52kB pagetables:1084kB bounce:0kB free_pcp:400kB local_pcp:0kB free_cma:0kB
Dec 21 06:55:33 gdb kernel: [419827.630666] lowmem_reserve[]: 0 0 190319 190319 190319
Dec 21 06:55:33 gdb kernel: [419827.630672] Node 0 Normal free:10117540kB min:10117912kB low:12647388kB high:15176864kB active_anon:100207152kB inactive_anon:548kB active_file:808kB inactive_file:0kB unevictable:0kB writepending:0kB present:198180864kB managed:194894048kB mlocked:0kB kernel_stack:13504kB pagetables:215840kB bounce:0kB free_pcp:536kB local_pcp:0kB free_cma:0kB
Dec 21 06:55:33 gdb kernel: [419827.630679] lowmem_reserve[]: 0 0 0 0 0
Dec 21 06:55:33 gdb kernel: [419827.630683] Node 1 Normal free:10287732kB min:10288284kB low:12860352kB high:15432420kB active_anon:103176592kB inactive_anon:860kB active_file:1324kB inactive_file:80kB unevictable:0kB writepending:880kB present:201326592kB managed:198175752kB mlocked:0kB kernel_stack:11836kB pagetables:210288kB bounce:0kB free_pcp:21924kB local_pcp:332kB free_cma:0kB
Dec 21 06:55:33 gdb kernel: [419827.630686] lowmem_reserve[]: 0 0 0 0 0
Dec 21 06:55:33 gdb kernel: [419827.630688] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Dec 21 06:55:33 gdb kernel: [419827.630694] Node 0 DMA32: 240*4kB (UME) 178*8kB (UME) 140*16kB (UME) 66*32kB (UME) 70*64kB (UME) 53*128kB (UME) 38*256kB (UME) 18*512kB (UE) 3*1024kB (U) 2*2048kB (UE) 193*4096kB (M) = 834640kB
Dec 21 06:55:33 gdb kernel: [419827.630702] Node 0 Normal: 3557*4kB (UE) 1963*8kB (UME) 651*16kB (UME) 1139*32kB (UME) 855*64kB (UME) 572*128kB (UME) 308*256kB (UE) 129*512kB (UME) 50*1024kB (UME) 27*2048kB (UME) 2359*4096kB (UME) = 10118588kB
Dec 21 06:55:33 gdb kernel: [419827.630712] Node 1 Normal: 3636*4kB (UME) 1848*8kB (UME) 2744*16kB (UME) 2139*32kB (UME) 1580*64kB (UME) 1073*128kB (UME) 613*256kB (UME) 280*512kB (UE) 130*1024kB (UE) 81*2048kB (UE) 2273*4096kB (UME) = 10289648kB
Dec 21 06:55:33 gdb kernel: [419827.630731] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 21 06:55:33 gdb kernel: [419827.630737] Node 0 hugepages_total=40960 hugepages_free=40960 hugepages_surp=0 hugepages_size=2048kB
Dec 21 06:55:33 gdb kernel: [419827.630738] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 21 06:55:33 gdb kernel: [419827.630741] Node 1 hugepages_total=40960 hugepages_free=40960 hugepages_surp=0 hugepages_size=2048kB
Dec 21 06:55:33 gdb kernel: [419827.630742] 3360 total pagecache pages
Dec 21 06:55:33 gdb kernel: [419827.630744] 0 pages in swap cache
Dec 21 06:55:33 gdb kernel: [419827.630746] Swap cache stats: add 0, delete 0, find 0/0
Dec 21 06:55:33 gdb kernel: [419827.630746] Free swap  = 0kB
Dec 21 06:55:33 gdb kernel: [419827.630747] Total swap = 0kB
Dec 21 06:55:33 gdb kernel: [419827.630748] 100314204 pages RAM
Dec 21 06:55:33 gdb kernel: [419827.630749] 0 pages HighMem/MovableOnly
Dec 21 06:55:33 gdb kernel: [419827.630749] 1691363 pages reserved
Dec 21 06:55:33 gdb kernel: [419827.630750] 0 pages hwpoisoned
Dec 21 06:55:33 gdb kernel: [419827.630750] Tasks state (memory values in pages):
Dec 21 06:55:33 gdb kernel: [419827.630752] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Dec 21 06:55:33 gdb kernel: [419827.630790] [ 926] 0 926 72470 811 507904 0 -250 systemd-journal
Dec 21 06:55:33 gdb kernel: [419827.630794] [ 960] 0 960 8269 1075 77824 0 -1000 systemd-udevd
Dec 21 06:55:33 gdb kernel: [419827.630798] [ 1623] 0 1623 729 28 32768 0 0 mdadm
Dec 21 06:55:33 gdb kernel: [419827.630800] [ 1672] 0 1672 23007 217 49152 0 -1000 auditd
Dec 21 06:55:33 gdb kernel: [419827.630803] [ 1674] 0 1674 1568 90 36864 0 0 sedispatch
Dec 21 06:55:33 gdb kernel: [419827.630806] [ 1712] 0 1712 78709 787 98304 0 0 ModemManager
Dec 21 06:55:33 gdb kernel: [419827.630808] [ 1714] 0 1714 571 16 32768 0 0 acpid
Dec 21 06:55:33 gdb kernel: [419827.630811] [ 1719] 81 1719 2891 845 49152 0 -900 dbus-daemon
Dec 21 06:55:33 gdb kernel: [419827.630813] [ 1727] 992 1727 599 38 32768 0 0 lsmd
Dec 21 06:55:33 gdb kernel: [419827.630815] [ 1730] 0 1730 619 33 32768 0 0 mcelog
Dec 21 06:55:33 gdb kernel: [419827.630817] [ 1735] 999 1735 743772 1030 229376 0 0 polkitd
Dec 21 06:55:33 gdb kernel: [419827.630820] [ 1736] 0 1736 77985 204 90112 0 0 rngd
Dec 21 06:55:33 gdb kernel: [419827.630827] [ 1739] 0 1739 2711 421 49152 0 0 smartd
Dec 21 06:55:33 gdb kernel: [419827.630829] [ 1741] 0 1741 20070 151 40960 0 -500 irqbalance
Dec 21 06:55:33 gdb kernel: [419827.630831] [ 1743] 0 1743 4492 227 61440 0 0 systemd-machine
Dec 21 06:55:33 gdb kernel: [419827.630837] [ 1753] 0 1753 114058 472 110592 0 0 abrtd
Dec 21 06:55:33 gdb kernel: [419827.630842] [ 1794] 0 1794 4780 468 65536 0 0 systemd-logind
Dec 21 06:55:33 gdb kernel: [419827.630844] [ 1830] 0 1830 263593 479 929792 0 0 abrt-dump-journ
Dec 21 06:55:33 gdb kernel: [419827.630846] [ 1831] 0 1831 261511 460 925696 0 0 abrt-dump-journ
Dec 21 06:55:33 gdb kernel: [419827.630850] [ 2802] 0 2802 199635 606 299008 0 0 esfdaemon
Dec 21 06:55:33 gdb kernel: [419827.630852] [ 2803] 0 2803 72799 12101 200704 0 0 bare-agent
Dec 21 06:55:33 gdb kernel: [419827.630855] [ 2805] 0 2805 59117 340 86016 0 0 cupsd
Dec 21 06:55:33 gdb kernel: [419827.630856] [ 2810] 0 2810 251667 734 1376256 0 0 rsyslogd
Dec 21 06:55:33 gdb kernel: [419827.630863] [ 2814] 0 2814 3350 227 53248 0 -1000 sshd
Dec 21 06:55:33 gdb kernel: [419827.630865] [ 2815] 0 2815 117707 3324 143360 0 0 tuned
Dec 21 06:55:33 gdb kernel: [419827.630869] [ 2828] 0 2828 65710 188 73728 0 0 gssproxy
Dec 21 06:55:33 gdb kernel: [419827.630872] [ 2848] 0 2848 53496 92 45056 0 0 init.ohasd
Dec 21 06:55:33 gdb kernel: [419827.630874] [ 2890] 0 2890 906 48 32768 0 0 atd
Dec 21 06:55:33 gdb kernel: [419827.630875] [ 2896] 0 2896 53748 118 49152 0 0 crond
Dec 21 06:55:33 gdb kernel: [419827.630878] [ 3692] 0 3692 3539 148 49152 0 0 xinetd
Dec 21 06:55:33 gdb kernel: [419827.630880] [ 3978] 0 3978 10985 242 61440 0 0 master
Dec 21 06:55:33 gdb kernel: [419827.630884] [ 4004] 89 4004 11331 527 69632 0 0 qmgr
Dec 21 06:55:33 gdb kernel: [419827.630888] [ 4093] 0 4093 43766 216 221184 0 0 sddog
Dec 21 06:55:33 gdb kernel: [419827.630890] [ 4112] 0 4112 285705 537 577536 0 0 sdmonitor
Dec 21 06:55:33 gdb kernel: [419827.630891] [ 4233] 0 4233 134053 596 466944 0 0 sdcc
Dec 21 06:55:33 gdb kernel: [419827.630895] [ 4259] 0 4259 168947 8371 667648 0 0 sdec
Dec 21 06:55:33 gdb kernel: [419827.630897] [ 4284] 0 4284 286675 1588 778240 0 0 sdexam
Dec 21 06:55:33 gdb kernel: [419827.630899] [ 4310] 0 4310 492216 50216 1331200 0 0 sdsvrd
Dec 21 06:55:33 gdb kernel: [419827.630906] [ 4330] 0 4330 29248 278 278528 0 0 udcenter
Dec 21 06:55:33 gdb kernel: [419827.630908] [ 8353] 0 8353 2184 321 45056 0 0 dhclient
Dec 21 06:55:33 gdb kernel: [419827.630910] [ 9243] 1086 9243 5274 639 73728 0 0 systemd
Dec 21 06:55:33 gdb kernel: [419827.630915] [ 9245] 1086 9245 6383 1015 73728 0 0 (sd-pam)
Dec 21 06:55:33 gdb kernel: [419827.630918] [ 9348] 1086 9348 470112 50291 761856 0 0 java
Dec 21 06:55:33 gdb kernel: [419827.630920] [ 9426] 0 9426 2184 323 45056 0 0 dhclient
Dec 21 06:55:33 gdb kernel: [419827.630922] [ 9852] 0 9852 53214 26 36864 0 0 agetty
Dec 21 06:55:33 gdb kernel: [419827.630926] [ 11463] 1002 11463 5276 639 73728 0 0 systemd
Dec 21 06:55:33 gdb kernel: [419827.630936] [ 11465] 1002 11465 6383 1016 73728 0 0 (sd-pam)
Dec 21 06:55:33 gdb kernel: [419827.630942] [ 11611] 1002 11611 14284908 1404 602112 0 0 agent60
Dec 21 06:55:33 gdb kernel: [419827.630945] [ 137615] 0 137615 136163 3215 147456 0 0 lvmdbusd
Dec 21 06:55:33 gdb kernel: [419827.630950] [ 796407] 2036 796407 5301 649 73728 0 0 systemd
Dec 21 06:55:33 gdb kernel: [419827.630952] [ 796409] 2036 796409 43812 1109 94208 0 0 (sd-pam)
Dec 21 06:55:33 gdb kernel: [419827.630954] [ 817343] 2032 817343 53508 130 53248 0 0 mysqld_safe
Dec 21 06:55:33 gdb kernel: [419827.630956] [2270020] 2032 2270020 2778466 1788 1466368 0 0 dbinit
Dec 21 06:55:33 gdb kernel: [419827.630958] [2567710] 2032 2567710 77307141 50817311 424357888 0 0 mysqld
Dec 21 06:55:33 gdb kernel: [419827.630960] [3453494] 998 3453494 1173 50 36864 0 0 chronyd
Dec 21 06:55:33 gdb kernel: [419827.630963] [3621338] 89 3621338 11065 249 65536 0 0 pickup
Dec 21 06:55:33 gdb kernel: [419827.630981] [3662845] 0 3662845 5297 648 73728 0 0 systemd
Dec 21 06:55:33 gdb kernel: [419827.630983] [3662881] 0 3662881 44244 1356 98304 0 0 (sd-pam)
Dec 21 06:55:33 gdb kernel: [419827.630985] [3662906] 89 3662906 11068 242 65536 0 0 trivial-rewrite
Dec 21 06:55:33 gdb kernel: [419827.630987] [3663080] 0 3663080 10991 235 65536 0 0 local
Dec 21 06:55:33 gdb kernel: [419827.630988] [3663097] 89 3663097 11131 254 65536 0 0 smtp
Dec 21 06:55:33 gdb kernel: [419827.630990] [3663098] 0 3663098 10991 235 65536 0 0 local
Dec 21 06:55:33 gdb kernel: [419827.630992] [3663108] 89 3663108 11073 242 65536 0 0 bounce
Dec 21 06:55:33 gdb kernel: [419827.630994] [3663141] 0 3663141 10991 235 65536 0 0 local
Dec 21 06:55:33 gdb kernel: [419827.630997] [3663177] 89 3663177 11066 242 69632 0 0 flush
Dec 21 06:55:33 gdb kernel: [419827.631003] [3663193] 89 3663193 11066 242 69632 0 0 flush
Dec 21 06:55:33 gdb kernel: [419827.631005] [3663201] 89 3663201 11066 242 69632 0 0 flush
Dec 21 06:55:33 gdb kernel: [419827.631007] [3663207] 0 3663207 53463 54 45056 0 0 sh
Dec 21 06:55:33 gdb kernel: [419827.631011] [3663208] 0 3663208 884643 7048 589824 0 0 promtail
Dec 21 06:55:33 gdb kernel: [419827.631019] [3663317] 89 3663317 11131 254 65536 0 0 smtp
Dec 21 06:55:33 gdb kernel: [419827.631023] [3663318] 89 3663318 11131 254 65536 0 0 smtp
Dec 21 06:55:33 gdb kernel: [419827.631025] [3663319] 89 3663319 11131 254 65536 0 0 smtp
Dec 21 06:55:33 gdb kernel: [419827.631026] [3663320] 89 3663320 11131 254 65536 0 0 smtp
Dec 21 06:55:33 gdb kernel: [419827.631028] [3663321] 89 3663321 11064 242 65536 0 0 error
Dec 21 06:55:33 gdb kernel: [419827.631030] [3663322] 89 3663322 11064 242 65536 0 0 error
Dec 21 06:55:33 gdb kernel: [419827.631032] [3663388] 0 3663388 53093 15 40960 0 0 sleep
Dec 21 06:55:33 gdb kernel: [419827.631048] [3663946] 0 3663946 4458 86 61440 0 0 systemd-cgroups
Dec 21 06:55:33 gdb kernel: [419827.631060] [3663947] 0 3663947 4071 84 57344 0 0 systemd-cgroups
Dec 21 06:55:33 gdb kernel: [419827.631062] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-2036.slice/session-6188.scope,task=mysqld,pid=2567710,uid=2032
Dec 21 06:55:33 gdb kernel: [419827.631071] Out of memory: Kill process 2567710 (mysqld) score 516 or sacrifice child
Dec 21 06:55:33 gdb kernel: [419827.632542] Killed process 2567710 (mysqld) total-vm:309228564kB, anon-rss:203269244kB, file-rss:0kB, shmem-rss:0kB
```

2) What happened

```
Dec 21 06:55:33 gdb kernel: [419827.630493] crontab-1 invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 21 06:55:33 gdb kernel: [419827.632542] Killed process 2567710 (mysqld) total-vm:309228564kB, anon-rss:203269244kB, file-rss:0kB, shmem-rss:0kB
```

These two lines carry the key information. The process crontab-1 requested new memory, which invoked the oom-killer. The process that was killed was mysqld, which was using 203269244 kB of memory.
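The rss column in the task table is counted in pages, so each process's resident memory is rss × page size. The sketch below is a minimal Python illustration of that conversion. It assumes 4 KiB base pages (the x86_64 default) and the row layout shown in the log above. The regular expression and the function name `rss_kb_by_name` are our own and are not part of any kernel tool.

```python
import re
from collections import defaultdict

PAGE_KB = 4  # assumption: 4 KiB base pages; rss in the "Tasks state" table is in pages

# One task row: [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S.*)$"
)

def rss_kb_by_name(lines):
    """Sum resident memory (kB) per process name; the header row and other lines are skipped."""
    totals = defaultdict(int)
    for line in lines:
        m = ROW.search(line)  # the "[419827.630958]" timestamp never matches: it contains a dot
        if m:
            totals[m[9].strip()] += int(m[5]) * PAGE_KB
    return dict(totals)

rows = [
    "Dec 21 06:55:33 gdb kernel: [419827.630752] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name",
    "Dec 21 06:55:33 gdb kernel: [419827.630790] [ 926] 0 926 72470 811 507904 0 -250 systemd-journal",
    "Dec 21 06:55:33 gdb kernel: [419827.630918] [ 9348] 1086 9348 470112 50291 761856 0 0 java",
    "Dec 21 06:55:33 gdb kernel: [419827.630958] [2567710] 2032 2567710 77307141 50817311 424357888 0 0 mysqld",
]
usage = rss_kb_by_name(rows)
mysqld_kb = usage.pop("mysqld")
print(f"mysqld: {mysqld_kb} kB = {mysqld_kb / 1024**2:.2f} GiB")
print(f"others: {sum(usage.values()) / 1024:.1f} MiB")
```

For the mysqld row, 50817311 pages × 4 KiB = 203269244 kB. This matches the anon-rss value in the "Killed process" line exactly, which supports the 4 KiB page assumption on this host.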
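3) NUMA usage analysis

```
Dec 21 06:55:33 gdb kernel: [419827.630672] Node 0 Normal free:10117540kB min:10117912kB low:12647388kB high:15176864kB active_anon:100207152kB inactive_anon:548kB active_file:808kB inactive_file:0kB unevictable:0kB writepending:0kB present:198180864kB managed:194894048kB mlocked:0kB kernel_stack:13504kB pagetables:215840kB bounce:0kB free_pcp:536kB local_pcp:0kB free_cma:0kB
Dec 21 06:55:33 gdb kernel: [419827.630679] lowmem_reserve[]: 0 0 0 0 0
Dec 21 06:55:33 gdb kernel: [419827.630683] Node 1 Normal free:10287732kB min:10288284kB low:12860352kB high:15432420kB active_anon:103176592kB inactive_anon:860kB active_file:1324kB inactive_file:80kB unevictable:0kB writepending:880kB present:201326592kB managed:198175752kB mlocked:0kB kernel_stack:11836kB pagetables:210288kB bounce:0kB free_pcp:21924kB local_pcp:332kB free_cma:0kB
```

The log shows that free memory in the Normal zone of both NUMA nodes had fallen below the zone's min watermark:

- Node 0: 10117540 kB free, against a min of 10117912 kB.
- Node 1: 10287732 kB free, against a min of 10288284 kB.

4) Memory usage tally

Based on the OOM log, memory was distributed roughly as follows. Note that the rss column in the system log is in pages, which are 4 KB each by default.

| Item | Memory used |
| --- | --- |
| mysqld | 193G |
| Other processes | 641M |
| Free memory on the NUMA nodes | 19.5G |

Together these account for far less than the server's 376G of physical memory. Nearly 163G is missing.

5) HugePages analysis

Looking further at the system log:

```
Dec 21 06:55:33 gdb kernel: [419827.630731] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 21 06:55:33 gdb kernel: [419827.630737] Node 0 hugepages_total=40960 hugepages_free=40960 hugepages_surp=0 hugepages_size=2048kB
Dec 21 06:55:33 gdb kernel: [419827.630738] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 21 06:55:33 gdb kernel: [419827.630741] Node 1 hugepages_total=40960 hugepages_free=40960 hugepages_surp=0 hugepages_size=2048kB
```

These lines break down as follows:

| NUMA node | Page size | Total pages | Free pages |
| --- | --- | --- | --- |
| node0 | 2M | 40960 | 40960 |
| node0 | 1G | 0 | 0 |
| node1 | 2M | 40960 | 40960 |
| node1 | 1G | 0 | 0 |

HugePages were therefore holding 2M × 40960 × 2 = 160G of memory, none of which was in use. This closely matches the memory missing from the tally above.

4. Checking the HugePages configuration

1) Check Transparent HugePages (THP)

Running `cat /sys/kernel/mm/transparent_hugepage/enabled` confirms that THP is disabled:

```
[root@gdb ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
```

2) Check traditional (static) HugePages

Running `sysctl -p | grep vm` shows that no HugePages setting is configured:

```
[root@gdb ~]# sysctl -p | grep vm
vm.zone_reclaim_mode = 0
vm.swappiness = 1
vm.min_free_kbytes = 20480000
```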
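Steps 3) and 5) of the OOM log analysis above come down to two calculations on the kernel report. The first compares each zone's free value with its min watermark. The second multiplies hugepages_total by hugepages_size for each node. The sketch below is a minimal Python illustration. It assumes the log lines are available as plain strings in the key=value form shown above. The regular expressions are our own and may need adjusting for other kernel versions.

```python
import re

# Field layouts as printed by this 4.19 kernel's OOM report.
ZONE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")
HUGE = re.compile(
    r"Node (\d+) hugepages_total=(\d+) hugepages_free=(\d+) "
    r"hugepages_surp=\d+ hugepages_size=(\d+)kB"
)

def zones_below_min(lines):
    """Return (node, zone, free_kb, min_kb) for every zone whose free memory is under its min watermark."""
    hits = []
    for line in lines:
        m = ZONE.search(line)
        if m:
            node, zone, free_kb, min_kb = int(m[1]), m[2], int(m[3]), int(m[4])
            if free_kb < min_kb:
                hits.append((node, zone, free_kb, min_kb))
    return hits

def hugepages_kb(lines):
    """Return (reserved_kb, unused_kb) summed over all per-node huge page pools."""
    reserved = unused = 0
    for line in lines:
        m = HUGE.search(line)
        if m:
            size_kb = int(m[4])
            reserved += int(m[2]) * size_kb
            unused += int(m[3]) * size_kb
    return reserved, unused

log = [
    "Node 0 DMA32 free:833940kB min:72972kB low:91212kB high:109452kB",
    "Node 0 Normal free:10117540kB min:10117912kB low:12647388kB high:15176864kB",
    "Node 1 Normal free:10287732kB min:10288284kB low:12860352kB high:15432420kB",
    "Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB",
    "Node 0 hugepages_total=40960 hugepages_free=40960 hugepages_surp=0 hugepages_size=2048kB",
    "Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB",
    "Node 1 hugepages_total=40960 hugepages_free=40960 hugepages_surp=0 hugepages_size=2048kB",
]

for node, zone, free_kb, min_kb in zones_below_min(log):
    print(f"Node {node} {zone}: free {free_kb} kB < min {min_kb} kB")
reserved, unused = hugepages_kb(log)
print(f"HugePages reserved {reserved / 1024**2:.0f} GiB, unused {unused / 1024**2:.0f} GiB")
```

On these excerpts, only the two Normal zones are flagged. The huge page pools add up to 40960 × 2048 kB × 2 = 167772160 kB, all of it unused.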
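3) Comparison of traditional HugePages and Transparent HugePages

| Aspect | Traditional (static) HugePages | Transparent HugePages (THP) |
| --- | --- | --- |
| How to check | vm.nr_hugepages in /etc/sysctl.conf | /sys/kernel/mm/transparent_hugepage/enabled |
| Management | Static pre-allocation. As soon as the system boots or the setting is applied, the kernel carves the specified number of huge pages out of physical memory. This memory is "locked": it is dedicated to huge pages and cannot be used for anything else, such as ordinary small pages for processes. | Dynamic allocation. At runtime the kernel merges small pages into a huge page based on memory access patterns (for example, 512 contiguous 4K pages that are accessed frequently). It splits them back when they are no longer needed. This happens on demand. |
| Configuration | 1. Temporary: `sysctl -w vm.nr_hugepages=N`. 2. Permanent: add `vm.nr_hugepages=N` to /etc/sysctl.conf, then reboot or run `sysctl -p`. | 1. Temporary: `echo MODE > /sys/kernel/mm/transparent_hugepage/enabled`, where MODE is always, madvise or never. 2. Permanent: set a kernel boot parameter. Edit /etc/default/grub, add `transparent_hugepage=always` to GRUB_CMDLINE_LINUX, then regenerate the GRUB configuration with `grub2-mkconfig -o /boot/grub2/grub.cfg`. |
| Memory usage | Dedicated and exclusive. Once allocated, the pages stay reserved in physical memory even if nothing uses them, which can waste memory. | Shared pool. Uses the ordinary page pool and converts memory only when needed, so memory utilization is higher. |
| Performance | Stable and predictable. Applications such as Oracle Database or Redis can explicitly request huge pages via mmap() or shmget() and are guaranteed to get them. There is no merging overhead and there are fewer page faults, giving the best and most stable performance. | Can fluctuate. THP usually improves performance by reducing TLB misses. However, under memory pressure or fragmentation, the kernel's merge/split work (the khugepaged process) can cause unpredictable latency spikes, which hurts latency-sensitive applications. |

Based on the symptoms and these characteristics, we suspected that traditional HugePages had been configured and were locking 160G of memory that no other process could use. Yet the configuration file contained no such setting, which was puzzling.

4) Deep search

We searched the whole /etc tree with `grep -R nr_hugepages /etc` and found the culprit:

```
[root@gdb ~]# grep -R nr_hugepages /etc
/etc/sysctl.conf.bak-2025-07-13:vm.nr_hugepages=81920
```

The configuration file had been backed up and modified on July 13. Before that change, traditional HugePages were indeed configured. The configured value matches what the current system log records: 40960 pages on each of the two NUMA nodes, 81920 in total.

5) Testing the configuration change

Testing confirmed that even after the traditional HugePages setting is removed (commented out) from the configuration file, the huge pages stay allocated. `sysctl -p` only applies the keys present in the file and does not reset a parameter that has been removed, so the running value persists:

```
[root@gdb ~]# cat /etc/sysctl.conf | grep h
kernel.shmall = 41943040
kernel.shmmax = 171798691840
kernel.shmmni = 4096
#vm.hugetlb_shm_group = 54321
#vm.nr_hugepages = 40960
[root@gdb ~]# sysctl -p | grep h
kernel.shmall = 41943040
kernel.shmmax = 171798691840
kernel.shmmni = 4096
[root@gdb ~]# cat /proc/sys/vm/nr_hugepages
40960
```

So after changing the configuration, this memory must be released manually unless the operating system is rebooted:

```
[root@gdb ~]# echo 0 > /proc/sys/vm/nr_hugepages
[root@gdb ~]# cat /proc/sys/vm/nr_hugepages
0
```

III. Root Cause and Improvements

1) Root cause

A large number of HugePages were reserved but never used by the database. This left too little ordinary memory and triggered the OOM.

2) An unusual HugePages configuration

nr_hugepages is not configured by default (it defaults to 0), so the initial analysis did not consider traditional HugePages. Only after comparing the memory figures did we notice the abnormal memory held by traditional HugePages.

Further investigation showed that the server had been repurposed and still carried leftover Oracle-related settings. This is why the hidden problem went unnoticed: yet another small pitfall of migrating to domestic platforms.

3) Follow-up improvements

Add a traditional HugePages check-and-reset step to the initialization procedure for reused servers:

```
sed -i '/huge/d' /etc/sysctl.conf
sysctl -p | grep huge
echo 0 > /proc/sys/vm/nr_hugepages
```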
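To verify the initialization step above on a running host, a small check can read /proc/meminfo and flag static huge pages that are reserved but unused. The sketch below is a minimal Python illustration of our own and is not part of the original procedure. It assumes that /proc/meminfo reports only the default huge page size pool. Pools of other sizes (such as 1G pages) are listed under /sys/kernel/mm/hugepages/ and are not covered here. The function names are hypothetical.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; the kB unit is dropped."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def idle_hugepages_kb(info):
    """kB of default-size static huge pages that are neither in use nor reserved by a mapping."""
    # HugePages_Rsvd pages are promised to a mapping but still counted in HugePages_Free.
    free_pages = info.get("HugePages_Free", 0) - info.get("HugePages_Rsvd", 0)
    return max(free_pages, 0) * info.get("Hugepagesize", 0)

if os.path.exists("/proc/meminfo"):  # only meaningful on Linux
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    idle_kb = idle_hugepages_kb(info)
    total_kb = info.get("HugePages_Total", 0) * info.get("Hugepagesize", 0)
    print(f"HugePages total {total_kb / 1024**2:.1f} GiB, idle {idle_kb / 1024**2:.1f} GiB")
    if idle_kb > 0:
        print("WARNING: static huge pages are reserved but unused; check vm.nr_hugepages")
```

On a host like the one in this incident, HugePages_Total and HugePages_Free would both be non-zero and equal, so the warning would fire. A database that really does use huge pages would show HugePages_Free well below HugePages_Total instead.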