Background
A memory leak is an abnormal growth of memory usage that can occur in either Kernel Space or User Space.
Memory is allocated but never freed, which can significantly degrade the performance of the machine and might cause the machine to crash.
This article describes a procedure for detecting a memory leak in Kernel Space (memory leaks in User Space are detected using special tools, e.g., valgrind, for a specific process).
For 61000 Security System, refer to sk98387 (How to detect a kernel memory leak on 61000 Security System: Gateway mode).
Procedure
Note: The kernel parameters described below can be enabled (value set to 1) indefinitely without any impact - neither on security, nor on performance.
- To enable memory leak detection, set the following kernel parameters in the $FWDIR/boot/modules/fwkern.conf file per sk26202:
fw_salloc_debug_leaks=1
fw_hmem_debug_leaks=1
fw_kmem_cphwd_use_fw=1
fw_kmem_detailed_leak_report=1
fw_kdprintf_limit=0
fw_kdprintf_limit_time=0
- Save the changes and reboot the machine.
- Verify that the values for kernel parameters were accepted:
[Expert@HostName]# fw ctl get int fw_salloc_debug_leaks
[Expert@HostName]# fw ctl get int fw_hmem_debug_leaks
[Expert@HostName]# fw ctl get int fw_kmem_cphwd_use_fw
[Expert@HostName]# fw ctl get int fw_kmem_detailed_leak_report
[Expert@HostName]# fw ctl get int fw_kdprintf_limit
[Expert@HostName]# fw ctl get int fw_kdprintf_limit_time
- Collect CPinfo file:
[Expert@HostName]# cpinfo -z -n -o /var/log/$(uname -n)_before.cpinfo
- Let the system run for at least several days - if possible, stress the machine by passing complex traffic through the gateway.
- On Gaia OS: Stop RouteD daemon:
[Expert@HostName]# tellpm process:routed
Notes:
  - In an R76 cluster, this might cause a fail-over between cluster members (starting in R76, a new Device Name / Pnote called 'routed' was introduced). Refer to sk92787.
  - If RouteD daemon is not stopped, then the Check Point kernel module will not be able to unload (in Steps 8, 9 and 10) because the /dev/fw* devices will remain in use, which can be seen in the output of the 'lsof | grep -v grep | grep -E "PID|routed" | grep -E "PID|/dev/fw"' command.
  - This step applies only to Gaia R75.40 / R75.40VS / R75.45 / R75.46 / R76.
  - This issue was fixed in R75.47.
- CRUCIAL: Collect CPinfo file right before next Step 8:
[Expert@HostName]# cpinfo -z -n -o /var/log/$(uname -n)_during.cpinfo
- Stop all Check Point processes and applications:
[Expert@HostName]# cpstop
- Stop all Check Point services:
[Expert@HostName]# service cpboot stop
- Unload the Check Point kernel modules:
[Expert@HostName]# cpstop -fwflag -driver
Note: Check the output carefully - there should NOT be any messages stating that the FireWall kernel module could not be unloaded.
Example of a problematic message:
fwmod_smp.2.4.21.cp.i686: Device or resource busy
Possible reasons that the module is still being used:
  - Policy installation was in progress
  - Kernel debug was running
  - Some User Space process is still using the FireWall module (fwmod)
Perform the following checks, and then repeat the previous step ('service cpboot stop'):
  - Stop policy installation
  - Stop kernel debug
    - (A) Check that only the error, warning, or none flags are enabled for the different modules:
      [Expert@FW]# fw ctl debug
      To reset the flags to their defaults, run:
      [Expert@FW]# fw ctl debug 0
    - (B) Check that no kernel debugs are running. The output of the following command should be empty:
      [Expert@FW]# ps auxw | grep -v 'grep' | grep 'debug'
  - Stop the User Space process that uses the FireWall module
    - (A) The best practice is to try stopping the Service that runs this process via the Linux 'service' command. If no such Service exists, go to Step (B).
      Example:
      [Expert@FW]# lsof /dev/fw0
      COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
      cpsnmpage 1112 root 20u CHR 253,0 65622 /dev/fw0
      In this case, the 'cpsnmpage' process is /usr/sbin/cpsnmpagentx. The service that runs this process is SNMP.
      Try stopping the SNMP service via the Linux 'service' command:
      [Expert@FW]# service snmp stop
      If the 'lsof /dev/fw0' command still shows this process, try stopping the SNMP service via the SNMP command:
      [Expert@FW]# snmp service disable
      If the 'lsof /dev/fw0' command still shows this process, go to Step (B).
    - (B) Kill the process that uses the FireWall module:
      [Expert@HostName]# kill -KILL PID_of_Process
      NOTE: The PID of the process appears in the 2nd column ('PID') of the 'lsof' command output.
      If the 'lsof /dev/fw0' command still shows this process, contact Check Point Support.
- Check if FireWall kernel module is still loaded:
[Expert@HostName]# lsmod | grep fwmod
Note: This step is relevant for R6x versions only; skip this step for R7x versions.
- If FireWall kernel module is still loaded, unload it manually:
[Expert@HostName]# rmmod <NAME_OF_FWMOD>
Notes:
  - This step is relevant for R6x versions only; skip this step for R7x versions.
  - Check the output carefully - there should NOT be any messages stating that the kernel module could not be unloaded.
- CRUCIAL: Collect the memory leak information by using the following exact syntax:
[Expert@HostName]# \date >> /var/log/leak.txt
[Expert@HostName]# dmesg >> /var/log/leak.txt
[Expert@HostName]# \date >> /var/log/leak.txt
- Collect CPinfo file:
[Expert@HostName]# cpinfo -z -n -o /var/log/$(uname -n)_after.cpinfo
- Start the Check Point services:
[Expert@HostName]# service cpboot start
- Start Check Point processes and applications:
[Expert@HostName]# cpstart
- On Gaia OS: Start RouteD daemon (which was stopped in Step 6):
[Expert@HostName]# tellpm process:routed t
- Send the following files to Check Point Support:
/var/log/leak.txt
/var/log/messag*
/var/log/<HostName>_before.cpinfo.gz
/var/log/<HostName>_during.cpinfo.gz
/var/log/<HostName>_after.cpinfo.gz
- To disable memory leak detection, set the following kernel parameters in the $FWDIR/boot/modules/fwkern.conf file per sk26202:
fw_salloc_debug_leaks=0
fw_hmem_debug_leaks=0
fw_kmem_cphwd_use_fw=0
fw_kmem_detailed_leak_report=0
fw_kdprintf_limit=30
fw_kdprintf_limit_time=60
Note: Another way to disable memory leak detection is to delete all these parameters from the $FWDIR/boot/modules/fwkern.conf file.
- Save the changes in the $FWDIR/boot/modules/fwkern.conf file and reboot the machine.
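
Because the procedure depends on six parameters being entered exactly, it can help to check the fwkern.conf file before the reboot in the enable step. The following is a minimal sketch only: check_fwkern_conf is a hypothetical helper (not a Check Point tool), and it assumes that each parameter sits on its own line as name=value with no spaces. The expected values are the "enable" values listed in this article.

```shell
#!/bin/bash
# Hypothetical helper (not a Check Point tool): compare a fwkern.conf-style
# file against the "enable" values listed in this article.
# Assumption: each parameter is on its own line as name=value, without spaces.

EXPECTED_ENABLED="fw_salloc_debug_leaks=1
fw_hmem_debug_leaks=1
fw_kmem_cphwd_use_fw=1
fw_kmem_detailed_leak_report=1
fw_kdprintf_limit=0
fw_kdprintf_limit_time=0"

check_fwkern_conf() {
    local conf_file=$1 status=0 pair name want got
    for pair in $EXPECTED_ENABLED; do
        name=${pair%%=*}    # text before the first '='
        want=${pair#*=}     # text after the first '='
        # Take the value from the last line that assigns this exact name.
        got=$(sed -n "s/^${name}=\(.*\)\$/\1/p" "$conf_file" | tail -n 1)
        if [ -z "$got" ]; then
            # Parameter is absent, or present with an empty value.
            echo "MISSING $name (expected $want)"
            status=1
        elif [ "$got" != "$want" ]; then
            echo "MISMATCH $name=$got (expected $want)"
            status=1
        else
            echo "OK $name=$got"
        fi
    done
    # Returns 0 only if every parameter matched.
    return $status
}
```

On the gateway, the function could be called as check_fwkern_conf "$FWDIR/boot/modules/fwkern.conf". It only reads the file; the values actually loaded by the kernel are still confirmed with the 'fw ctl get int' commands in the verification step.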