Thursday, October 7, 2010

Cisco PIX or ASA vs. Checkpoint Firewall

Introduction

Firewall technology ranges from packet filtering through application-layer proxies to stateful inspection, with each technique building on the benefits of its predecessor.

Stateful inspection works at the network layer and does not require a separate proxy for each application. It does not suffer the performance degradation of application-level technology (proxies), which carries the extra overhead of transporting data up to the application layer. Unlike packet filters, it maintains session state and therefore increases the security level of a network transaction.



Checkpoint Firewall-1

Checkpoint FW-1 has been the firewall market leader since shortly after its introduction in 1994/95. Its well-designed GUI was, and still is, the best visual interface to any firewall product. This intuitive interface makes FW-1 easy to work with, even for those new to firewalls.

FireWall-1 is based upon Stateful Inspection technology, the de facto standard for firewalls. Invented by Check Point, Stateful Inspection provides the highest level of security. FireWall-1’s scalable, modular architecture enables an organization to define and implement a single, centrally managed Security Policy. The enterprise Security Policy is defined on a central management server through a GUI and downloaded to multiple enforcement points (Inspection Modules) throughout the network.
The FireWall-1 Inspection Module is located in the operating system kernel (NT or UNIX) at the lowest software level. The Inspection Module analyzes all packets before they reach the gateway operating system. Packets are not processed by any of the higher protocol layers unless FireWall-1 verifies that they comply with the Inspection Module security policy. It examines communications from any IP protocol or application, including stateless protocols such as UDP and RPC.

PIX Firewall

Cisco introduced the Private Internet Exchange (PIX) Firewall series in 1994; it was originally designed as a network address translator. The PIX Firewall is a high-performance firewall that uses stateful packet filtering. It is essentially a firewall "appliance": an integrated hardware/software solution (Intel hardware / proprietary OS). The PIX Firewall is not UNIX or NT-based; it is based on a secure, real-time embedded system built around the Adaptive Security Algorithm (ASA), which provides stateful inspection. ASA tracks the source and destination addresses, TCP sequence numbers, port numbers, and additional TCP flags. All inbound and outbound traffic is controlled by applying the security policy to connection table entries, which hold this information. Access is permitted through the PIX Firewall only if a connection has been validated or if it has been explicitly configured.
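As a rough illustration of the connection-table idea described above, here is a minimal Python sketch. It is not Cisco's implementation: the class and field names are hypothetical, and the TCP sequence-number and flag tracking that ASA performs is left out. Outbound flows create table entries, and inbound packets pass only if they match an existing entry or an explicitly configured exception.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """One direction of a connection (hypothetical model, addresses as strings)."""
    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> "Flow":
        # The reply direction of the same connection.
        return Flow(self.dst, self.dport, self.src, self.sport)


@dataclass
class StatefulFilter:
    # Connection table: flows opened from the inside.
    connections: set = field(default_factory=set)
    # Explicitly configured exceptions, stored as (inside_host, port) pairs.
    static_permits: set = field(default_factory=set)

    def outbound(self, flow: Flow) -> bool:
        # Inside-to-outside traffic is allowed and remembered.
        self.connections.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        # Permit replies to validated connections.
        if flow.reversed() in self.connections:
            return True
        # Otherwise permit only explicitly configured destinations.
        return (flow.dst, flow.dport) in self.static_permits
```

A reply from a server that an inside host contacted is admitted, while an unsolicited packet from another outside host is dropped unless its destination appears in `static_permits`.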


Comparison

PIX and Checkpoint FW-1 use similar technologies: both rely on smart packet filtering (stateful technology).

There are several key differences. One is that FW1 runs on a general-purpose operating system, while Cisco's PIX uses an embedded operating system. Another is that the PIX is essentially a "diode": you define a security level for each interface, and traffic from a higher level (internal=100) to a lower level (external=0) is allowed, while traffic from lower (external) to higher (internal) is blocked unless an exception is configured. With FW1 there are no native directions, and everything must be configured explicitly. (For this reason, FW1 can be considered much more flexible.)
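The "diode" behaviour can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not PIX code: the function name is hypothetical, and the handling of equal security levels (denied) is an assumption, since the text above only covers the higher-to-lower and lower-to-higher cases.

```python
def default_action(src_level: int, dst_level: int, exception: bool = False) -> str:
    """Return "permit" or "deny" for traffic between two interface security levels.

    Levels run from 0 (least trusted, e.g. external) to 100 (most trusted,
    e.g. internal), as in the description above.
    """
    for level in (src_level, dst_level):
        if not 0 <= level <= 100:
            raise ValueError("security level must be between 0 and 100")
    # Higher to lower: allowed by default.
    if src_level > dst_level:
        return "permit"
    # Lower to higher (and, by assumption, equal levels): blocked
    # unless an exception has been configured.
    return "permit" if exception else "deny"
```

For example, `default_action(100, 0)` yields "permit", while `default_action(0, 100)` yields "deny" unless `exception=True` is passed.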
The license structure on the PIX is per connection; on FW1 it is per protected host. All other things being equal, maintenance is much easier on the PIX, and performance is higher. Cisco has only recently released a host-to-LAN encryption solution; FW1 has had one for a long time (SecuRemote for Windows boxes). FW1 also has extra features such as bandwidth management (FloodGate), content vectoring servers, and others (see OPSEC products).

Note that FW1 is developed in a UNIX environment. The UNIX implementation is more efficient, more mature, and more stable. NT is a poor choice unless the client insists he can support NT and is afraid of UNIX. Comparing FW1 on a switch or on a Nokia box against the PIX would also be an interesting exercise.


PIX Pros:

1) Minimal configuration if you have few or no internal devices that need to be accessed directly from the Internet (e.g. web servers on a protected DMZ) and want to allow everything outbound.

2) Complete hardware/software solution, no additional OS vulnerabilities or boot-time errors to worry about.

3) Cisco support, which is generally very good.

4) Performance, probably the best in the business.

5) No special client side software other than telnet, tftp or serial port terminal software.

6) Lots of detailed documentation.

7) Free upgrades


PIX Cons:

1) Difficult to manage if you have many servers on a protected DMZ (lots and lots of conduit statements) or many firewalls to manage.

2) Routing limitation in complex network architectures (Need to add a router for EACH segment).

3) Command line (IOS style) based. Cisco GUI manager (PIX Firewall Manager) is currently in its early releases and not as functional as FW-1's.

4) No ability to off-load layer 7 services like: virus scanning, URL filtering, etc. You can filter on outgoing traffic, but the process is not dynamic.

5) Requires a separate syslog server for logging.

6) No source port filtering.

7) No clear documentation (Cisco's documentation is often conflicting, fails to explain which version of the PIX OS a certain configuration will or will not work under, and seems to be constantly changing).

FW-1 Pros:

1) Very functional GUI interface.

2) Based on Stateful inspection like PIX, but can off-load layer 7 inspection to other servers if required.

3) Lots of features for complex environments like: large protected DMZ, Windows VPN support, firewall synchronization, bi-directional NAT, etc.

4) Can be used to control bi-directional traffic.


5) Complex logging provided on management station.


FW-1 Cons:

1) Must account for OS vulnerabilities as well as FW-1 vulnerabilities.

2) Performance on NT not as good as on UNIX or the PIX.

3) Support is only through resellers, is very expensive (contracts start at 50% of the price of the original software per year), and is needed for product upgrades.

4) Possibility of OS boot-time errors.


NB: PIX can filter Java but cannot yet filter ActiveX or JavaScript, whereas FW-1 can.


Conclusion

In the simplest terms, FW-1 can be considered much more functional than the PIX, while the PIX has better performance and support. If your particular environment requires a lot of functionality, the best choice is the FW-1 solution, although you might want to consider running it on a UNIX platform rather than an NT platform. If your environment is fairly simple, PIX is a solid solution with very good performance.

Troubleshooting License Upgrade in Check Point Firewall

Troubleshooting License Upgrade

License upgrade is a smooth and easy process. There are a few predictable cases where you may come across some problems. Use this section to solve those license upgrade problems.

In This Section
• Error: “License version might be not compatible”
• Evaluation Licenses Created in the User Center
• Evaluation Licenses Not Created in the User Center
• Licenses of Products That Are Not Supported in NGX
• License Enforcement on Module is now on Management
• License Not in Any Of Your User Center Accounts
• User Does Not Have Permissions on User Center Account
• SKU Requires Two Licenses in NG and One License in NGX
• SmartDefense Licenses
• License Upgrade Partially Succeeds
• Upgraded Licenses Do Not Appear in the Repository
• Cannot Connect to the User Center

Error: “License version might be not compatible”
SecureKnowledge solution sk30478

Symptoms
• Error: Warning: Can't find .... in cp.macro. License version might be not compatible
• Error occurs with commands such as cplic print, cpstop, cpstart, and fw ver.
• The error occurs when a license upgrade is performed before a software upgrade.

The error appears in any situation where a licensed version is not compatible with the version installed on a machine, for example, an NGX license on an NG machine.

Cause

License on the target machine was upgraded to NGX before the software was upgraded from a previous NG version to NGX. If the license upgrade is performed before the software upgrade, Check Point products will generate warning messages until all the software on the machine has been upgraded. Refer to “License Upgrade Methods” to determine the upgrade path that best applies to your current configuration.

Resolution

Upgrade the software to version NGX. Errors will not appear after the upgrade.
Note that these errors do not affect the functionality of the version NG software.
Evaluation Licenses Created in the User Center
Symptoms
User Center message (Error code: 106):
No license upgrade is available for evaluation product.
Cause
Evaluation licenses are not entitled to a license upgrade.
Resolution
Evaluation licenses cannot be upgraded. If you don’t need the evaluation license, delete it. If you do need it, contact Account Services at US +1 817 606 6600, option 7 or e-mail AccountServices@ts.checkpoint.com.
Evaluation Licenses Not Created in the User Center
Symptoms
User Center message (Error code: 151):
Your license contains a Certificate Key (CK) which is not found in User Center.
Cause
These evaluation licenses do not exist in the User Center. Evaluation licenses are not entitled to a license upgrade.
An evaluation license can be identified by examining the license string. Evaluation licenses may contain one of the following strings in the Features description:
CK-CP
or
CK-CHECK-POINT-INTERNAL-USE-ONLY
Resolution
Evaluation licenses cannot be upgraded. If you don’t need the evaluation license, delete it. If you do need it, contact Account Services at US +1 817 606 6600, option 7 or e-mail AccountServices@ts.checkpoint.com.
Licenses of Products That Are Not Supported in NGX
Symptoms
User Center Message (Error code: 154):
This product is not upgradeable to NGX version and therefore a license upgrade is not needed. The product continues to be supported in its NG Release
Cause
VPN-1 Net and VPN-1 SmallOffice are not supported in NGX. Therefore, if an attempt is made to upgrade the license for these products, the User Center generates an error message. The affected SKUs are:
VPN-1 Net Family SKUs: CPVP-VNT and LS-CPVP-VNT families
SmallOffice family SKUs: CPVP-VSO and LS-CPVP-VSO families
Resolution
Contact Account Services at US +1 817 606 6600, option 7 or e-mail AccountServices@ts.checkpoint.com.
License Enforcement on Module is now on Management
Symptoms
User Center Message (Error code: 132):
The license enforcement of NG gateway is now performed by the NGX management server. Perform Change IP operation in User Center and install the NGX license on the management server
Cause
The enforcement of NG module features is now performed by the NGX management. For example, the licensing model of QoS (formerly FloodGate-1) for VPN-1 Express was changed in NGX, and VPN-1 Express NGX modules with QoS require an appropriate license to be installed on the management. License Upgrade in this scenario is not handled automatically by the license upgrade. The affected SKU family for QoS is: CPXP-QOS
Resolution
If you have an NG Express gateway with a QoS (FloodGate-1) license, and in any other
case where this problem occurs, proceed as follows:
1 Perform a license upgrade at the User Center web site to generate a new license.
2 Install the new, upgraded license on the NGX management machine (even if you
do not upgrade the gateway).
3 Upgrade the gateway.
4 Delete the unneeded license from the gateway in one of two ways:
• Run the command line command at the gateway:
cplic del
• Using SmartUpdate, select the unneeded license, Detach it, and then Delete it.
License Not in Any Of Your User Center Accounts
Symptoms
User Center Message (Error Code 17):
This license is not in any of your accounts. Run the license upgrade again with the username that owns this license in the User Center.
Cause
This specific license does not exist in any of the accounts that belong to this user.
Resolution
Run the tool again with the appropriate username.
Note that each time you run the tool with a different username, upgraded licenses from the User Center are added to a cache file located on your machine. This file contains the successfully upgraded licenses from previous runs.
If the partially successful license upgrade was performed via the Wrapper, then after the Wrapper has finished, run the license upgrade again via the command line, with the appropriate username.
User Does Not Have Permissions on User Center Account
Symptoms
User Center Message (Error Code 19):
This license is in your account but you are not authorized to upgrade licenses in this account because you have just view-only permissions. Run license upgrade again with a username that is authorized to change the license in the User Center.
Cause
This user is not authorized to change this license in the User Center.
Resolution
Run the tool again with the appropriate username.
Note that each time you run the tool with a different username, upgraded licenses from the User Center are added to a cache file located on your machine. This file contains the successfully upgraded licenses from previous runs.
If the partially successful license upgrade was performed via the Wrapper, then after the Wrapper has finished, run the license upgrade again via the command line, with the appropriate username.
SKU Requires Two Licenses in NG and One License in NGX
Symptoms
User Center Message (Error code: 135):
This license is no longer needed in the version you are upgrading to. It can be safely removed from the machine after the software upgrade.
Cause
The NG version of SecureClient requires two licenses: one license for the module and one for the management. In NGX only the management license is needed. The module license (CPVP-VPS-1-NG) is no longer needed because it is incorporated in the VPN-1 Pro license. The relevant SKU families are:
• CPVP-VSC,
• LS-CPVP-VSC,
• CPVP-VMC,
• LS-CPVP-VMC,
• CPVP-VSC-100-DES-NG
Resolution
After the software upgrade, delete the unneeded module license from the machine. Do
this in one of two ways:
• Using the command line: Run
cplic del
• Using SmartUpdate: Select the unneeded license, Detach it, and then Delete it.
SmartDefense Licenses
Symptoms
User Center Message (Error code: 902):
SmartDefense License is not needed on the gateway.
Cause
In NGX, enforcement of SmartDefense licenses is handled by the User Center. The SKU families for which this issue is relevant are SU-SMRD and SU-SMDF.
Resolution
Delete the unneeded license from the machine.
License Upgrade Partially Succeeds
Symptoms
License upgrade fails for some of the licenses but succeeds for others.
Cause
License upgrade may fail for some licenses and succeed for others. A license may fail to upgrade for a number of reasons. For example, you may not have an Enterprise Subscription contract for these licensed products. See the other items in this “Troubleshooting License Upgrade” section for more reasons why license upgrade may fail.
Resolution
After solving all or some of the licensing problems referred to in the error log, run the license_upgrade tool. This will upgrade the licenses for which the problem has been solved.
The tool can be found in one of the following locations:
• On the CD at
• In the Check Point Download site at http://www.checkpoint.com/techsupport/ngx/license_upgrade.html.
When the license_upgrade tool is run several times, the results are cumulative. This means that if the upgrade of some licenses failed and the tool is run again:
• Licenses that were successfully upgraded to NGX remain unchanged.
• Licenses that failed to upgrade in a previous run and have now been successfully upgraded are added to the machine.
For example, if the upgrade of a license failed because there was no Enterprise Software Subscription contract for the licensed product, purchase Software Subscription for those products and then run the tool again to fetch the new licenses from the User Center Web site to the machine.
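The cumulative behaviour of repeated runs can be pictured with a small Python sketch. It only models the bookkeeping described above; it does not call or reproduce the real license_upgrade tool, and the function name, the dictionary layout and the license identifiers are all hypothetical.

```python
def merge_upgrade_run(already_upgraded: dict, run_result: dict) -> dict:
    """Combine the outcome of one run with licenses upgraded in earlier runs.

    already_upgraded maps a license id to its upgraded (NGX) license.
    run_result maps a license id to the upgraded license, or to None when
    the upgrade of that license failed in this run.
    """
    merged = dict(already_upgraded)
    for license_id, upgraded in run_result.items():
        # Licenses upgraded in a previous run remain unchanged.
        if license_id in merged:
            continue
        # Licenses that succeed now are added; failures are left for a later run.
        if upgraded is not None:
            merged[license_id] = upgraded
    return merged
```

Calling it once with a run in which license "B" fails, and again after the contract problem for "B" has been solved, ends with both licenses present and the first one untouched.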
Upgraded Licenses Do Not Appear in the Repository
Symptoms
Upgraded license does not appear in the SmartUpdate Repository. However, the
license_upgrade tool log indicates that the license upgrade succeeded.
The license upgrade was performed on the NGX machine, after the software upgrade
to NGX.
Cause
The file with the upgraded licenses that was fetched from the User Center cannot be
imported into the SmartUpdate Repository while SmartUpdate is open.
Resolution
Close any SmartUpdate GUI client that is running, and run
license_upgrade import -r
This imports the upgraded licenses into the SmartUpdate Repository.
Cannot Connect to the User Center
Symptom
Failed to connect to the User Center.
Cause
Access to port HTTPS-443 is not allowed through the firewall. Access to the User
Center requires this port to be open.
Resolution
Open port HTTPS-443 in the firewall.
For example, in a deployment with one main firewalled gateway, and other gateways for
branch offices within the organization, open HTTPS-443 in the main gateway for all
the branch office gateways behind it.
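Before changing firewall rules, it can help to confirm from the affected machine that an outbound TCP connection on port 443 really fails. The following Python sketch shows one way to do that with the standard library; the function name is hypothetical, and the host name in the usage note is an assumption rather than something taken from the guide.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_reach("usercenter.checkpoint.com")` (assumed host name) returning False from a branch office gateway would be consistent with HTTPS-443 being blocked somewhere on the path. A True result only shows that the TCP port is open, not that the license upgrade itself will succeed.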

Upgrading Nortel Switched Firewall

Purpose: This document provides the procedure to upgrade an existing Nortel Switched Firewall cluster. The accelerated-platform cluster consists of 2 accelerators running in active-standby mode and 2 directors in active-active mode.

Prerequisites:

1. Backup of the configuration
2. Image CD of 4.1.3_R55
3. Minimum downtime of 3 Hours.

Procedure:

1. Remove the standby accelerator (Accelerator-2) and its connected director (Director-2) from the cluster.
2. Upgrade Director-2 by installing image 4.1.3_R55.
3. Once the installation completes successfully, connect Accelerator-2 to Director-2 and power it on. When the accelerator comes up and is detected by the director, the director upgrades the accelerator's image automatically.
4. Physically disconnect the active accelerator and active director, i.e. Accelerator-1 and Director-1 (actual downtime starts).
5. Connect Accelerator-1 to the already upgraded Accelerator-2 over the Inter Accelerator Port.
6. Director-2 now also detects Accelerator-1 and upgrades it automatically.
7. Accelerator-2, Director-2 and Accelerator-1 are now upgraded to 4.1.3_R55.
8. Install the new image 4.1.3_R55 on Director-1. After the upgrade, connect Director-1 to Accelerator-1 and join it to the cluster.
9. Restore the configuration in the cluster.
10. Connect the cables from the zones to the respective ports on the accelerators.
11. Check the connectivity to all the zones and to the management server from both directors.
12. Reset the SIC in both firewall objects (Director-1, Director-2) in the Checkpoint Management Server.
13. Re-establish the SIC with a new activation key.
14. Install the policy on the cluster.
15. Check all traffic and the status of the applications.

Nokia firewall high CPU utilization: how to identify the problem?

How to identify the source of high CPU utilization on Nokia IPSO
Nokia Solution:


Symptoms
This resolution will explain how to identify issues causing high CPU usage on an IPSO Platform.
Please note that the traffic levels which any particular platform can pass, have been tested in the Nokia QA labs and these statistics are available on request from Sales. The amount of traffic vs CPU utilization is completely dependent on the customer's traffic levels and types, and the configuration of the particular system. A badly configured system, or a badly configured network (routing loops, asymmetric routing), will result in high CPU utilization for what may appear to be low traffic levels compared to our documentation and tests. Nokia cannot qualify performance for a given system under those conditions.
Answer
When a customer reports high CPU usage on an IPSO platform, collect the following files:

1) CST / CPINFO from Management and Modules (KB1355367, KB1610053, KB1355368)
2) fw monitor during the high CPU state. (KB1354670, KB1710226)
3) tcpdump from all interfaces during the high CPU state. (KB1354632)
4) Output of “netstat 1” during the high CPU state.
5) Run the attached script file for 15 minutes during high CPU state.

Note – running the script file will cause the CPU to spike.

Below you will find the purpose of collecting each data file. The data files collected will help you analyze and identify the causes of high CPU utilization.

1) CST / CPINFO from Management and Modules.

Information about CST can be found in Resolution KB1355367, KB1610053, and information about cpinfo can be found in KB1355368.

The following is what you would review to determine what might be causing high CPU:

Look at the file “cpustats.htm” in the CST to determine whether the CPU usage is occurring in System, Userland or Interrupt.

a) High CPU usage in “System” may indicate both Check Point and IPSO Kernels (traffic being inspected by Check Point including SmartDefense) are consuming CPU causing high usage.
SecureXL is the security performance architecture of Check Point VPN-1/FireWall-1 and Nokia security appliances. The architecture offloads many intensive security operations to optimized Nokia IPSO code running on Intel x86 hardware or on network processor hardware. Offloaded security operations include TCP state negotiation, packet forwarding, Network Address Translation, VPN cryptography, anti-spoofing, routing, and accounting. Optimized IPSO code placed at the hardware interrupt level or in a network processor reduces the overhead involved in performing these security operations.
SecureXL is covered in more detail in section c) - Application level CPU consumption - but there are a few tweaks which operate purely at the OS level. The following 3 settings should be used selectively, depending on the traffic profile of the system.
nokia[admin]# ipsctl -w net:sxl:dns_fastexpire 1
nokia[admin]# ipsctl -w net:sxl:tftp_fastexpire 1
nokia[admin]# ipsctl -w net:sxl:dhcp_fastexpire 1
For example, if the system is handling many DNS connections, the dns_fastexpire option can be set to selectively expire DNS connections, on a more frequent basis than for normal connections. DNS is short-lived UDP, and will not suffer from being cleared from the SXL connections table sooner than a long-lived HTTP transaction.
You should also look for too many cloned routes in the ipsoinfo included in a CST. This was a known bug in certain versions of IPSO (PR 58968) but no longer affects recent builds. If the client is not running the latest IPSO you should consult the PR or release notes to see when it was fixed in the client's IPSO branch.

b) High CPU usage in “Interrupt” indicates load caused by traffic on the appliance.

Every x86 based platform has an Interrupt Controller. The purpose of the interrupt controller is to mediate between external hardware and the x86 CPU. When external hardware reacts to an event it can call for an interrupt. When an interrupt occurs, the CPU “interrupts” the currently running program(s).

If a system is experiencing performance problems due to a high interrupt rate, the impact of these interrupts on the system can be mitigated under certain circumstances.
A network interface card triggers an interrupt under one of 2 conditions, whichever happens first.
• The receive ring FIFO buffer (also known as a rx_ring) has received a certain amount of data
o The default number of packets queued in the rx_buffer before triggering an interrupt is (rx_ring size / 4).
• A timeout has run out, flushing data out of the receive ring after a set time
o The receive timer is in units of CPU speed / 768. (For processors around 800 MHz, this is close to 1 microsecond.) The default receive timer is 128 units.
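The two default thresholds can be worked out with a little arithmetic. The Python sketch below assumes that "units of CPU speed / 768" means one timer unit lasts 768 CPU clock cycles, which matches the "close to 1 microsecond at around 800 MHz" remark above; the function name and the ring size used in the example are hypothetical.

```python
def rx_interrupt_thresholds(cpu_hz: float, rx_ring_size: int, timer_units: int = 128):
    """Return (packets_before_interrupt, timeout_in_microseconds).

    Assumes one receive-timer unit equals 768 CPU clock cycles.
    """
    # Default packet threshold: a quarter of the receive ring.
    packets = rx_ring_size // 4
    # Duration of one timer unit in seconds.
    unit_seconds = 768 / cpu_hz
    # Total receive timeout, converted to microseconds.
    timeout_us = timer_units * unit_seconds * 1e6
    return packets, timeout_us
```

With an 800 MHz processor and an assumed ring of 256 descriptors, this gives 64 packets or roughly 123 microseconds, whichever comes first.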

When an interrupt is triggered, the operating system must determine which NIC port triggered the interrupt. If only one port claims an interrupt, it is easy for the OS to determine where the interrupt comes from. If many devices - including non-NIC port devices - share an interrupt, the OS needs to check all sources. This is much slower than checking only the one device which is known to use that interrupt.


For example, consider the output below, extracted from dmesg. It shows that wx2 and wx5 share the same interrupt (irq 10).
wx5 rev 3 int b irq 10 slot 2 port 2
wx2 rev 3 int c irq 10 slot 1 port 3
If both network interfaces are under load, the impact they make on the system can be reduced by reconfiguring the traffic profiles on the appliance so that highly utilized ports get their own interrupt queue, or share one with a low-traffic link.
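Spotting shared interrupts by eye becomes tedious on systems with many ports. As a minimal sketch, the Python function below groups devices by IRQ from dmesg text; it assumes each relevant line starts with the device name and contains "irq N", as in the samples in this article, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Device name at the start of the line, followed somewhere by "irq <number>".
IRQ_LINE = re.compile(r"^(?P<dev>\S+)\s.*\birq\s+(?P<irq>\d+)\b")


def shared_irqs(dmesg_text: str) -> dict:
    """Map each IRQ used by more than one device to the list of those devices."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = IRQ_LINE.match(line.strip())
        if match:
            by_irq[int(match.group("irq"))].append(match.group("dev"))
    # Keep only the interrupts that are actually shared.
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}
```

Fed the two wx lines above, it reports IRQ 10 as shared by wx5 and wx2.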

The command ipsctl -i kern:intr can be used to determine the number of times an interrupt has been called. Consider the following output, which was collected using the command ipsctl -i kern:intr
ipsctl -i kern:intr Jan 08 16:51:56

count_00 929877 100
count_01 0 0
count_02 0 0
count_03 0 0
count_04 0 0
count_05 34417584 4104
count_06 1 0
count_07 3 0
count_08 9521470 1029
count_09 121 0
count_10 5489037537 0
count_11 18113 2
count_12 37305512 4470
count_13 0 0
count_14 7 0
count_15 22441 0
In this instance, the value of count_10 is very large which signifies either of the interfaces on IRQ 10 could be causing the high level of interrupts. The Voyager page System -> Monitor -> Reports -> Interface Throughput Report can be used to see how many packets an interface is handling over a set time. The best option here would be to run a report on each interface, for packet throughput. Interrupt levels depend on packet throughput, not byte throughput.
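Picking out the busiest interrupt lines can also be scripted. The following Python sketch parses text in the same layout as the output above and ranks IRQs by their cumulative count; the function name is hypothetical, and because the meaning of the last column is not stated here, it is ignored.

```python
def busiest_irqs(intr_output: str, top: int = 3) -> list:
    """Return up to `top` (irq, count) pairs, highest cumulative count first."""
    counts = []
    for line in intr_output.splitlines():
        fields = line.split()
        # Expect lines such as "count_10 5489037537 0".
        if len(fields) >= 2 and fields[0].startswith("count_") and fields[1].isdigit():
            irq = int(fields[0].split("_", 1)[1])
            counts.append((irq, int(fields[1])))
    return sorted(counts, key=lambda item: item[1], reverse=True)[:top]
```

On the sample output above, the first entry returned is IRQ 10, which points back to the interfaces sharing that interrupt.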

If it can be determined that both interfaces on IRQ 10 have high levels of packet throughput, one may decide to move the IP configuration of one or both logical interfaces to a logical interface residing on another physical port that does not share an IRQ with another device. This addresses the impact that high traffic levels have on CPU load: the traffic would still exist, but the system would be better optimized to handle it.

Further reduction in network-associated interrupts can be achieved by enabling flow control and autoadvertise on the IPSO-side interface, enabling "flow control desirable in" on the switch port connected to the IPSO interface, and, if available, increasing the ring descriptor size (this depends on the NIC and the version of IPSO).

Increasing the link speed from 10 Mb/s to 100 Mb/s, or from 100 Mb/s to 1 Gb/s, and possibly using link aggregation, are steps which should be taken before radically redesigning the system's logical interface layout. Moving logical interfaces around based on IRQ should only be considered once it has been determined that the system's actual throughput approaches the QA numbers for maximum throughput on that particular hardware platform.
One of our clients noted that he experienced some minor (1%) packet loss when using a copper interface, but when he switched to a fibre interface the problem went away. The fibre specification demands that flow control always be enabled, while the flow control setting may be optional for copper. This was transparent to the user in fibre, because there is no option to disable it. Using flow control can diminish problems with packet loss.

Any device which has an interrupt can dramatically impact system performance; by using the above as a template, one can determine where exactly the problem is happening.

Here is another example, this output is also from dmesg
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
These 2 devices map to the console and aux ports - this is easy to tell because of the 16550A string, which is a UART serial (http://en.wikipedia.org/wiki/UART) controller. If the system is piping debug messages to console, and the console speed is 9600 bits/second, but the output to the console is generating data faster than 9600 bits/second, the system will buffer the output and the irq will constantly be polled and data pulled out of it, causing the interrupt levels to spike. This would also have a detrimental effect on system operations. The solution would be to not log to the console, but to a logfile instead.
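To see how quickly console logging can fall behind, consider the arithmetic as a short Python sketch. It assumes 10 bits on the wire per character (8 data bits plus start and stop bits), which is a common serial framing but is not stated in this article; the function name and the message rates in the example are hypothetical.

```python
def console_backlog_bytes(lines_per_second: float, avg_line_bytes: int,
                          baud: int = 9600, bits_per_char: int = 10,
                          seconds: int = 60) -> float:
    """Bytes left waiting in the buffer after `seconds` of sustained logging."""
    # Characters per second the serial line can actually carry.
    capacity = baud / bits_per_char
    # Characters per second the system is trying to print.
    produced = lines_per_second * avg_line_bytes
    # Anything above capacity accumulates in the buffer.
    return max(0.0, (produced - capacity) * seconds)
```

At 9600 bits/second the console carries about 960 characters per second under that assumption, so 50 debug lines of 80 bytes each per second would leave roughly 182 KB queued after one minute, while 5 such lines per second would cause no backlog.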

Here is one last example based on output from dmesg
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wd0: 1024MB (2001888 sectors), LBA geometry: 993 cyls, 32 heads, 63 S/T
wd0: Physical geometry: 1986 cyls, 16 heads, 63 S/T
This output is for a hard drive on the system. Swap space is stored on the hard drive. For a system which has a high memory usage, it's possible that swap will be used constantly. If memory is copied into and out of swap on a constant basis (thrashing), the interrupts for the hard drive controller will be very high. The output shows that the IRQ for the hard drive controller is IRQ 14. Correlate "ipsctl -i kern:intr" with the IRQ to see if indeed the heavy use of swap is bogging down the system's CPU utilization.
If the console is suspected of causing the slowdown, there are steps which can be taken to mitigate the problem. A typical symptom is that policy pushes freeze the system for a few moments.
To enable debugging (which will write an event to the messages file and console upon a critical device failure) run the following commands:
nokia[admin]# ipsctl -w net:log:partner:status:debug 1

That will log to the console and to /var/log/messages. If you want to turn off the console output, set:

nokia[admin]# ipsctl -w net:log:sink:console 0
KB1350796 has more information about the solution.

Also note that each network card driver communicates with a MAC-level chip which handles layer 2 communications, one for each network port. This chip has a small register containing a table which is populated by the NIC driver and holds a list of local MAC addresses on that port. The chip then passes on only those layer 2 frames which are destined for a local MAC address. If you have more than 16 MAC addresses (multicast + unicast), the port will drop into promiscuous mode and pass all incoming frames to the system kernel. The kernel then has to perform a filtering function to drop traffic which is not locally accessible. In this case, the interrupt rate for an interface would be quite high, due to the massive volume of data which is no longer hardware-filtered. In VRRP, changing to VMAC mode means that all the virtual IPs are bound to one MAC address. This could help mitigate the performance issues due to high interrupts.
Finally, a large number of multicast feeds would have the same effect due to multicast IP traffic being bound to a unique multicast MAC address for each group. Therefore, using multicast on very high throughput networks may not be recommended in certain environments.
You can confirm that an interface is in a promiscuous state by checking the output of ifconfig (this is also captured in the ipsoinfo portion of a CST)
phys eth-s2p1 flags=41f3
ether 0:a0:8e:42:59:80 speed 1000M full duplex

c) High CPU usage in “Userland” indicates that some daemon processes are consuming a lot of CPU. The userland CPU usage can be examined by using the command 'ps -auxw'.

If you are not sure what a particular CheckPoint or Nokia application is, try searching Nokia's knowledge base at support.nokia.com, or Check Point's knowledge base at http://secureknowledge.checkpoint.com. As a last resort, Google may be of use as well. Nokia resolution 1355505 contains more information about the ps command.

If one of the Check Point processes is using a large amount of CPU, determining the traffic profile can help identify whether re-ordering the rules, turning off logging on some rules, turning off some SmartDefense settings, or checking the SecureXL settings would reduce CPU consumption. You can identify Check Point processes in the ps -auxw output. Below is sample output which shows some common Check Point applications running:
nokia[admin]# ps -auxw
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 771 0.0 0.0 472 336 ?? I 26Nov07 0:00.01 /bin/csh -fb /opt/CPsuite-R65/svn/bin/cprid_wd
root 776 0.0 0.3 4956 7032 ?? I 26Nov07 0:00.23 /opt/CPsuite-R65/svn/bin/cprid
root 998 0.0 0.1 240 1096 ?? Is 26Nov07 0:00.01 /opt/CPsuite-R65/bin/ifwd
root 23227 0.0 0.1 1056 2516 ?? Is 17Dec07 0:27.93 /opt/CPsuite-R65/svn/bin/cpwd
root 23248 0.0 1.5 25040 31416 ?? Ss 17Dec07 3:26.77 cpd
root 23402 0.0 1.7 32852 36224 ?? Ss 17Dec07 5:35.96 fwd (fw)
root 23508 0.0 1.3 19008 27560 ?? I 17Dec07 0:08.91 in.asessiond 0 (fwssd)
root 23509 0.0 1.3 18896 27436 ?? I 17Dec07 0:10.40 in.aufpd 0 (fwssd)
root 23516 0.0 0.4 1720 8964 ?? S 17Dec07 11:10.84 dtlsd 0 (dtls)
nokia[admin]#

"ps -auxwl" may also be useful, as it lists some additional information.
If you have identified a Check Point process as using a high amount of CPU, and searching the knowledge bases does not turn up any relevant answers, you may need to optimize the system based on traffic profiles.
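When the process list is long, it can help to sort it by CPU usage. The sketch below assumes the column order shown in the sample ps -auxw output (PID in column 2, %CPU in column 3, COMMAND from column 11 onward); the function name top_cpu is hypothetical.

```sh
#!/bin/sh
# top_cpu [N]: read "ps -auxw" output on stdin and print the N busiest
# processes (default 5) as "%CPU PID COMMAND", highest %CPU first.
# Lines whose third field is not a number (header, shell prompt) are skipped.
top_cpu() {
    awk '$3 ~ /^[0-9.]+$/ {
             cmd = $11
             for (i = 12; i <= NF; i++) cmd = cmd " " $i
             print $3, $2, cmd
         }' | sort -rn | head -n "${1:-5}"
}
```

Typical use would be ps -auxw | top_cpu 10.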

Determining the traffic profile can be done with one of several tools:
• A span port on a directly-connected switch
o See http://www.cisco.com/warp/public/473/41.html for a Cisco-specific document about SPAN ports
o See http://en.wikipedia.org/wiki/Port_mirroring for a Wikipedia page on Port Mirroring
o A span port will only pick up traffic that is flowing through the local switch. If the IPSO platform's traffic is flowing through another switch as well, you will need a span port on that switch too.
o A PC running Ethereal or Wireshark will need to be capturing traffic on the span port
• tcpdump can be used to capture traffic on one, or more, interfaces on the IPSO platform
o See resolution 1354632 for more information about how to use tcpdump
o Use the -w flag to write to a file, and use the -s 1500 flag to see all packet contents instead of truncating
o Use the -i flag to specify interfaces which you would like to monitor
• fw monitor can also capture traffic
o But it cannot be run against a specific interface
o And it may not catch all of a session's contents which are accelerated with SecureXL
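As a concrete illustration of the tcpdump flags listed above (-i, -s 1500, -w), the sketch below only builds and prints the command line so that it can be reviewed before it is run. The function name capture_cmd, the interface name and the file path are hypothetical placeholders.

```sh
#!/bin/sh
# capture_cmd IFACE FILE: print (do not run) a tcpdump command that
# captures on IFACE with a 1500-byte snap length and writes to FILE.
capture_cmd() {
    echo "tcpdump -i $1 -s 1500 -w $2"
}

# Example with placeholder names:
#   capture_cmd eth-s1p1 /var/tmp/eth-s1p1.cap
```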
Before running the traffic capture, the first step is to try to determine which interfaces are the most heavily used. This will let you narrow your captures to the interfaces which are seeing the highest amounts of traffic. If this is not already known, you have two options:
• Use netstat -ni to determine load on a per-link basis, or
• Voyager's Monitor pages can be used to help see which interfaces are being used the most
o Go to Voyager > Monitor > Reports > Interface Throughput report and run reports on both byte throughput and packet throughput.
Also, one can look at 'netstat 1' to gather throughput stats for the complete firewall on a per-second basis. Running this during a peak traffic period will show the total throughput of the firewall. This data can be used to see if the firewall is performing as expected (compared with published performance data).
Once the most heavily used interfaces have been determined, and you have captured a traffic profile, you will need to run an analysis on the data. Both fw monitor and tcpdump capture files can be analysed using Wireshark or Ethereal. A statistical analysis can be run by either of these applications, under Statistics -> Summary and Statistics -> Protocol Hierarchy.

Rule base processing is a top-down process, so more CPU expense is incurred matching rules towards the bottom of the rule base than rules towards the top. Once the traffic profile is known, you may be able to reorder rules so that the most heavily used rules are nearer to the top of the rule base, which makes rule matching more CPU efficient. When a packet is matched against a rule, no further processing of the packet in the rule base is needed, so a heavily used rule at the bottom of the rule base creates unnecessary workload.
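The effect of rule order can be estimated with a simple model. The sketch below assumes a purely linear first-match scan, where a match on rule k costs k rule checks; this is a simplification for illustration, not a description of Check Point's actual matching engine. The hit counts are whatever you derive from your traffic profile, and the function name avg_rule_checks is hypothetical.

```sh
#!/bin/sh
# avg_rule_checks: read one hit count per line on stdin, in rule base
# order (line 1 = rule 1), and print the mean number of rules examined
# per match under a linear first-match model.
avg_rule_checks() {
    awk '{ hits += $1; work += NR * $1 }
         END { if (hits > 0) printf "%.2f\n", work / hits }'
}
```

With two rules hit 10 and 90 times, printf '10\n90\n' | avg_rule_checks gives the average for the current order, and swapping the two input lines gives the average after moving the busier rule to the top.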

SecureXL is a Nokia performance feature that reduces CPU load when it is tuned for the appliance's rule base. (To learn more about SecureXL, please refer to KB1355134.) The output of "fwaccel stat" and "fwaccel stats" can be used to determine if SecureXL is running and if the rules are in the optimum order for SecureXL to function most efficiently. It may be necessary to re-order rules in the rule base to achieve optimum SecureXL performance through an iterative process of moving rules and checking the output of "fwaccel stat" and "fwaccel stats". On an optimally tuned system, all rules will be accelerated; otherwise, try to ensure that the most utilized rules are at the top of the rule base. Any rules which disable further processing of the rule base by SecureXL should be placed as close to the bottom of the rule base as possible.

Logging of rules is discouraged where there is no business need, especially for Accept rules; logging Accept rules is not considered best practice. Logging of implied rules is also discouraged (SmartDashboard -> Policy -> Global Properties -> Firewall -> Log Implied Rules). Use of Alert in the Track field of rules is strongly discouraged where there is no clear, immediate business need, as alerting can create massive load increases on the system. The last option in the Track field, Accounting, can also have an impact on performance.

SmartDefense is a Check Point feature that inspects traffic from Layer 3 and above. Check Point uses SmartDefense to determine whether or not a packet flow contains a known attack, but this costs CPU resources. Therefore, SmartDefense protections should not be enabled if they are not needed. For example, if H.323 traffic is already blocked at the enforcement point by a security policy rule, there is no need to enable the corresponding SmartDefense setting. The system's performance can be optimised by disabling any unnecessary SmartDefense configurations.

d) Output of /var/log/ipsctl.log shows error counters such as “q_drops”, interface errors, etc

For example:
ifphys:eth-s1p1:errors:in_qdrops = 0
ifphys:eth-s1p1:errors:out_qdrops = 0
And
ifphys:eth-s1p1:errors:collision:late = 0
ifphys:eth-s1p1:errors:rx_alignment = 0
ifphys:eth-s1p1:errors:rx_frame_too_long = 0
ifphys:eth-s1p1:errors:rx_internal_mac = 0
ifphys:eth-s1p1:errors:tx_carr_loss = 0
ifphys:eth-s1p1:errors:tx_deferred = 0
ifphys:eth-s1p1:errors:tx_internal_mac = 0
ifphys:eth-s1p1:errors:rx_sqe = 0
ifphys:eth-s1p1:errors:rx_rnglen = 0
ifphys:eth-s1p1:errors:rx_jabber = 0
ifphys:eth-s1p1:errors:rx_frag = 0
ifphys:eth-s1p1:errors:rx_undrsz = 0
ifphys:eth-s1p1:errors:rx_rxerrc = 0
ifphys:eth-s1p1:errors:rx_mpc = 0
ifphys:eth-s1p1:errors:rx_rlec = 0
ifphys:eth-s1p1:errors:rx_cexterr = 0
ifphys:eth-s1p1:errors:rx_drop_pkts = 0
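Because most of these counters are normally zero, it is often quicker to list only the ones that are not. The sketch below assumes the "name = value" line format shown above; the function name nonzero_counters is hypothetical.

```sh
#!/bin/sh
# nonzero_counters: read "name = value" lines on stdin and print only
# the lines whose value is not zero.
nonzero_counters() {
    awk -F' = ' 'NF == 2 && $2 + 0 != 0'
}
```

For example, grep ':errors:' /var/log/ipsctl.log | nonzero_counters would show only the error counters that have incremented.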

e) A high number of logging rules will increase the CPU usage of the FWD daemon process.
This can be determined by looking at the cpinfo to see which rules are logging. You can also see how much traffic is being logged, compared to how much traffic is being processed, by looking at the firewalldiag.htm from the CST (the following enforcement point is not logging very much):
Interface table
---------------------------------------
|Name      |Dir|Accept|Drop|Reject|Log|
---------------------------------------
|eth-s3p1c0|in | 15113|   3|     0| 52|
|eth-s3p1c0|out|  8868|   0|     0|  4|
|eth-s3p2c0|in | 14077|   0|     0|  5|
|eth-s3p2c0|out| 10041|   0|     0|  2|
|eth-s1p1c0|in |    50|   0|     0|  0|
|eth-s1p1c0|out|     0|   0|     0|  0|
|eth-s3p4c0|in | 13031|   0|     0|  3|
|eth-s3p4c0|out|  7024|   0|     0|  3|
|eth-s3p3c0|in | 13035|   0|     0| 44|
|eth-s3p3c0|out|  7027|   0|     0|  2|
---------------------------------------
|          |   | 88266|   3|     0|115|
---------------------------------------
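To put a number on "not logging very much", the logged count can be expressed as a percentage of processed packets. The sketch below treats Accept + Drop + Reject as the processed total, which is an assumption about how the columns relate, and it expects the pipe-delimited layout of the table above. The function name log_ratio is hypothetical.

```sh
#!/bin/sh
# log_ratio: read the interface table on stdin and print, for each row
# with traffic, the logged packets as a percentage of
# Accept + Drop + Reject. The unnamed last row is reported as TOTAL.
log_ratio() {
    awk -F'|' 'NF >= 7 {
        total = $4 + $5 + $6
        if (total <= 0) next          # header, or interface with no traffic
        name = $2; dir = $3
        gsub(/ /, "", name); gsub(/ /, "", dir)
        label = (name == "") ? "TOTAL" : name " " dir
        printf "%s %.2f%%\n", label, 100 * $7 / total
    }'
}
```

Saving the table to a file and running log_ratio < file gives one line per active interface direction plus a TOTAL line.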

f) FloodGate also causes higher CPU usage in userland processes.
Applying traffic shaping and prioritization adds a layer of processing to every traffic flow. Floodgate is one method of shaping traffic; IPSO also supports DSCP and native traffic shaping. Using Floodgate disables SecureXL functionality. For busy systems, GTAC does not recommend running traffic shaping/Floodgate and firewall services together; shaping is best done on separate routing platforms.

g) IP1220, IP260 and IP2250
If the management (on-board) ports are used to pass through-traffic (as opposed to management traffic), CPU usage will also be high, because through-traffic on these ports uses the main CPU rather than the network processor. If traffic is spanning ADP cards, the benefit of offloading packet processing to the ADP card is mostly lost, as the traffic now has to flow from NIC -> ADP -> CPU -> ADP -> NIC instead of NIC -> ADP -> NIC. Not all platforms have management ports that share a bus with the CPU, but the product documentation does refer to the on-board ports as management ports, from which the client may properly infer that they are not intended for high-traffic networks. Add-on NICs, which may be added to all systems, are high-end cards which offload most of the packet processing work from the CPU.

2) fw monitor (KB1354670, KB1710226) and 3) tcpdump (KB1354632)

“fw monitor” shows all packets inspected. If all packets are being inspected, this could be an indication that traffic is NOT being flowed. tcpdump shows all packets received or sent on an interface or system-wide. Both tools are important because they allow you to validate the type of traffic and the traffic patterns.

Traffic mix is important in understanding high CPU usage. Short-lived connections, such as HTTP or HTTPS, can cause a high number of connections per second (cps) and also tend to consist of small packets. Note - all packets (independent of size) take the same amount of system resources to process. Therefore, smaller packet sizes yield lower total throughput and in large numbers can contribute to a high CPU scenario. You can have the client gather a capture file with fw monitor or tcpdump, and use Ethereal or Wireshark to analyze the traffic patterns (number of packets captured, average bytes/sec, average packet size, and many other metrics).

4) netstat

Running “netstat 1” will give the overall throughput of the appliance and also indicate the average packet size seen by the appliance. This can be used to determine if the appliance is at maximum capacity.

Below is example output of netstat 1:

ip390[admin]# netstat 1
input (Total) output
packets errs bytes packets errs bytes colls
8 0 734 4 0 970 0
1 0 64 5 0 498 0
1 0 64 5 0 498 0
^C
ip390[admin]#

Using this information one can determine Total (Appliance) Throughput, average packet size and number of packets per second. Calculations for determining these are provided below:

Total Throughput:
[bytes] * 8 = bits per second
This needs to be done for each direction (input and output), then added together to represent Total Throughput per second. For example, with 432 bytes in and 72 bytes out in one second:
432 * 8 = 3456 bits Input
72 * 8 = 576 bits Output
3456 + 576 = 4032 bits per second

Average Packet Size:
[bytes] / [packets] = Average pkt size.
(Note: this should be done on several pkts before stating that this is the average)
(Note: this information can also be calculated from a capture file taken with fw monitor or tcpdump which is analyzed by Ethereal or Wireshark)

Packets Per Second:
[input packets]+[output packets] = Packets in this second
To obtain the average you will need to perform the above calculation over an extended period of time.
(Note: this information can also be calculated from a capture file taken with fw monitor or tcpdump which is analyzed by Ethereal or Wireshark)
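The three calculations above can be applied to every one-second sample at once. The sketch below assumes the seven-column layout of the sample netstat 1 output (input packets, errs, bytes, then output packets, errs, bytes, then colls); the function name netstat_summary is hypothetical.

```sh
#!/bin/sh
# netstat_summary: read "netstat 1" output on stdin and print, for each
# one-second sample, total bits per second, packets per second and the
# average packet size in bytes. Non-numeric lines (headers) are skipped.
netstat_summary() {
    awk 'NF == 7 && $1 ~ /^[0-9]+$/ {
             bytes = $3 + $6                 # input bytes + output bytes
             pkts  = $1 + $4                 # input packets + output packets
             avg   = (pkts > 0) ? bytes / pkts : 0
             printf "bps=%d pps=%d avg_pkt=%.0f\n", bytes * 8, pkts, avg
         }'
}
```

The output of netstat 1 can be saved to a file during a peak period and then passed through netstat_summary; averaging the resulting lines over an extended period gives more representative figures than any single sample.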

A non-zero collisions (colls) count in netstat output means that at least one side of an interface does not have full duplex configured. A collision is recorded when the local side receives data while it is trying to transmit data.

5) Data Collection Script

This script gives us more data during high CPU utilization. As mentioned, running it will itself cause the CPU to spike, and PLS recommends running it for 15 minutes during peak CPU utilization.

The following information can be deduced from the script file.

Flows - how many packets are being offloaded to Check Point for inspection? More packets being offloaded will cause high CPU utilization.

ipsctl – output will indicate ifphys (inq_drops, outq_drops vs. total pkts.)
vmstat - CPU at time of script run
ps -auxwl - Userland state at time of script run

Please capture the output of the following script to a file.
#!/bin/sh

# Header
clear
echo "This script will gather statistics on the firewall, and it will take some time to complete,"
echo "however, no changes will be made to the firewall. Please do not interrupt."
echo ""
sleep 10

# Begin basic information.

uname_o=`uname -a`
echo "System uname: $uname_o"
echo ""
uptime_o=`uptime`
echo "Uptime is $uptime_o"
echo ""
date_o=`date`
echo "Script started $date_o"

# Grep for in errors, out errors, in qdrops and out qdrops.

echo ""
echo "Iterate in errors"
nice +20 ipsctl -a | grep "errors:in "
echo ""
echo "Iterate out errors"
nice +20 ipsctl -a | grep "errors:out "
echo ""
echo "Iterate in qdrops"
nice +20 ipsctl -a | grep in_qdrops
echo ""
echo "Iterate out qdrops"
nice +20 ipsctl -a | grep out_qdrops
echo ""
echo "netstat -ni"
nice +20 netstat -ni
echo ""
echo "**********************************************************"
echo ""

# Start a while loop

i=0
while [ $i -lt 300 ]
do
date_o=`date`
echo "Run $i started at $date_o"
echo ""
echo "FLOWS INFORMATION:"
nice +20 ipsctl -a net:ip:flow:table:stats net:ip:flow:stats
echo ""
echo "vmstat 1 2:"
vmstat 1 2
echo ""
echo "Running Process List"
ps -auxwl
sleep 1
echo ""
echo "Run $i complete."
echo "**********************************************************"
echo ""
i=`expr $i + 1`
done

echo "Loop complete"
echo ""

# Grep for in errors, out errors, in qdrops and out qdrops.

echo "Iterate in errors"
nice +20 ipsctl -a | grep "errors:in "
echo ""
echo "Iterate out errors"
nice +20 ipsctl -a | grep "errors:out "
echo ""
echo "Iterate in qdrops"
nice +20 ipsctl -a | grep in_qdrops
echo ""
echo "Iterate out qdrops"
nice +20 ipsctl -a | grep out_qdrops
echo ""
echo "netstat -ni"
nice +20 netstat -ni
echo ""

# Gather one more vmstat

echo "vmstat 1 2:"
vmstat 1 2
echo ""

# Capture a short fw monitor

echo "RUNNING FW MONITOR, PLS. DO NOT INTERRUPT.........."
echo ""
fw monitor -ci 30000 -o fwmoncap.trc
echo ""

# Capture SXL stats

echo "Checking SecureXL"
echo ""
fwaccel stat
echo ""
fwaccel stats
echo ""

# Gather one last vmstat

echo "vmstat 1 30:"
vmstat 1 30

# Wrap up

echo ""
date_o=`date`
echo "Script ended $date_o"

clear

echo "DONE!"
echo "Please provide Technical Support the file fwmoncap.trc and the capture of this test"
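One way to capture the script's output to a file is a small wrapper that redirects both standard output and standard error. This is a minimal sketch; the function name run_and_capture and the file names in the example are hypothetical.

```sh
#!/bin/sh
# run_and_capture SCRIPT OUTFILE: run SCRIPT with sh and save everything
# it prints (stdout and stderr) to OUTFILE.
run_and_capture() {
    sh "$1" > "$2" 2>&1
}

# Example with placeholder names:
#   run_and_capture ./highcpu.sh /var/tmp/highcpu.out
```

The fw monitor capture is written separately to fwmoncap.trc in the directory the script is run from, so both files need to be collected.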


Ref - Nokia solution guide.