Computer Performance Monitoring
This is my concise reference on analyzing and tuning Microsoft Windows and Unix/Linux machines using counters and other tools for performance profiling, testing, and tuning.
Sound: “Away, away, I tell you. You are overloading the machine!”
Types of Monitoring
External Monitors
The amount of resources used by each server in a J2EE or .NET application can be monitored externally by sending requests to Windows Performance Monitor or by issuing commands as a Unix Secure Shell (SSH) session user. Since all monitoring requests originate from outside the server being monitored, some call this a "black box" approach to testing. This is the approach used by Mercury SiteScope, LoadRunner, and Business Process Monitor. Just because a monitor is "agent-less" doesn't mean that it is "non-intrusive": it still imposes overhead on the system being tested.
Probes/Agents Within Servers
This approach is classified as "white box" testing because probes are installed inside each app server under test (SUT).
After a probe is "instrumented" to recognize methods running on its app server, it makes an announcement whenever it detects invocations of servlets, JSPs, EJBs, JNDI, JDBC, JMS, and Struts. Probes report on JVM memory heap usage and the memory consumption of Java collections.
Microsoft Windows System Monitors
Measuring .NET Application Performance - Chapter 15 of Microsoft's Improving .NET Application Performance and Scalability series. May 2004
"Flapping" occurs when a monitored resource quickly alternates between states.
MS Windows Task Manager
There are several ways to invoke the Task Manager:
This sample screen shows both green User mode and red kernel mode CPU Usage because View, Show Kernel Times has been selected:
By default, the update speed is set to “Normal”, which means once per second. I prefer the "Low" setting when I see what is hogging up CPU cycles (by clicking the "CPU" heading):
Metrics shown in the Processes tab can be selected from View, Select Columns: The above are defaults. Session ID and User Name are new since Windows XP.
Windows Performance Secrets by Van Name, Mark L.; Catchings, Bill; Butner, Richard. Que, 1998.
Save as Webpage
To see the Performance counter display screen in an HTML page: run perfmon.exe, add some counters, then right-click over the graphic and select "Save As..." to save it as type “Web Page (*.htm)”.
The graphic displayed in the resulting page is dynamically generated by an ActiveX control (the OBJECT CLASSID refers to sysmon.ocx in %WINDIR%\system32\) on Windows 200X machines. The tool was named Performance Monitor (Perfmon.exe) in Windows NT 4.0.
MS Windows System Monitor
Performance data is displayed immediately using the System Monitor node in the Performance console.
System Monitor is an ActiveX control, so it can be displayed in applications that support this object type. For example, System Monitor can be displayed in Internet Explorer or in a Microsoft Office application, such as Microsoft Word. The simplest way to export the OLE Custom eXtension (OCX) is to select a Counter log and, on the Action menu, click Save Settings As. The file is then saved as a standard .html file and viewed in Internet Explorer. System Monitor is fully functional whether it is running as part of the Performance console or in another application.
View data in System Monitor in one of three views: chart, histogram, or report. The chart and histogram views are typically used for analyzing real-time data. The report view is ideal for viewing a summary of data collected in counter logs. You can use all views to see real-time and logged data.
typeperf.exe (a variant of perfmon) that comes with Win2k3 dumps perf data to CSV, TSV, or a database.
Ganglia's monitoring architecture takes fewer resources.
Configuring Remote Windows Machine Monitoring by User Accounts
According to Q158438: If you are using an account without administrator privileges on an NT/Win 2000 machine being monitored, you must first grant read permission to certain files and registry entries. The required steps are:
A system is described as "quiescent" (dormant; in a state of tranquil repose; at rest; resting; still; inactive; quiet) when its CPU is running no active user tasks.
A system is described as "pegged" when its CPU utilization remains at or near 100%, the maximum.
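The two terms can be turned into a rough check over a run of CPU samples. This is only a sketch in Python: the 95% and 5% cut-offs and the function name classify_cpu are my own assumptions, and low utilization is used as a stand-in for "no active user tasks".

```python
from typing import Sequence


def classify_cpu(samples: Sequence[float],
                 pegged_at: float = 95.0,
                 quiescent_below: float = 5.0) -> str:
    """Label a run of CPU-utilization samples (percent, 0-100).

    The 95% and 5% cut-offs are illustrative assumptions, not values
    taken from any vendor documentation.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if min(samples) >= pegged_at:
        return "pegged"      # stayed at or near 100% for the whole run
    if max(samples) < quiescent_below:
        return "quiescent"   # essentially idle for the whole run
    return "active"


print(classify_cpu([99.0, 100.0, 98.5]))
print(classify_cpu([0.5, 1.2, 0.0]))
print(classify_cpu([20.0, 99.0, 45.0]))
```

Requiring every sample to clear the cut-off reflects the word "remains": a single spike to 100% does not make a system pegged.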
Unix Systems Monitoring
This section discusses performance monitoring tools among variants of the UNIX operating system, which consists of the kernel, the shell, and the file system.
Nagios is a popular open-source performance monitoring tool.
SiteScope
Mercury SiteScope monitors almost all (50) aspects of running Windows and UNIX systems (networks, servers, services, databases, applications, etc.). Colorado-based Freshwater Software charged $2,495 for it before Mercury bought it (in April 2003); Mercury now sells it for $60 per unit (the number of servers multiplied by the number of counters monitored on each machine).
SiteScope monitors can be viewed among other LoadRunner Controller System Resource Graphs:
SiteScope is called an "agent-less" technology because it sends native UNIX commands (defined in \SiteScope\templates.os). SiteScope exerts about a 10% overhead on servers responding to remote queries. However, it duplicates the same requests made by LoadRunner's UNIX monitors (hitting servers with twice as much monitoring traffic). Perhaps for this reason the SiteScope monitor has a default measurement update rate of once every 10 minutes. Unless you change this default to 15 seconds (the most frequent allowed), you won't see measurements in Controller graphs.
Linux Status Commands
Process stats
DOWNLOAD: Microsoft's Process Monitor (ProcMon.exe) (v2.8 released by Mark Russinovich and Bryce Cogswell, Nov. 2009). It captures in real time and combines in one GUI all file system, Registry, and process/thread activity, including that of low-level programs (such as lsass, svchost, etc.) and background apps such as anti-virus. Unlike the legacy Sysinternals utilities Filemon and Regmon that it replaces, it provides rich and non-destructive filtering, simultaneous logging to a file, and comprehensive event properties such as session IDs and user names. It highlights the issues with system operations, such as "BUFFER OVERFLOW", "BUFFER TOO SMALL", "FAST IO DISABLED", "NAME NOT FOUND", "FILE LOCKED WITH ONLY READERS". Click on an activity for its full thread stacks, with integrated symbol support for each operation.
Shell script commands, pipes and other commands.
The Most Executed Code in Solaris ... the CPU Idle Loop by Bill Holler
ps [-a] [-A] [-c] [-d] [-e] [-f] [-j] [-l] [-L] [-P] [-y] [ -g grplist ] [ -n namelist ] [-o format ] [ -p proclist ] [ -s sidlist ] [ -t term] [ -u uidlist ] [ -U uidlist ] [ -G gidlist ]
-a | List information about all processes most frequently requested: all those except process group leaders and processes not associated with a terminal. |
-A | List information for all processes. Identical to -e, below. |
-c | Print information in a format that reflects scheduler properties as described in priocntl. The -c option affects the output of the -f and -l options, as described below. |
-d | List information about all processes except session leaders. |
-e | List information about every process now running. |
-f | Generate a full listing. |
-j | Print session ID and process group ID. |
-l | Generate a long listing. |
-L | Print information about each light weight process (lwp) in each selected process. |
-P | Print the number of the processor to which the process or lwp is bound, if any, under an additional column header, PSR. |
-y | Under a long listing (-l), omit the obsolete F and ADDR columns and include an RSS column to report the resident set size of the process. Under the -y option, both RSS and SZ will be reported in units of kilobytes instead of pages. |
-g grplist | List only process data whose group leader's ID number(s) appears in grplist. (A group leader is a process whose process ID number is identical to its process group ID number.) |
-n namelist | Specify the name of an alternative system namelist file in place of the default. This option is accepted for compatibility, but is ignored. |
-o format | Print information according to the format specification given in format. This is fully described in DISPLAY FORMATS. Multiple -o options can be specified; the format specification will be interpreted as the space-character-separated concatenation of all the format option-arguments. |
-p proclist | List only process data whose process ID numbers are given in proclist. |
-s sidlist | List information on all session leaders whose IDs appear in sidlist. |
-t term | List only process data associated with term. Terminal identifiers are specified as a device file name, and an identifier. For example, term/a, or pts/0. |
-u uidlist | List only process data whose effective user ID number or login name is given in uidlist. In the listing, the numerical user ID will be printed unless you give the -f option, which prints the login name. |
-U uidlist | List information for processes whose real user ID numbers or login names are given in uidlist. The uidlist must be a single argument in the form of a blank- or comma-separated list. |
-G gidlist | List information for processes whose real group ID numbers are given in gidlist. The gidlist must be a single argument in the form of a blank- or comma-separated list. |
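To see which processes hold the most resident memory, the -o format option can be combined with a small parser. The Python sketch below assumes output in the shape produced by something like ps -e -o pid,rss,args (rss is available on Solaris and Linux). The sample text is made up for illustration, and parse_ps and ProcRow are my own hypothetical names.

```python
from typing import List, NamedTuple


class ProcRow(NamedTuple):
    pid: int
    rss_kb: int
    command: str


def parse_ps(output: str) -> List[ProcRow]:
    """Parse 'PID RSS COMMAND' style text into rows, skipping the header."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # the command may itself contain spaces
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue                 # header line, blank line, or malformed row
        rows.append(ProcRow(int(parts[0]), int(parts[1]), parts[2]))
    return rows


# Made-up sample, not captured from a real machine.
SAMPLE = """\
  PID   RSS COMMAND
    1  1024 /sbin/init
  412 20480 /usr/lib/sendmail -bd
  977  5120 sh
"""

by_memory = sorted(parse_ps(SAMPLE), key=lambda r: r.rss_kb, reverse=True)
for row in by_memory:
    print(row.pid, row.rss_kb, row.command)
```

On a live system the text would come from running ps itself (for example through subprocess.run), which I leave out so the sketch stays self-contained.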
rpc.rstatd stats
To obtain statistics from each UNIX machine (through port 111), use the rpc.rstatd daemon, which is invoked by the inetd subsystem and controlled by /etc/inet/inetd.conf.
Rstatd vs. SAR
These statistics are queried and displayed using the perfmeter OpenWindows XView utility.
Other Monitors
Measured Objects
W: prefixes Windows Perfmon counters. S: prefixes SiteScope counters. R: prefixes Linux/Solaris SAR counters.
Counters.chm file from the Windows 2000 Resource Kit.
SAR -y is Terminal Activity
This is also not available in SiteScope.
netstat
This Solaris utility normalizes values to per-interval rates for the sampling interval specified on the command line.
Picking A Network Monitor
To determine whether the number of network buffers needs to be set higher, watch the number of error-free packets the system is dropping, obtained by subtracting from the total packet throughput the sum of Packets Outbound Discarded and Packets Received Discarded.
The counters Packets Outbound Errors and Packets Received Errors indicate network card hardware problems. Don't rely on the Current Bandwidth counter because it shows theoretical rather than actual bandwidth. A reasonable limit for an Ethernet network is %Network Use less than 30 percent. A higher value means you need to speed up the network or reduce the amount of traffic.
Use the %Broadcast Frames and %Multicast Frames counters to view the percentages of broadcast and multicast traffic. Network cards pass broadcast and multicast frames to a higher-level software component before they act on or discard them. This extra activity results in additional CPU use. As the requesting computer connects to find the server computer's network address, it generates broadcast traffic. Frame traffic increases as the server transfers the files.
Similarly, don't use the Output Queue Length counter because it's always zero, since transmission requests are handled not by the network card but by network device interface specification (NDIS) software.
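The 30 percent guideline can be checked from a measured byte rate and the known link speed. This Python sketch assumes the byte rate comes from a counter such as the Network Interface object's Bytes Total/sec and that the link speed is entered by hand rather than read from Current Bandwidth. The function names are my own.

```python
def network_utilization_pct(bytes_per_sec: float, link_bits_per_sec: float) -> float:
    """Return link utilization as a percentage of the stated link speed."""
    if link_bits_per_sec <= 0:
        raise ValueError("link speed must be positive")
    return bytes_per_sec * 8 * 100 / link_bits_per_sec


def exceeds_guideline(pct: float, limit: float = 30.0) -> bool:
    """True when utilization is at or above the 30 percent Ethernet guideline."""
    return pct >= limit


# Illustrative numbers: 4.5 MB/s measured on a 100 Mbit/s Ethernet segment.
pct = network_utilization_pct(4_500_000, 100_000_000)
print(round(pct, 1), exceeds_guideline(pct))
```

Passing the link speed explicitly keeps the calculation independent of the Current Bandwidth counter that the text above warns against.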
$1099 Denika from Somix plugs into MRTG to display 500 different performance trend reports after saving SNMP and Ipswitch WhatsUp Gold logs to a MySQL ODBC database (as a Microsoft service).
Network Monitor
To add Network Segment counters, you must install the Network Monitor Agent.
Microsoft provides two versions of NetMon. Install the "full" promiscuous version of NetMon from Microsoft's Systems Management Server (SMS) 1.2 and 2.0 product to capture packets on all NICs on remote network subnets. Otherwise, the network card typically rejects network traffic intended for other network cards.
NetMon filter specifications are stored in the NetMon\Captures subdirectory. NetMon allows filtering by protocol, TCP/IP address, and data pattern. Monitoring Network Segment counters increases CPU use: as these counters process network traffic, they use additional system resources. Because this activity drains the resources of the computer you're analyzing, limit Network Segment monitoring.
Open Source Ethereal displays and filters network data files captured by sniffers, even while a capture session is in progress. It resolves DNS.
Show Traffic, a rewrite of Linux Trafshow.
$450 Network Probe from ObjectPlanet.
Network Sniffers
Is there enough swap space?
Adding swap space is the usual fix for "out of memory" messages.
How much memory is each process really using (is Private)?
To determine the "Private" bytes that are NOT shared with other processes (excluding application binaries),
sar -r [unused memory pages and disk blocks]:
freeswap = number of 512-byte disk blocks available for page process swapping.
vswap = virtual pages available to user processes [not in Solaris sar].
sar -b [Buffer activity]:
lread/s, lwrit/s = average number of logical blocks transferred from the buffer cache (system buffers to user memory);
wcncl/s = pending writes in system buffers cancelled [not in SiteScope];
%rcache, %wcache = fraction of logical reads (or, for %wcache, logical writes) found in the buffer cache: the cache hit ratio, 100% minus the ratio of bread/s to lread/s, that is, (1 - bread/lread) as a percentage;
pread/s, pwrit/s = average number of physical read/write requests per second that use character device interfaces (basic block transfers via the raw (physical) device mechanism).
The most important entries are the cache hit ratios %rcache and %wcache, which measure the effectiveness of system buffering. If %rcache falls below 90 percent, or if %wcache falls below 65 percent, it might be possible to improve performance by increasing the buffer space.
sar -h [system heap statistics, not available in SiteScope]:
overhd = block managed arena overhead; unused = block managed arena memory available for allocation; alloc/s = number of allocation requests per second; free/s = number of free requests per second.
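The sar -b cache hit ratios and the 90/65 percent thresholds mentioned above can be worked through in a few lines. This Python sketch assumes the physical (bread/s, bwrit/s) and logical (lread/s, lwrit/s) rates have already been read from sar output. The function names and sample numbers are mine.

```python
def cache_hit_pct(physical_per_sec: float, logical_per_sec: float) -> float:
    """Hit ratio as a percentage: (1 - physical/logical) * 100."""
    if logical_per_sec <= 0:
        raise ValueError("no logical transfers in this interval")
    return 100.0 * (logical_per_sec - physical_per_sec) / logical_per_sec


def buffers_look_small(rcache_pct: float, wcache_pct: float) -> bool:
    """Apply the rule of thumb: %rcache below 90 or %wcache below 65."""
    return rcache_pct < 90.0 or wcache_pct < 65.0


# Illustrative rates only: 5 physical reads per 100 logical reads,
# 30 physical writes per 60 logical writes.
rcache = cache_hit_pct(5, 100)
wcache = cache_hit_pct(30, 60)
print(rcache, wcache, buffers_look_small(rcache, wcache))
```

In this made-up case the read cache is healthy but the write hit ratio is low enough to trip the rule of thumb.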
sar -p commands obtain paging activity stats from UNIX systems.
cache/s = address translation fault page reclaimed from page cache;
pgswp/s = address translation fault page reclaimed from swap space;
pgfil/s = address translation fault page reclaimed from filesystem;
rclm/s = pages reclaimed by paging daemon;
steal/s = protection fault on unshared writable page.
sar -k [kernel memory allocation (bytes)]
Server
W: MS IIS 6 counters of the WWW Service, its Web Service Cache, FTP Service, Internet Information Services Globals, SNMP, Active Server Pages, ASP.NET.
Solaris vmstat presents memory, run-queue, and summarized processor utilization. It uses kstat, which maintains CPU utilization for each CPU.
Solaris mpstat presents per-processor stats and utilization.
Solaris ps presents per-process stats.
Solaris prstat presents thread-level microstate accounting (with high-resolution time stamps) and per-project stats for resource management.
sar -v [entries/size for each table, evaluated once at sampling point, not available in SiteScope]:
proc-sz = number of process entries (proc structures) that are currently being used, or allocated, in the kernel.
inod-sz = total number of inodes in memory versus the maximum number of inodes that are allocated in the kernel. This number is not a strict high-water mark; it can overflow.
file-sz = size of the open system file table. The sz is given as 0, since space is allocated dynamically for the file table.
ov = overflows that occur between sampling points for each table.
lock-sz = number of shared memory record table entries currently being used or allocated in the kernel. The sz is given as 0 because space is allocated dynamically for the shared memory record table.
sar -c [System calls]:
sread/s, swrit/s = read and write system calls per second.
fork/s = fork system calls per second.
exec/s = exec system calls per second. If exec/s divided by fork/s is greater than three, look for inefficient PATH variables.
rchar/s, wchar/s = characters (bytes) transferred by read and write system calls per second.
sar -m [Message and semaphore activities (for Interprocess Communication)]:
sema/s = semaphore primitives per second. These figures will usually be zero (0.00), unless you are running applications that use messages or semaphores.
sar -t [translation lookaside buffer (TLB) activities, not available in SiteScope]:
faults: address translation not resident in TLB;
rflt/s = page reference faults (valid page in memory, but hardware valid bit disabled to emulate hardware reference bit);
sync/s = TLB flushes on all processors;
vmwrp/s = syncs caused by clean (with respect to TLB) kernel virtual memory depletion;
flush/s = single processor TLB flushes;
idwrp/s = flushes because TLB ids have been depleted;
idget/s = new TLB ids issued;
idprg/s = TLB ids purged from process;
vmprg/s = individual TLB entries purged.
sar -I [interrupt statistics, not available in SiteScope]:
vmeintr/s = VME interrupts per second;
sar -a [File access system routines]:
namei/s = number of file system path searches per second. If namei does not find a directory name in the DNLC, it calls iget to get the inode for either a file or directory. Hence, most igets are the result of DNLC misses.
dirbk/s = number of directory block reads issued per second.
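The sar -c rule of thumb about exec/s and fork/s is easy to automate once the two rates are in hand. This is a minimal Python sketch. The function name is hypothetical, and treating an interval with no forks as "nothing to report" is my own assumption.

```python
def path_lookup_suspect(exec_per_sec: float, fork_per_sec: float,
                        limit: float = 3.0) -> bool:
    """True when exec/s divided by fork/s is greater than the limit of three."""
    if fork_per_sec <= 0:
        return False  # no forks in the interval, so the ratio is undefined
    return exec_per_sec / fork_per_sec > limit


# Illustrative rates only.
print(path_lookup_suspect(2.0, 0.5))  # ratio 4.0
print(path_lookup_suspect(1.0, 0.5))  # ratio 2.0
```

A high ratio suggests each new process triggers several exec attempts, which is what happens when the shell walks a long or badly ordered PATH.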
Disk I/O speeds are about 10-100 times slower than memory. Disk I/O speeds will be very fast when data is stored on filer disk arrays because such devices usually have a large amount of memory to cache data. A single spindle can generally handle 50 accesses per second.
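The 50-accesses-per-second figure gives a quick way to size a disk layout. The Python sketch below is rough arithmetic only: it ignores array caching and RAID overhead, and the function name and the sample load are assumptions of mine.

```python
import math


def spindles_needed(accesses_per_sec: float, per_spindle: float = 50.0) -> int:
    """Estimate how many spindles a given access rate needs, rounding up."""
    if accesses_per_sec < 0 or per_spindle <= 0:
        raise ValueError("rates must be non-negative and per_spindle positive")
    return max(1, math.ceil(accesses_per_sec / per_spindle))


# Illustrative load: 420 accesses per second.
print(spindles_needed(420))
```

Rounding up matters here because a fractional spindle still means one more physical disk.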
Solaris/UNIX Hard Disk Monitoring
One indication of whether a database server is "I/O-bound" is the output of the Unix vmstat utility:
vmstat 5 5
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b    avm  fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 217485  386   0   0   0   4   14   0 202  300 210 14 19 22 45
In the example above, the "wa" (wait) column shows that 45% of the CPU time is being spent waiting for database I/O. To fully understand the nature of I/O, you must first understand Oracle's asynchronous writing mechanism. Then you need to look at the SAP applications and explore how SAP tables are populated and managed from within the SAP application. With this knowledge, you can then make intelligent decisions about the proper file placement.
iostat
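The "wa" reading from the vmstat sample above can be pulled out with a short script. This Python sketch assumes the layout shown in that sample, where the last four columns are us, sy, id, and wa (other platforms order the columns differently). The 25 percent cut-off for calling a server I/O-bound is a hypothetical value of mine.

```python
from typing import Dict

CPU_FIELDS = ("us", "sy", "id", "wa")


def cpu_breakdown(row: str) -> Dict[str, int]:
    """Return the trailing us/sy/id/wa percentages from one vmstat data row."""
    values = row.split()
    if len(values) < len(CPU_FIELDS):
        raise ValueError("row is too short to hold the cpu columns")
    tail = values[-len(CPU_FIELDS):]
    return dict(zip(CPU_FIELDS, (int(v) for v in tail)))


# The data row from the vmstat example in the text.
ROW = "0 0 217485 386 0 0 0 4 14 0 202 300 210 14 19 22 45"

cpu = cpu_breakdown(ROW)
io_bound = cpu["wa"] > 25  # hypothetical threshold, tune for your site
print(cpu, io_bound)
```

Reading the CPU columns from the end of the row sidesteps the fact that "sy" appears twice in the header, once under faults and once under cpu.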
Windows Hard Disk Monitoring
This table presents diskperf parameters to specify the hard disk performance counters to start when the machine is restarted. Windows XP, however, displays this message:
DHCP Audit Alerts
By default, Windows 2000 does NOT create DHCP audit logs. Enable DHCP audit logging by right-clicking the server and choosing Properties, then using the General tab of the Server name Properties dialog.
Use the DHCP Server Locator Utility from the Windows 2000 Resource Kit to detect, every 6000 seconds, active DHCP servers on the network and to send the IP addresses of DHCP servers not listed in the file auth_dhcp_ip_list.txt to the email addresses in file A:\Admins.txt.
Other Measurements
Log File Formats
Counter Logs
Counter log data can be stored in four formats, all viewable using System Monitor:
Both the Perfmon and binary log file formats are proprietary formats developed by Microsoft. Two types of binary log files exist: circular and linear. The first line of CSV-format and TSV-format log files serves as the header, providing information about the format of the file, the version of the PDH (Microsoft Performance Data Helper) interface used to create the log file, and the names and paths of each of the counters to the PDH. The PDH library can open a log file in the Perfmon format only for reading.
Included in all versions of Microsoft Windows XP (excluding Windows XP Home Edition):
relog.exe logfile.blg -f csv -o logfile.csv
logman.exe start Sample_Log
typeperf "\Memory\Available Bytes" -s XPPRO -si 00:05
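Once relog or typeperf has produced CSV, the data can be summarized outside System Monitor. The Python sketch below assumes the layout described above: a header row naming the format and each counter path, then one timestamped row per sample. The sample text is made up, and read_counter_log is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List

# Made-up sample in the general shape of a PDH CSV log.
SAMPLE = r'''"(PDH-CSV 4.0)","\\XPPRO\Memory\Available Bytes"
"01/15/2004 10:00:00.000","512000000.000000"
"01/15/2004 10:00:05.000","498000000.000000"
"01/15/2004 10:00:10.000","505000000.000000"
'''


def read_counter_log(text: str) -> Dict[str, List[float]]:
    """Return {counter path: [values]} from CSV counter-log text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    counters = header[1:]            # column 0 holds the format/timestamp
    series: Dict[str, List[float]] = {name: [] for name in counters}
    for row in reader:
        if len(row) != len(header):
            continue                 # skip truncated rows
        for name, cell in zip(counters, row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                pass                 # blank or non-numeric sample
    return series


for name, values in read_counter_log(SAMPLE).items():
    print(name, sum(values) / len(values))
```

For a real log, pass the contents of the file written by relog or typeperf instead of SAMPLE.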
Trace Logs
Trace logs generate binary .etl files. System Monitor CANNOT read these files.
Use the TRACEDMP.EXE utility from the Windows 2000 Server and Professional Resource Kits to read .etl files and create DUMPFILE.CSV files for viewing by other applications. The utility also creates a SUMMARY.TXT file.
Thresholds Trigger Alerts
Statscout Performance Monitor from Enterprise Management Associates
Latencies
J2EE Monitoring
So here are the approaches, from the least costly to the most costly:
1. First, find the average end-to-end response time by emulating end-user client exchanges with the web server. Identify the machine and service which consume the most CPU, network bandwidth, and other resources during stress runs which incrementally add users until a server reaches its maximum rate of processing (as measured by the hits/pages processed per second metric).
2. Identify the average response time of key services by emulating calls directly to each service (XML calls to app servers, SQL calls to databases). Watch them during stress runs.
3. Work with developers to add application code which displays key performance information along with user data (like the times that Google displays with each search result). This allows web (HTML) based client scripts to simply obtain the information.
4. Work with developers to add application code which issues transaction-level performance information to a log. Most mature application packages allow administrators to control the verbosity of the application's logs. Ideally, the alerts are formatted to make it easy for logs to be combined with other logs for analysis after the run is finished. If not, logs may need to be run through a custom parser.
5. Formatting alerts to the ARM (Application Response Measurement) standard allows the alerts to be issued (using the free API) and collected in real-time by business service management systems in production. See http://www.wilsonmar.com/1perfmon.htm#ARMz This is the best approach, IMHO.
6. Code LoadRunner scripts in Java to emulate the client. I describe this complex approach briefly in http://wilsonmar.com/1lrscript.htm#JavaTech This approach is time-consuming because parts of the client application need to be rewritten in the test scripts (user authentication, file encryption, client session and cookie management, calls to servers, etc.). Such scripts need to precisely identify and specifically format calls to services.
7. Install agents inside J2EE servers which send status to the Mercury Business Availability Center.
8. The new version of WebLogic integrates with LoadRunner to provide performance monitoring that can be turned on or off dynamically on production machines.
10. J2EE monitoring packages include:
http://www.javaperformancetuning.com/tips/j2ee.shtml
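Approach 4 above (application code that writes transaction-level timings to a log) can be sketched in a few lines. This example is in Python rather than Java just to keep it short. The key=value line format, the "perf" logger name, and the timed_transaction helper are all my own assumptions, chosen so the lines are easy to merge and parse after a run.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("perf")


@contextmanager
def timed_transaction(name: str):
    """Log one parseable line with the elapsed time of the wrapped block."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "failed"
        raise                     # let the application handle the error
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        log.info("txn=%s status=%s elapsed_ms=%.1f", name, status, elapsed_ms)


logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(name)s %(message)s")

with timed_transaction("checkout"):
    time.sleep(0.01)              # stands in for real application work
```

Verbosity can then be controlled by an administrator through the logger's level, which matches the point above about mature packages exposing log verbosity.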
ARM Instrumentation
To measure application availability, application performance, application usage, and end-to-end transaction response time in a vendor-neutral way, the ARM (Application Response Measurement) API (first released June 1996) was created by an Open Source Working Group based on their UMA (Universal Measurement Architecture), which involves ARM API calls received by ARM agents feeding Enterprise Management Applications. The ARM Technical Standard of July 1998 originally defined a set of six library procedure calls for programmers to call from within their source code to initialize the ARM subsystem and define the beginning and end of Business Transactions being measured:
ARM 2.0 SDK dated 11/11/97 is offered in UNIX and Windows flavors, along with sample.c source code for each platform. ARM 2.0 added the ability to correlate parent and child transactions, and to collect other measurements associated with the transactions, such as the number of records processed. This SDK (explained in the User Guide) provides:
ARM 2.01 Patched ARM 2.0 with new arm201.h files.
Business Service Management Applications
ARM subsystem apps are provided by a number of Business Service Management application providers.
The "Big Four":
HP OpenView MeasureWare glanceplus (via Bill Furnas) (HP has also acquired Mercury's Diagnostics technologies)
BMC Patrol products include Performance Assurance, a part of “Business Service Management” for Windows and UNIX.
CA (Computer Associates)
ASG (formerly Landmark until February 19, 2002) TMON, NaviPlex for IBM S/390, AS/400, and MQ Series platforms.
Sun
Emerging vendors:
ZenOSS (Annapolis, MD) offers open-source software
Integration vendors:
Martin Haworth at HP Openview shows an alternative mechanism for capturing response time measures with Service Management Using The Application Response Measurement API Without Application Source Code Modification by routing interactions through "dumb" Remote Terminal Emulation (RTE) so that data exchanged can be captured for examination.
Log Analysts
Also:
Tuning Applications
To move the paging file to another hard disk on your computer running Windows 2000 Professional, in the System Properties dialog box, click the Advanced tab, and then click the Performance Options button.
Quality of Service
Windows 2000 Server has a performance governor: the Admission Control Service. It works in conjunction with the Quality of Service (QoS) standard, which allows network administrators to either restrict or guarantee network performance to specific applications.
Quiz Questions
Websites on Perfmon