The performance of an e-commerce system is a major criterion of its market acceptance. End users expect quick response times when visiting a web shop, and the owner of the e-commerce application wants to optimize the costs of administrating and running the site. This concept covers the performance and scalability of an Intershop Commerce Management system. This includes coding guidelines as well as considerations on how to analyze and tune a running application.
Performance tuning is an overall task of each (more complex) software development process, regardless of the type of development (standard product or customer project). Performance must be considered at every line of code. Keep in mind that the interface to your implementation describes the requirements of your code: what the task is, what the responsibility of the class is, and how it was implemented. This will reduce the costs of reworking significantly. Please consider:
Most performance optimizations result in code changes (Java or SQL queries)
This also allows a detailed performance analysis by switching on multiple performance sensors (which add a processing overhead to the system).
Performance monitoring, tuning, and bug fixing are ongoing development tasks. The highest priority in Intershop development is storefront performance, followed by import performance and back office performance. This order of priorities results from the fact that low storefront performance has the biggest impact on the business of our clients.
The current development process includes automatic storefront performance tests of the builds of the main branch. All other performance tests (also import and backoffice) are executed on demand.
This simple top-level view shows three aspects that must be considered when tuning an Intershop Commerce Management system: the workload, the software, and the hardware.
The workload is the sum of all requests and commands entering the system, for example storefront HTTP requests, back office requests, service calls or job executions. The software includes all Intershop Commerce Management and third-party software components and the operating systems of the used machines as well as all software-related settings. The hardware includes all machine and infrastructure hardware that is involved.
All three aspects might cause bottlenecks and all three can be manipulated to achieve better performance. The load, for example, can be reduced by decreasing the number of clicks that are necessary for each storefront use case.
Performance tuning involves a complex set of factors. There are critical issues to deal with at numerous points in a system. Guidelines for performance tuning must therefore address a variety of topics. Performance tuning of Intershop Commerce Management can be divided into the six tuning levels as shown in the picture above. These levels cover different phases of the project. Consequently, different people may handle the tuning tasks at each level. Hardware sizing and hardware extensions are not included in these levels, because these are considered as a separate subject. Extending available hardware cannot compensate for problems introduced at the other tuning levels. If a bad design leads to a substantial slowdown, it might be virtually impossible to achieve better results simply by adding more hardware.
Application architecture and application design have the highest impact. Performance tuning is not only about changing settings - it is mainly about architecture and design. Mistakes made in these "higher" levels cannot be corrected or resolved by changes in a lower level.
Tuning at levels A and B is important for software engineers and Web designers who develop Intershop Commerce Management-based Web sites or Intershop Commerce Management extensions such as cartridges. Tuning within levels C, E, and F should be carried out by a system administrator or other person experienced in configuration of distributed environments. Level D tuning deals with typical Oracle database tuning and must be handled by the database administrator.
The main sections of this document describe tuning at levels A, B, C, E, and F in detail. Database tuning is a broad topic that is not covered in this document.
The primary goal of optimizing the application architecture is to reduce the workload of different parts of the Intershop Commerce Management system. The following picture shows the most general parts of the system (web layer, application server layer and data storage layer).
The upper layers provide caching capabilities to reduce the load on the lower layers. That is, the web layer provides a page cache to reduce the load on the application server layer, and the application server layer uses caches (e.g., the ORM cache) to reduce the load on the data storage layer (e.g., the database). Using the caches appropriately and avoiding cache misses is essential for good performance and scalability.
The most important measure to achieve optimal performance is to cache as many pages as possible in the Intershop Commerce Management page cache by including the <iscache …> tag into the corresponding ISML templates. It is also advisable to choose the longest acceptable caching time for each template.
Using the page cache is the key to high performance. Typically, the page cache can deliver 10 to 100 times as many pages compared to pages generated dynamically by an Intershop Commerce Management application server. The number of simultaneous sessions that can be served is increased by the same factor.
It is often possible to cache pages even if they contain information that changes from time to time. For example, if product information is changed frequently and changes must be propagated quickly, the product pages can still be cached if the page lifetime is set to an interval that is short enough, for example, a few minutes. There is often an acceptable propagation delay and the cache hit rate is still high for the most frequently accessed products.
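The trade-off described above can be modeled as a simple time-to-live cache: a page stays cached for a fixed lifetime, so content changes become visible after at most that delay. The following sketch is purely illustrative; it is not the Intershop page cache, and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of a TTL-based page cache. A short lifetime bounds the
// propagation delay of content changes while keeping the hit rate high for
// frequently accessed pages.
public class TtlPageCache {

    private record Entry(String page, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long lifetimeMillis;

    public TtlPageCache(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    public void put(String key, String page, long nowMillis) {
        cache.put(key, new Entry(page, nowMillis + lifetimeMillis));
    }

    // Returns the cached page, or null on a miss or expired entry
    // (an expired entry forces regeneration of the page).
    public String get(String key, long nowMillis) {
        Entry e = cache.get(key);
        if (e == null || e.expiresAtMillis() <= nowMillis) {
            cache.remove(key);
            return null;
        }
        return e.page();
    }
}
```

With a lifetime of a few minutes, a product page served thousands of times per minute is regenerated only once per interval, while edits still propagate within that interval.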
In production environments, one Web adapter is usually able to deliver about 100 to 300 pages per second from the page cache. The achieved page rate depends on factors like the page size, the number of different pages, the access patterns, and the used hardware.
The following conditions have to be met to write a response page into the page cache:
A page is delivered from the page cache if the following conditions are met:
Take into account that a different URL parameter order and different URL parameter combinations lead to different page cache keys and therefore to multiple cached copies of the same page, for example, if a certain URL parameter is sometimes omitted and sometimes not. Try to reduce the possible number of these URL variations.
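One way to avoid this multiplication of cache entries is to build URLs with a canonical parameter order. The sketch below is a hypothetical illustration (neither the class nor the method exists in Intershop); it shows the idea of sorting parameters so that equivalent requests map to one key.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: normalize URL parameters into a canonical cache key
// so "a=1&b=2" and "b=2&a=1" hit the same cached page.
public class PageCacheKey {

    public static String canonicalKey(String path, Map<String, String> params) {
        StringBuilder key = new StringBuilder(path);
        char sep = '?';
        // TreeMap iterates in sorted key order, giving a stable parameter order.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            key.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return key.toString();
    }
}
```

Applying the same discipline when generating links in templates keeps the number of distinct cached pages per view small.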
Pages or parts of them can often be cached, even if there are some dynamic or session-specific data elements on the page.
There are several ways to analyze the web layer and the page cache:
Check the following items to achieve the best performance results in the web layer:
The application server caches several object instances to reduce the number of database queries and file accesses. Using a JMX tool (e.g., jconsole) these caches can be monitored.
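Such JMX-based monitoring can also be done programmatically. The following sketch uses the standard java.lang:type=Memory MBean as a stand-in; the actual Intershop cache MBean names differ per installation and should be looked up with a JMX browser such as jconsole.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Minimal sketch of reading a metric over JMX from the local platform
// MBean server. For a remote application server, a JMXConnector would be
// used instead of the local server.
public class JmxCacheMonitor {

    public static long usedHeapBytes() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        // HeapMemoryUsage is a CompositeData with init/used/committed/max fields.
        CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }
}
```

The same getAttribute pattern applies to any cache MBean that exposes hit and miss counters.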
All ORM objects which are referenced within the application server are collected in the cache of the ORM engine (except objects with the reference type NONE). The ORM cache is not limited to a number of elements; it is only limited by the VM size. The cache usually holds soft references to the ORM objects, but the reference type (hard, soft, weak, none) can be specified when developing the ORM object. The clearing of soft references follows the implementation rules of the particular Java VM.
The ORM engine defines two types of caches. One cache references existing ORM objects by primary or alternate key. Optionally, a miss cache of configurable size can exist for each ORM object type. The miss caches are LRU lists. When many database misses occur for a specific ORM object type, a miss cache can significantly increase the system performance.
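The LRU miss cache described above can be sketched with a LinkedHashMap in access order. This is an illustrative model, not the actual ORM engine implementation; class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU miss cache: it remembers keys recently looked up in the
// database without a hit, so repeated misses can skip the database entirely.
public class MissCache {

    private final LinkedHashMap<Object, Boolean> misses;

    public MissCache(final int maxSize) {
        // accessOrder=true makes the map evict the least recently used entry
        // once removeEldestEntry returns true.
        misses = new LinkedHashMap<Object, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, Boolean> eldest) {
                return size() > maxSize;
            }
        };
    }

    // get() (unlike containsKey()) refreshes the LRU order of the entry.
    public boolean isKnownMiss(Object primaryKey) {
        return misses.get(primaryKey) != null;
    }

    public void recordMiss(Object primaryKey) {
        misses.put(primaryKey, Boolean.TRUE);
    }
}
```

Before querying the database for a key, the engine would first check isKnownMiss and skip the query on a hit.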
There are several tools to analyze the application server caches:
Check the following items to verify that the application server caches are working fine:
getObjectsBySQLWhere().

This section discusses Intershop Commerce Management design and coding guidelines to meet high performance and stability needs. The goal of performance tuning at the design level is to make pipelines and ISML templates as efficient as possible.
The basic principle is to keep it simple. This applies to both design and coding. There is usually a way to simplify an algorithm, to shorten a code sequence, or to avoid a database access. If a required feature leads to a substantial slowdown, the feature should be reconsidered. In many cases, eliminating only a small part of the functionality can result in much better performance.
The important thing to remember is that there is always a way to build a feature for optimal performance. If you are not sure which solution is the best performing one, build a small prototype and load test it using multiple parallel requests.
In many cases, it is better to use several smaller machines for the Intershop Commerce Management application servers instead of one large machine. The advantage is that the Intershop Commerce Management components can be distributed to increase the availability of the cluster in case of a machine failure. But there are also cases where bigger JVMs increase the performance, e.g., when running imports with mass data.
It is recommended to install the Web adapters, the application servers, and the Oracle database server on separate, dedicated machines to simplify resource monitoring, tuning, troubleshooting, and hardware extensions. It is also strongly recommended to separate Web servers and application servers for performance and security reasons.
Here are some other things to consider:
See the Database cookbooks. Ask the DBA.
The application server settings split into two major areas: JVM settings and application server properties.
JVM settings are made in tomcat.sh|bat. Here are the most important settings for a Linux system.
JAVA_OPTS=$JAVA_OPTS\ -Xms1024m
JAVA_OPTS=$JAVA_OPTS\ -Xmx2048m
JAVA_OPTS=$JAVA_OPTS\ -XX:MaxPermSize=400m
JAVA_OPTS=$JAVA_OPTS\ -XX:NewRatio=8
# garbage collector log (choose one)
#JAVA_OPTS=$JAVA_OPTS\ "-Xloggc:$IS_HOME/log/gc-$SERVER_NAME.log -XX:+PrintGCDetails"
#JAVA_OPTS=$JAVA_OPTS\ "-verbose:gc -XX:+PrintGCDetails"
The initial heap size (-Xms) is by default set to a value sufficient to keep all pre-loaded data of a standard installation. There are cases where the garbage collector starts full GCs later when the initial size is set equal to the maximum size. Analyze the GC logs of a test system with different settings to find the optimal setting for the customer's installation.
The maximum heap size (-Xmx) is by default set to a value sufficient to handle the demo data. Depending on the number of entities, this value should be increased. The maximum size also depends on the type of application server. A server doing mass data imports can perform much better with a big heap. For a server handling storefront load this can be counterproductive, since the runtime of full GCs increases. Always check the settings on a test system before going live!
The MaxPermSize defines the space used for loaded classes. If the custom installation contains many more code artifacts, this value may need to be increased.
The NewRatio specifies the ratio between the new and the old generation. Storefront applications require less GC when setting this value to 2. For imports it is better to keep the value of 8, because the old generation will be bigger and can keep more entities.
Use the mentioned GC settings to enable logging. See the JVM documentation on how to interpret the log.
There are many files with properties controlling the application server behavior. The administration guide describes this in detail. This chapter just names some performance-related properties from the files located in share/system/config/cluster and the cartridge property files. Note that certain settings can be overwritten, e.g., in a deployment configuration or a server-specific configuration.
appserver.properties
Consider customizing the following settings in production systems:

intershop.job.enabled: If using one or more dedicated job servers, set this setting to true only for these servers and to false otherwise. Also consider disabling scheduled jobs when running load tests to avoid jobs influencing the test results.

*.CheckSource: Set all check source configurations to false for non-development systems.

intershop.cpu.id: Disable this property, i.e., enable all CPUs, if the server runs mass data operations (imports) which support multi-threading. Limiting to a number of CPUs (e.g., 4 out of 8 existing) must be administered at OS level.

intershop.pipelines|pipelets.PreloadFrom*: Avoid pre-loading of unused cartridges on production systems. This wastes JVM heap and increases the server startup time.

intershop.monitoring.*: In production systems enable sensors for requests only.

Tip
In case your dedicated application server instances with WFS/BOS/JOB do not execute the job scheduler as expected, we propose to bind all regularly running jobs to the server group "JOB" (see column SERVERGROUP of table JOBCONFIGURATION) and to enable job execution also for the WFS and BOS application servers. This way all regular jobs are executed on the separate job server, while import jobs triggered by the back office run in the BOS server(s).
The configuration of each cartridge is located in the file share/system/config/cartridges/<cartridge name>.properties. This file contains, amongst others:

core cartridge.

Change the values only if a concrete performance problem was discovered and a different configuration fixes this problem. Otherwise side effects may occur.
There are a number of OS-specific settings which can influence the system performance. This chapter names just a couple of them.
The TCP_TIME_WAIT interval specifies how long a socket remains in TCP_TIME_WAIT state after closing a TCP connection. To avoid running out of sockets, this setting should not be too high. The maximum number of available sockets is also configurable in most operating systems within a specific range.
Synchronize the machine clocks! Otherwise situations may occur where pages from the page cache expire too early or sessions expire too early. This decreases the performance.
Network configuration and compatibility issues sometimes cause problems. If the throughput of an Intershop Commerce Management cluster is unexpectedly low and the hardware utilization (CPU, I/O) of all cluster components is also low, but the response times are high, network problems may be the cause. These problems are often hard to track down. There may be cases where the load balancer and the server machine are not compatible at the networking level. Such a problem can be solved by adding a hardware switch in between. Other typical causes of problems are wrong half-duplex/full-duplex settings on twisted-pair Ethernet connections, wrong routing, or slow firewalls.
Last but not least, any background processing of the OS should not slow down the Intershop Commerce Management system.
This chapter contains some information about tools for performance analysis. This includes tools for generating load on the system as well as tools to analyze the system. The Cookbook - Performance (valid to 7.10) contains detailed information on how to deal with these tools.
A production system receives different types of load:
Simulating the load for analyzing long-running processes in most cases means triggering the corresponding action and doing some measurement, e.g., running an import and doing some analysis.
For short running requests the situation is a little bit different.
The simplest storefront load test is a single click. Especially after complex reworks, a single click can take more than 10 seconds (while the target value after performance optimization is just a fraction of a second). In that case it is useful to analyze and speed up this single-click performance first.
Sometimes it is also useful to aggregate the performance numbers of multiple clicks, e.g., when profiling the application or checking the memory consumption.
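Measuring and aggregating such clicks can be sketched with a small helper. This is an illustrative utility, not an Intershop tool; in a real test the Runnable would issue the HTTP request against the storefront.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of timing a repeated "click" and aggregating the results.
public class ClickTimer {

    // Runs the action n times and returns the elapsed time of each run in ms.
    public static List<Long> measure(Runnable action, int n) {
        List<Long> runtimesMillis = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            action.run();
            runtimesMillis.add((System.nanoTime() - start) / 1_000_000);
        }
        return runtimesMillis;
    }

    public static double averageMillis(List<Long> runtimesMillis) {
        return runtimesMillis.stream().mapToLong(Long::longValue).average().orElse(0);
    }
}
```

Running the same click several times and averaging smooths out warm-up effects such as cold caches and JIT compilation.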
Finally, there exist many 3rd party tools to create load on the Intershop Commerce Management system, such as Apache JMeter, Borland Silk Performer, Compuware dynaTrace or HP LoadRunner. The next sections contain some basic information about the tools used at Intershop R&D and QA.
Apache JMeter
Citation from the official web site: "The Apache JMeter desktop application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions."
Since JMeter is open source it can be used by any developer to create load on an Intershop Commerce Management system. At the beginning it is sufficient to call a couple of static URLs by a number of parallel users. It is also possible to make the load more dynamic, either by reading input data from a file or parsing the response and building the subsequent request.
Borland Silk Performer
Citation from the official web site: "Silk Performer is an efficient, cost-effective way to ensure your mission-critical applications meet performance expectations and service-level requirements."
Silk Performer is used by QA to run several kinds of storefront load tests against Intershop Commerce Management, such as browsing users, order users, search users etc. Load tests in most cases run a user mix, e.g., 50% search users, 20% A2B users, 10% browsing users, 10% order users and 10% registered users. There exist different scripts to simulate the load. Silk Performer requires a license, which is why access to this tool is limited.
There are many tools for analyzing Java applications.
The JDK itself, for example, provides JConsole and JVisualVM, which allow analyzing applications in terms of heap usage, GC activity and other metrics. JConsole can also be used as a JMX console to monitor and clear Intershop Commerce Management caches. JVisualVM includes profiler capabilities (it is a "NetBeans Light").
The load generator tools mentioned in the previous chapter also include analysis functionalities, most interesting here are request runtimes, page statistics and so on.
When doing single clicks, browser tools or plugins can be used to analyze the response content and the separate runtimes. Examples are the Firebug plugin for Mozilla Firefox or Opera Dragonfly.
The following sections describe the built-in analysis tools of Intershop Commerce Management's SMC.
Under Installation Maintenance - Dump Generation it is possible to create thread dumps and heap dumps. Especially when the heap is completely used and the application runs into heavy garbage collections or even out-of-memory situations, a heap dump can help to analyze the instances in the heap, even offline with an according tool. Just select the application server, press Apply, and Create heapdump afterwards.
Under Monitoring - OR Mapping - ORM Cache a table with ORM cache statistics appears:
Here is the description of the columns and some performance related information:
The key is the ORMObjectKey; the value is the instance of ORMObject.

An ORMObject instance is just a "shell" which references one shared state and multiple transactional states. Most of the instances use a soft reference for the shared state. The GC can drop a shared state if it is referenced softly and the JVM needs to free some heap. An ORMObject is loaded if the shared state exists.

The SMC menu Monitoring - Performance is the starting point for working with performance sensors. Starting from this page you can ...
What is a performance sensor?
Intershop Commerce Management has a built-in functionality to measure the runtime of a certain code snippet (in nanoseconds). For one single call this runtime is held by a so-called runtime sensor. Multiple calls of the same name and type of runtime sensor are finally consolidated into a so-called performance sensor. Sensors exist for these types:
What information does a performance sensor provide?
A performance sensor provides the following information.
Name: The name of the sensor. Runtime sensors of the same name and type are consolidated into the same performance sensor.
Hits: The number of calls of this sensor.
Total Time: The total time of execution. This includes the runtime of all sub-sensors.
Effective Time: The runtime spent only in this sensor and no sub-sensors. The following picture illustrates total time and effective time.
Average Time: The average total time (total/hits).
Minimum Time: The minimum total time.
Maximum Time: The maximum total time.
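The consolidation of runtime sensors into a performance sensor as described above can be sketched as follows. This is an illustrative model, not the actual Intershop implementation; the class and method names are hypothetical.

```java
// Sketch: single runtime sensor values (one per call) are merged into one
// performance sensor holding hits, total, effective, min and max time.
public class PerformanceSensor {

    private final String name;
    private long hits, totalNanos, effectiveNanos;
    private long minNanos = Long.MAX_VALUE, maxNanos;

    public PerformanceSensor(String name) {
        this.name = name;
    }

    // Consolidate one call: its total runtime and the time spent in its
    // sub-sensors (effective time = total time minus sub-sensor time).
    public void add(long runtimeNanos, long subSensorNanos) {
        hits++;
        totalNanos += runtimeNanos;
        effectiveNanos += runtimeNanos - subSensorNanos;
        minNanos = Math.min(minNanos, runtimeNanos);
        maxNanos = Math.max(maxNanos, runtimeNanos);
    }

    public String name() { return name; }
    public long hits() { return hits; }
    public long totalNanos() { return totalNanos; }
    public long effectiveNanos() { return effectiveNanos; }
    public long averageNanos() { return hits == 0 ? 0 : totalNanos / hits; }
    public long minNanos() { return minNanos; }
    public long maxNanos() { return maxNanos; }
}
```

For example, two calls of 100 ns (40 ns in sub-sensors) and 300 ns (100 ns in sub-sensors) yield 2 hits, 400 ns total time and 260 ns effective time.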
Performance sensors and multi-threading
A runtime sensor collects the runtime of a single thread. If a caller thread forks into multiple worker threads, this must be considered when analyzing the performance numbers (e.g., for an import with multiple validators and bulkers). The next picture illustrates this.
Configuration
Performance sensors of different types can be switched on separately. The initial settings for the server startup are controlled by these application server properties:
#
# monitoring section
#
intershop.monitoring.requests=true
intershop.monitoring.pipelines=false
intershop.monitoring.pipelets=false
intershop.monitoring.templates=false
intershop.monitoring.queries=false
intershop.monitoring.sql=false
intershop.monitoring.objectpath=false
intershop.monitoring.pagelet=false
intershop.monitoring.class=false
intershop.monitoring.log=false
intershop.monitoring.managedservice=false
It is strongly recommended for production systems to permanently switch on only sensors of type request. To do a detailed analysis on a running server, the sensors can be switched on and off dynamically in the SMC under Monitoring - Performance - Configuration.
It is helpful to reset the sensors before each separate measurement.
Performance By Domain and Request
To analyze the performance of a particular request it is possible to select this specific request for a given domain and see the appropriate sensor data. Under Monitoring - Performance - Performance By Domain and Request select a domain, press Apply, and select the link of the request of interest.
The resulting page shows all sub sensors:
Performance By Type
It is also possible to watch the sensors of one single type, for example, to see how many SQL statements are executed and to analyze their runtime. Just select the appropriate sensor type under Monitoring - Performance - Performance By Type.
Compare Performance Monitoring Results
Comparing performance sensors of different test runs is essential when comparing different builds or checking a different implementation for performance impact. Creating such a performance report is possible under Monitoring - Performance - Compare Performance Monitoring Results. This page contains a button to create a new report; after selecting two reports, a comparison is possible.
By default the comparison result is ordered by the effective time difference. All differences are absolute values (without sign). The red value is always the bigger one when comparing two values with each other.
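The ordering described above can be sketched as follows: sensors of two reports are matched by name and sorted by the absolute difference of their effective times, largest first. The data structures are illustrative assumptions, not the actual report format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of comparing two performance reports by absolute effective time
// difference (no sign), ordered with the biggest difference first.
public class SensorComparison {

    public record Diff(String name, long absEffectiveDiffNanos) {}

    public static List<Diff> compare(Map<String, Long> reportA, Map<String, Long> reportB) {
        List<Diff> diffs = new ArrayList<>();
        for (Map.Entry<String, Long> e : reportA.entrySet()) {
            long other = reportB.getOrDefault(e.getKey(), 0L);
            diffs.add(new Diff(e.getKey(), Math.abs(e.getValue() - other)));
        }
        diffs.sort(Comparator.comparingLong(Diff::absEffectiveDiffNanos).reversed());
        return diffs;
    }
}
```

A sensor whose effective time grew or shrank the most thus appears at the top, regardless of the direction of the change.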