The performance of an e-commerce system is a major criterion of its market acceptance. End users expect quick response times when visiting a web shop, and the owner of the e-commerce application wants to optimize the costs of administrating and running the site. This concept covers the performance and scalability of an Intershop Commerce Management system. This includes coding guidelines as well as considerations on how to analyze and tune a running application.
Performance tuning is an overall task of each (more complex) software development process, regardless of the type of development (standard product or customer project). Performance must be considered at every line of code. Keep in mind that the interface to your implementation describes the requirements of your code: what the task is, what the responsibility of the class is, and how it was implemented. This will reduce the costs of reworking significantly. Please consider:
Most performance optimizations result in code changes (Java or SQL queries)
This also allows a detailed performance analysis by switching on multiple performance sensors (which add a processing overhead to the system).
Performance monitoring, tuning, and bug fixing are ongoing development tasks. The highest priority in Intershop development is storefront performance, followed by import performance and back office performance. This order of priorities results from the fact that low storefront performance has the biggest impact on the business of our clients.
The current development process includes automatic storefront performance tests of the builds of the main branch. All other performance tests (also import and backoffice) are executed on demand.
This simple top-level view shows three aspects that must be considered when tuning an Intershop Commerce Management system: the workload, the software, and the hardware.
The workload is the sum of all requests and commands entering the system, for example storefront HTTP requests, back office requests, service calls or job executions. The software includes all Intershop Commerce Management and third-party software components and the operating systems of the used machines as well as all software-related settings. The hardware includes all machine and infrastructure hardware that is involved.
All three aspects might cause bottlenecks and all three can be manipulated to achieve better performance. The load, for example, can be reduced by decreasing the number of clicks that are necessary for each storefront use case.
Performance tuning involves a complex set of factors. There are critical issues to deal with at numerous points in a system. Guidelines for performance tuning must therefore address a variety of topics. Performance tuning of Intershop Commerce Management can be divided into the six tuning levels as shown in the picture above. These levels cover different phases of the project. Consequently, different people may handle the tuning tasks at each level. Hardware sizing and hardware extensions are not included in these levels, because these are considered as a separate subject. Extending available hardware cannot compensate for problems introduced at the other tuning levels. If a bad design leads to a substantial slowdown, it might be virtually impossible to achieve better results simply by adding more hardware.
Application architecture and application design have the highest impact. Performance tuning is not only about changing settings - it is mainly about architecture and design. Mistakes made in these "higher" levels cannot be corrected or resolved by changes in a lower level.
Tuning at levels A and B is important for software engineers and Web designers who develop Intershop Commerce Management-based Web sites or Intershop Commerce Management extensions such as cartridges. Tuning within levels C, E, and F should be carried out by a system administrator or other person experienced in configuration of distributed environments. Level D tuning deals with typical Oracle database tuning and must be handled by the database administrator.
The main sections of this document describe tuning at levels A, B, C, E, and F in detail. Database tuning is a broad topic that is not covered in this document.
The primary goal of optimizing the application architecture is to reduce the workload of different parts of the Intershop Commerce Management system. The following picture shows the most general parts of the system (web layer, application server layer and data storage layer).
The upper layers provide caching capabilities to reduce the load on the lower layers. That is, the web layer provides a page cache to reduce the load on the application server layer, and the application server layer uses caches (e.g., the ORM cache) to reduce the load on the data storage layer (e.g., the database). Using the caches appropriately and avoiding cache misses is essential for good performance and scalability.
The most important measure to achieve optimal performance is to cache as many pages as possible in the Intershop Commerce Management page cache by including the <iscache …> tag into the corresponding ISML templates. It is also advisable to choose the longest acceptable caching time for each template.
Using the page cache is the key to high performance. Typically, the page cache can deliver 10 to 100 times as many pages compared to pages generated dynamically by an Intershop Commerce Management application server. The number of simultaneous sessions that can be served is increased by the same factor.
It is often possible to cache pages even if they contain information that changes from time to time. For example, if product information is changed frequently and changes must be propagated quickly, the product pages can still be cached if the page lifetime is set to an interval that is short enough, for example, a few minutes. There is often an acceptable propagation delay and the cache hit rate is still high for the most frequently accessed products.
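The trade-off described above can be modeled as a simple time-to-live cache: a page stays cached for a fixed lifetime, so content changes become visible after at most that delay. The following sketch is purely illustrative; it is not the Intershop page cache, and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of a TTL-based page cache. A short lifetime bounds the
// propagation delay of content changes while keeping the hit rate high for
// frequently accessed pages.
public class TtlPageCache {

    private record Entry(String page, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long lifetimeMillis;

    public TtlPageCache(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    public void put(String key, String page, long nowMillis) {
        cache.put(key, new Entry(page, nowMillis + lifetimeMillis));
    }

    // Returns the cached page, or null on a miss or expired entry
    // (an expired entry forces regeneration of the page).
    public String get(String key, long nowMillis) {
        Entry e = cache.get(key);
        if (e == null || e.expiresAtMillis() <= nowMillis) {
            cache.remove(key);
            return null;
        }
        return e.page();
    }
}
```

With a lifetime of a few minutes, a product page served thousands of times per minute is regenerated only once per interval, while edits still propagate within that interval.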
In production environments, one Web adapter is usually able to deliver about 100 to 300 pages per second from the page cache. The achieved page rate depends on factors like the page size, the number of different pages, the access patterns, and the used hardware.
The following conditions have to be met to write a response page into the page cache:
A page is delivered from the page cache if the following conditions are met:
Take into account that a different URL parameter order and different URL parameter combinations lead to different page cache keys and therefore to multiple cached copies of the same page, for example, if a certain URL parameter is sometimes omitted and sometimes not. Try to reduce the possible number of these URL variations.
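One way to avoid this multiplication of cache entries is to build URLs with a canonical parameter order. The sketch below is a hypothetical illustration (neither the class nor the method exists in Intershop); it shows the idea of sorting parameters so that equivalent requests map to one key.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: normalize URL parameters into a canonical cache key
// so "a=1&b=2" and "b=2&a=1" hit the same cached page.
public class PageCacheKey {

    public static String canonicalKey(String path, Map<String, String> params) {
        StringBuilder key = new StringBuilder(path);
        char sep = '?';
        // TreeMap iterates in sorted key order, giving a stable parameter order.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            key.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return key.toString();
    }
}
```

Applying the same discipline when generating links in templates keeps the number of distinct cached pages per view small.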
Pages or parts of them can often be cached, even if there are some dynamic or session-specific data elements on the page.
There are several ways to analyze the web layer and the page cache:
Check the following items to achieve the best performance results in the web layer:
The application server caches several object instances to reduce the number of database queries and file accesses. Using a JMX tool (e.g., jconsole) these caches can be monitored.
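Such JMX-based monitoring can also be done programmatically. The following sketch uses the standard java.lang:type=Memory MBean as a stand-in; the actual Intershop cache MBean names differ per installation and should be looked up with a JMX browser such as jconsole.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Minimal sketch of reading a metric over JMX from the local platform
// MBean server. For a remote application server, a JMXConnector would be
// used instead of the local server.
public class JmxCacheMonitor {

    public static long usedHeapBytes() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        // HeapMemoryUsage is a CompositeData with init/used/committed/max fields.
        CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }
}
```

The same getAttribute pattern applies to any cache MBean that exposes hit and miss counters.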
All ORM objects which are referenced within the application server are collected in the cache of the ORM engine (except objects with the reference type NONE). The ORM cache is not limited to a number of elements; it is only limited by the VM size. The cache usually holds soft references to the ORM objects, but the reference type (hard, soft, weak, none) can be specified when developing the ORM object. The clearing of soft references follows the implementation rules of the particular Java VM.
The ORM engine defines two types of caches. One cache references existing ORM objects by primary or alternate key. Optionally, a miss cache of configurable size can exist for each ORM object type. The miss caches are LRU lists. When many database misses occur for a specific ORM object type, a miss cache can significantly increase the system performance.
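The LRU miss cache described above can be sketched with a LinkedHashMap in access order. This is an illustrative model, not the actual ORM engine implementation; class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU miss cache: it remembers keys recently looked up in the
// database without a hit, so repeated misses can skip the database entirely.
public class MissCache {

    private final LinkedHashMap<Object, Boolean> misses;

    public MissCache(final int maxSize) {
        // accessOrder=true makes the map evict the least recently used entry
        // once removeEldestEntry returns true.
        misses = new LinkedHashMap<Object, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, Boolean> eldest) {
                return size() > maxSize;
            }
        };
    }

    // get() (unlike containsKey()) refreshes the LRU order of the entry.
    public boolean isKnownMiss(Object primaryKey) {
        return misses.get(primaryKey) != null;
    }

    public void recordMiss(Object primaryKey) {
        misses.put(primaryKey, Boolean.TRUE);
    }
}
```

Before querying the database for a key, the engine would first check isKnownMiss and skip the query on a hit.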
There are several tools to analyze the application server caches:
Check the following items to verify that the application server caches are working fine:
getObjectsBySQLWhere().

This section discusses Intershop Commerce Management design and coding guidelines to meet high performance and stability needs. The goal of performance tuning at the design level is to make pipelines and ISML templates as efficient as possible.
The basic principle is to keep it simple. This applies to both design and coding. There is usually a way to simplify an algorithm, to shorten a code sequence, or to avoid a database access. If a required feature leads to a substantial slowdown, the feature should be reconsidered. In many cases, eliminating only a small part of the functionality can result in much better performance.
The important thing to remember is that there is always a way to build a feature for optimal performance. If you are not sure which solution is the best performing one, build a small prototype and load test it using multiple parallel requests.
In many cases, it is better to use several smaller machines for the Intershop Commerce Management application servers instead of one large machine. The advantage is that the Intershop Commerce Management components can be distributed to increase the availability of the cluster in case of a machine failure. But there are also cases where bigger JVMs increase the performance, e.g., when running imports with mass data.
It is recommended to install the Web adapters, the application servers, and the Oracle database server on separate, dedicated machines to simplify resource monitoring, tuning, troubleshooting, and hardware extensions. It is also strongly recommended to separate Web servers and application servers for performance and security reasons.
Here are some other things to consider:
See the Database cookbooks. Ask the DBA.
The application server settings split into two major areas: JVM settings and application server properties.
JVM settings are made in tomcat.sh|bat. Here are the most important settings for a Linux system.
JAVA_OPTS=$JAVA_OPTS\ -Xms1024m
JAVA_OPTS=$JAVA_OPTS\ -Xmx2048m
JAVA_OPTS=$JAVA_OPTS\ -XX:MaxPermSize=400m
JAVA_OPTS=$JAVA_OPTS\ -XX:NewRatio=8
# garbage collector log (choose one)
#JAVA_OPTS=$JAVA_OPTS\ "-Xloggc:$IS_HOME/log/gc-$SERVER_NAME.log -XX:+PrintGCDetails"
#JAVA_OPTS=$JAVA_OPTS\ "-verbose:gc -XX:+PrintGCDetails"
The initial heap size (-Xms) is by default set to a value sufficient to keep all pre-loaded data of a standard installation. There are cases where the garbage collector starts full GCs later when the initial size is set equal to the maximum size. Analyze the GC logs of a test system with different settings to find the optimal setting for the customer's installation.
The maximum heap size (-Xmx) is by default set to a value sufficient to handle the demo data. Depending on the number of entities, this value should be increased. The maximum size also depends on the type of application server. A server doing mass data imports can perform much better with a big heap. For a server handling storefront load this can be counterproductive, since the runtime of full GCs increases. Always check the settings on a test system before going live!
The MaxPermSize defines the space used for loaded classes. If the custom installation contains many more code artifacts, this value may need to be increased.
The NewRatio specifies the ratio between the new and the old generation. Storefront applications require less GC when setting this value to 2. For imports it is better to keep the value of 8, because the old generation will be bigger and can keep more entities.
Use the mentioned GC settings to enable logging. See the JVM documentation on how to interpret the log.
There are many files with properties controlling the application server behavior. The administration guide describes this in detail. This chapter just names some performance-related properties from the files located in share/system/config/cluster and the cartridge property files. Note that certain settings can be overwritten, e.g., in a deployment configuration or a server-specific configuration.
appserver.properties
Consider customizing the following settings in production systems:

intershop.job.enabled: If using one or more dedicated job servers, set this setting to true only for these servers and to false otherwise. Also consider disabling scheduled jobs when running load tests to avoid jobs influencing the test results.

*.CheckSource: Set all check source configurations to false for non-development systems.

intershop.cpu.id: Disable this property, i.e., enable all CPUs, if the server runs mass data operations (imports) which support multi-threading. Limiting to a number of CPUs (e.g., 4 out of 8 existing) must be administered at OS level.

intershop.pipelines|pipelets.PreloadFrom*: Avoid pre-loading of unused cartridges on production systems. This wastes JVM heap and increases the server startup time.

intershop.monitoring.*: In production systems enable sensors for requests only.

Tip
In case your dedicated application server instances with WFS/BOS/JOB do not execute the job scheduler as expected, we propose to bind all regularly running jobs to the server group "JOB" (see column SERVERGROUP of table JOBCONFIGURATION) and to enable job execution also for the WFS and BOS application servers. This way all regular jobs are executed on the separate job server, while import jobs triggered by the back office run in the BOS server(s).
The configuration of each cartridge is located in the file share/system/config/cartridges/<cartridge name>.properties. This file contains, amongst others:

core cartridge.

Change the values only if a concrete performance problem was discovered and a different configuration fixes this problem. Otherwise side effects may occur.
There are a number of OS-specific settings which can influence the system performance. This chapter names just a couple of them.
The TCP_TIME_WAIT interval specifies how long a socket remains in TCP_TIME_WAIT state after closing a TCP connection. To avoid running out of sockets, this setting should not be too high. The maximum number of available sockets is also configurable in most operating systems within a specific range.
Synchronize the machine clocks! Otherwise situations may occur where pages from the page cache expire too early or sessions expire too early. This decreases the performance.
Network configuration and compatibility issues sometimes cause problems. If the throughput of an Intershop Commerce Management cluster is unexpectedly low and the hardware utilization (CPU, I/O) of all cluster components is also low, but the response times are high, network problems may be the cause. These problems are often hard to track down. There may be cases where the load balancer and the server machine are not compatible at the networking level. Such a problem can be solved by adding a hardware switch in between. Other typical causes of problems are wrong half-duplex/full-duplex settings on twisted-pair Ethernet connections, wrong routing, or slow firewalls.
Last but not least, any background processing of the OS should not slow down the Intershop Commerce Management system.
This chapter contains some information about tools for performance analysis. This includes tools for generating load on the system as well as tools to analyze the system. The Cookbook - Performance (valid to 7.10) contains detailed information on how to deal with these tools.
A production system receives different types of load:
Simulating the load for analyzing long-running processes in most cases means triggering the corresponding action and doing some measurement, e.g., running an import and doing some analysis.
For short running requests the situation is a little bit different.
The simplest storefront load test is a single click. Especially after complex reworks, a single click can take more than 10 seconds (while the target value after performance optimization is just a fraction of a second). In that case it is useful to analyze and speed up this single-click performance first.
Sometimes it is also useful to aggregate the performance numbers of multiple clicks, e.g., when profiling the application or checking the memory consumption.
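Measuring and aggregating such clicks can be sketched with a small helper. This is an illustrative utility, not an Intershop tool; in a real test the Runnable would issue the HTTP request against the storefront.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of timing a repeated "click" and aggregating the results.
public class ClickTimer {

    // Runs the action n times and returns the elapsed time of each run in ms.
    public static List<Long> measure(Runnable action, int n) {
        List<Long> runtimesMillis = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            action.run();
            runtimesMillis.add((System.nanoTime() - start) / 1_000_000);
        }
        return runtimesMillis;
    }

    public static double averageMillis(List<Long> runtimesMillis) {
        return runtimesMillis.stream().mapToLong(Long::longValue).average().orElse(0);
    }
}
```

Running the same click several times and averaging smooths out warm-up effects such as cold caches and JIT compilation.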
Finally, there exist many 3rd party tools to create load on the Intershop Commerce Management system, such as Apache JMeter, Borland Silk Performer, Compuware dynaTrace or HP LoadRunner. The next sections contain some basic information about the tools used at Intershop R&D and QA.
Apache JMeter
Citation from the official web site: "The Apache JMeter desktop application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions."
Since JMeter is open source it can be used by any developer to create load on an Intershop Commerce Management system. At the beginning it is sufficient to call a couple of static URLs by a number of parallel users. It is also possible to make the load more dynamic, either by reading input data from a file or parsing the response and building the subsequent request.
Borland Silk Performer
Citation from the official web site: "Silk Performer is an efficient, cost-effective way to ensure your mission-critical applications meet performance expectations and service-level requirements."
Silk Performer is used by QA to run several kinds of storefront load tests against Intershop Commerce Management, such as browsing users, order users, search users etc. Load tests in most cases run a user mix, e.g., 50% search users, 20% A2B users, 10% browsing users, 10% order users and 10% registered users. There exist different scripts to simulate the load. Silk Performer requires a license, which is why access to this tool is limited.
There are many tools for analyzing Java applications.
The JDK itself, for example, provides JConsole and JVisualVM, which allow analyzing applications in terms of heap usage, GC activity and other metrics. JConsole can also be used as a JMX console to monitor and clear Intershop Commerce Management caches. JVisualVM includes profiler capabilities (it is a "NetBeans Light").
The load generator tools mentioned in the previous chapter also include analysis functionalities, most interesting here are request runtimes, page statistics and so on.
When doing single clicks, browser tools or plugins can be used to analyze the response content and the separate runtimes. Examples are the Firebug plugin for Mozilla Firefox or Opera Dragonfly.
The following sections describe the built-in analysis tools of Intershop Commerce Management's SMC.
Under Installation Maintenance - Dump Generation it is possible to create thread dumps and heap dumps. Especially when the heap is completely used and the application runs into heavy garbage collections or even out-of-memory situations, a heap dump can help to analyze the instances in the heap, even offline with an according tool. Just select the application server, press Apply, and Create heapdump afterwards.
Under Monitoring - OR Mapping - ORM Cache a table with ORM cache statistics appears:
Here is the description of the columns and some performance related information:
The key is the ORMObjectKey; the value is the instance of ORMObject.

An ORMObject instance is just a "shell" which references one shared state and multiple transactional states. Most of the instances use a soft reference for the shared state. The GC can drop a shared state if it is referenced softly and the JVM needs to free some heap. An ORMObject is loaded if the shared state exists.

The SMC menu Monitoring - Performance is the starting point for working with performance sensors. Starting from this page you can ...
What is a performance sensor?
Intershop Commerce Management has a built-in functionality to measure the runtime of a certain code snippet (in nanoseconds). For one single call this runtime is held by a so-called runtime sensor. Multiple calls of the same name and type of runtime sensor are finally consolidated into a so-called performance sensor. Sensors exist for these types:
What information does a performance sensor provide?
A performance sensor provides the following information.
Name: The name of the sensor. Runtime sensors of the same name and type are consolidated into the same performance sensor.
Hits: The number of calls of this sensor.
Total Time: The total time of execution. This includes the runtime of all sub-sensors.
Effective Time: The runtime spent only in this sensor and no sub-sensors. The following picture illustrates total time and effective time.
Average Time: The average total time (total/hits).
Minimum Time: The minimum total time.
Maximum Time: The maximum total time.
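The consolidation of runtime sensors into a performance sensor as described above can be sketched as follows. This is an illustrative model, not the actual Intershop implementation; the class and method names are hypothetical.

```java
// Sketch: single runtime sensor values (one per call) are merged into one
// performance sensor holding hits, total, effective, min and max time.
public class PerformanceSensor {

    private final String name;
    private long hits, totalNanos, effectiveNanos;
    private long minNanos = Long.MAX_VALUE, maxNanos;

    public PerformanceSensor(String name) {
        this.name = name;
    }

    // Consolidate one call: its total runtime and the time spent in its
    // sub-sensors (effective time = total time minus sub-sensor time).
    public void add(long runtimeNanos, long subSensorNanos) {
        hits++;
        totalNanos += runtimeNanos;
        effectiveNanos += runtimeNanos - subSensorNanos;
        minNanos = Math.min(minNanos, runtimeNanos);
        maxNanos = Math.max(maxNanos, runtimeNanos);
    }

    public String name() { return name; }
    public long hits() { return hits; }
    public long totalNanos() { return totalNanos; }
    public long effectiveNanos() { return effectiveNanos; }
    public long averageNanos() { return hits == 0 ? 0 : totalNanos / hits; }
    public long minNanos() { return minNanos; }
    public long maxNanos() { return maxNanos; }
}
```

For example, two calls of 100 ns (40 ns in sub-sensors) and 300 ns (100 ns in sub-sensors) yield 2 hits, 400 ns total time and 260 ns effective time.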
Performance sensors and multi-threading
A runtime sensor collects the runtime of a single thread. If a caller thread forks into multiple worker threads, this must be considered when analyzing the performance numbers (e.g., for an import with multiple validators and bulkers). The next picture illustrates this.
Configuration
Performance sensors of different types can be switched on separately. The initial settings for the server startup are controlled by these application server properties:
#
# monitoring section
#
intershop.monitoring.requests=true
intershop.monitoring.pipelines=false
intershop.monitoring.pipelets=false
intershop.monitoring.templates=false
intershop.monitoring.queries=false
intershop.monitoring.sql=false
intershop.monitoring.objectpath=false
intershop.monitoring.pagelet=false
intershop.monitoring.class=false
intershop.monitoring.log=false
intershop.monitoring.managedservice=false
It is strongly recommended for production systems to permanently switch on only sensors of type request. To do a detailed analysis on a running server, the sensors can be switched on and off dynamically in the SMC under Monitoring - Performance - Configuration.
It is helpful to reset the sensors before each separate measurement.
Performance By Domain and Request
To analyze the performance of a particular request it is possible to select this specific request for a given domain and see the appropriate sensor data. Under Monitoring - Performance - Performance By Domain and Request select a domain, press Apply, and select the link of the request of interest.
The resulting page shows all sub sensors:
Performance By Type
It is also possible to watch the sensors of one single type, for example, to see how many SQL statements are executed and to analyze their runtime. Just select the appropriate sensor type under Monitoring - Performance - Performance By Type.
Compare Performance Monitoring Results
Comparing performance sensors of different test runs is essential when comparing different builds or checking a different implementation for performance impact. Creating such a performance report is possible under Monitoring - Performance - Compare Performance Monitoring Results. This page contains a button to create a new report; after selecting two reports, a comparison is possible.
By default the comparison result is ordered by the effective time difference. All differences are absolute values (without sign). The red value is always the bigger one when comparing two values with each other.
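The ordering described above can be sketched as follows: sensors of two reports are matched by name and sorted by the absolute difference of their effective times, largest first. The data structures are illustrative assumptions, not the actual report format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of comparing two performance reports by absolute effective time
// difference (no sign), ordered with the biggest difference first.
public class SensorComparison {

    public record Diff(String name, long absEffectiveDiffNanos) {}

    public static List<Diff> compare(Map<String, Long> reportA, Map<String, Long> reportB) {
        List<Diff> diffs = new ArrayList<>();
        for (Map.Entry<String, Long> e : reportA.entrySet()) {
            long other = reportB.getOrDefault(e.getKey(), 0L);
            diffs.add(new Diff(e.getKey(), Math.abs(e.getValue() - other)));
        }
        diffs.sort(Comparator.comparingLong(Diff::absEffectiveDiffNanos).reversed());
        return diffs;
    }
}
```

A sensor whose effective time grew or shrank the most thus appears at the top, regardless of the direction of the change.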