Apache HTTP Server Version 1.3

Making Apache Ten Times Faster

Mike Abbott - mja@sgi.com
Accelerating Apache Project


I have contributed a bunch of patches to the Apache Project that together increase the performance of the Apache HTTP server by up to 900%. This document summarizes how to maximize the performance of Apache using my patches. The measures of performance in this document are the SPECweb96 and SPECweb99 benchmarks. SPECweb96 measures only the number of static HTTP/1.0 GET requests a web server can service per second. SPECweb99 adds HTTP/1.1, a dynamic component, and CGI, and measures the number of simultaneous connections a web server can handle. This document assumes a working knowledge of Apache internals.

The advice that follows may or may not apply to your system and may increase or decrease your system's performance depending on factors beyond the scope of this document. Use this guide only in conjunction with a clear understanding of your performance goals and overriding needs.

Introduction

The patches I contributed to the Apache Project modify Apache's source code and standard configuration files to speed up the processing of HTTP requests in various ways. Still, Apache must be configured, compiled, and tuned to achieve the best performance on a given system. The following sections describe the configurable options relevant for high performance.

The options fall into five categories by what they control: the QSC, Apache's child processes, direct I/O, the stat cache, and other.

Tuning the QSC

The Quick Shortcut Cache (QSC) cuts out unnecessary processing for requests to cached static content. It sports a number of configurable options. Two of these, QSC_HASH_SIZE and QSC_MAX_SIZE, limit the maximum size of the cache and almost certainly need to be increased from their default values to provide the maximum benefit for a given system. A third, QSC_HEADER_GRAIN (which normally uses the same value as CACHE_ALIGNMENT), regulates alignment and padding and can improve performance more than you might think when set correctly. See the QSC documentation for details on these options and for instructions on using the QSC.

The QSC can also manage the very large amounts of data that a 64-bit address space makes possible, although with a slight performance penalty.

As patched, Apache's standard run-time configuration file httpd.conf enables the QSC when the mmap_static module is installed. Leave the QSC on directive as is.

Summary of options in this section:

QSC_HASH_SIZE, QSC_MAX_SIZE, QSC_HEADER_GRAIN, CACHE_ALIGNMENT, QSC

Tuning Child Processes

Under heavy load, a system can spend most of its computing power creating, destroying, and scheduling Apache child processes. Eliminating all this overhead allows the system to spend more of its time processing HTTP requests but reduces Apache's ability to adapt to a changing load.

Apache's standard run-time configuration file httpd.conf sets no limit on the number of requests each child process handles before terminating itself and forcing the parent Apache process to fork a replacement. Leave the MaxRequestsPerChild 0 directive as is.

Choose a number of processes and force Apache always to use that number, never more or less. Set the MinSpareServers, MaxSpareServers, StartServers, and MaxClients options all to the same value and ignore the resulting warning:

server reached MaxClients setting, consider raising the MaxClients setting

The number you use is up to you, but remember that fewer processes sometimes run faster than more. If your content is predominantly static, start with, say, two per network interface and increase from there as necessary. You will need more if your content is significantly dynamic (e.g., lots of CGI).

Bind interrupts from network interface devices to specific processors (distributing them as evenly as possible across all the processors) and bind the Apache child processes listening to those interfaces to the same processors using the Listen directive. Force each child process to listen only to its designated interface -- and eliminate performance-sapping accept serialization -- by turning SingleListen on.

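Summary of options in this section:

MaxRequestsPerChild, MinSpareServers, MaxSpareServers, StartServers, MaxClients, Listen, SingleListen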

Direct I/O

Direct I/O is the name for reading or writing data directly into and out of user space without using the system's buffer cache.

Normally Apache reads or memory-maps disk files containing content it serves. The system DMAs data from disk into a system buffer. If Apache uses read() the system copies the data into Apache's memory; if Apache uses mmap() it and the system share the system buffer's memory. Either way, the system maintains the cache of buffered data, which it manages without guidance from Apache. In particular, the system can choose to invalidate any data in its buffer cache when it needs to insert new data into a full cache. If the system chooses poorly, it quickly becomes disk-bound, re-reading data that it just evicted from its cache to make room for other data. This problem does not occur when the system has enough memory to hold all the files Apache serves at once, a configuration I recommend. But when memory is tight, direct I/O can help.
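The difference between the two ordinary access paths can be sketched in a few lines of C. This is only an illustration using standard POSIX calls, not code from Apache or the patches; the function names sum_with_read and sum_with_mmap are hypothetical. Both functions add up the bytes of a file: the first has the system copy data into a private buffer, the second walks the pages shared with the system's buffer cache.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum a file's bytes with read(): the system copies each chunk from its
 * buffer cache into our private buffer. Returns -1 on error. */
long sum_with_read(const char *path)
{
    unsigned char buf[8192];
    long sum = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    }
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sum a file's bytes with mmap(): no copy is made; the process reads the
 * same pages the system keeps in its cache. Returns -1 on error. */
long sum_with_mmap(const char *path)
{
    struct stat st;
    long sum = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {
        unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                                MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];
        munmap(p, (size_t)st.st_size);
    }
    close(fd);
    return sum;
}
```

In both cases the data still passes through the system's buffer cache, which is why neither path gives the process any say over what gets evicted.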

Direct I/O bypasses the system's buffer cache. When Apache uses direct I/O read() the system DMAs data from disk directly into Apache's memory. Judicious use can reduce thrashing of the buffer cache, but overuse can lead to similarly degraded performance because every read() causes disk I/O. (Memory mapping cannot be used with direct I/O.)

At this time Apache supports only SGI's Irix implementation of direct I/O.

When USE_DIRECT_IO is defined, Apache's default request handler attempts to use direct I/O to serve files that are DIRECT_THRESHOLD bytes in size or larger (default 64 MB). I say "attempts" because direct I/O requires stringent data alignment which certain types of requests (in particular, data ranges) violate. If Apache cannot use direct I/O it falls back to using either mmap() or read(). Note that only the default handler tries direct I/O and that, in particular, the mmap_static module's handler does not. To make Apache try direct I/O for a given file, you must not include an mmapfile directive for that file.

Apache limits its use of direct I/O to large files on the assumption that it serves small files more often than large ones, and because under low memory conditions a single large file blows away many small files in the system's buffer cache. This scheme increases the chance that small files will remain buffered, helping performance.

Two other parameters help control Apache's use of direct I/O. MAX_DIRECT_ALIGN sets the maximum alignment boundary permissible when using direct I/O, in bytes (default 4 KB). MAX_DIRECT_BUFSIZE sets the size of the buffer Apache uses when performing direct I/O, in bytes (default IOBUFSIZE). If the file's alignment restrictions exceed these parameters Apache will log a warning and not use direct I/O for that file. Both parameters must be powers of two greater than zero and can be changed at will.
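The decision described above (size threshold first, then alignment limits) can be sketched as a small C predicate. This is a hypothetical illustration, not the patch's actual logic: the struct and function names are invented here, and the two "requirements" fields stand in for whatever alignment restrictions the file system reports for a file.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the compile-time parameters in the text. */
struct dio_limits {
    size_t threshold;    /* like DIRECT_THRESHOLD, in bytes */
    size_t max_align;    /* like MAX_DIRECT_ALIGN, in bytes */
    size_t max_bufsize;  /* like MAX_DIRECT_BUFSIZE, in bytes */
};

/* Assumed shape of a file's direct I/O restrictions. */
struct dio_requirements {
    size_t mem_align;    /* required buffer address alignment */
    size_t min_io;       /* smallest permitted transfer size */
};

static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Return true if direct I/O is worth attempting for a file of the given
 * size; false means fall back to mmap() or read(). */
bool should_try_direct_io(size_t file_size,
                          const struct dio_limits *lim,
                          const struct dio_requirements *req)
{
    /* Both limits must be powers of two greater than zero. */
    if (!is_power_of_two(lim->max_align) ||
        !is_power_of_two(lim->max_bufsize))
        return false;
    /* Small files stay in the system's buffer cache. */
    if (file_size < lim->threshold)
        return false;
    /* The file's restrictions must fit within the configured limits. */
    if (req->mem_align > lim->max_align)
        return false;
    if (req->min_io > lim->max_bufsize)
        return false;
    return true;
}
```

With a 64 MB threshold and a 4 KB alignment limit, for example, a 100 MB file whose required alignment is 4 KB would qualify, while the same file with a 16 KB alignment requirement would not.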

Summary of options in this section:

USE_DIRECT_IO, DIRECT_THRESHOLD, MAX_DIRECT_ALIGN, MAX_DIRECT_BUFSIZE

The Stat Cache

The Stat Cache caches the results of recent stat() system calls. Most requests not handled by the QSC cause Apache to stat() at least one file and possibly many depending on other configuration settings. This saps performance and is often unnecessary, as the same files or directories are queried repeatedly and usually return the same status. The stat cache amortizes the cost of these expensive file system operations over many requests and improves performance, at the cost of using more memory and adding delay between the time a file changes and the time Apache notices.

Briefly, when enabled (that is, when USE_STAT_CACHE is defined), the stat cache keeps up to STAT_CACHE_SIZE stat() results, each for a limited period of time (STAT_CACHE_TIME seconds), as long as the full path name of the file is short enough (at most STAT_CACHE_PATHLEN bytes including the null).
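The general technique can be sketched in C as follows. This is a minimal illustration of a fixed-size, time-limited stat() cache, not the patch's ap_stat_cache() implementation: every name beginning with sketch_ or SKETCH_ is hypothetical, the hashing and replacement policy are assumptions, and only successful stat() calls are cached to keep the example short.

```c
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical stand-ins for STAT_CACHE_SIZE, STAT_CACHE_TIME and
 * STAT_CACHE_PATHLEN, set to the defaults the text lists. */
#define SKETCH_SIZE    16
#define SKETCH_TIME    30
#define SKETCH_PATHLEN 128

struct sketch_entry {
    char path[SKETCH_PATHLEN];  /* full path, including the null */
    struct stat st;             /* cached stat() result */
    time_t fetched;             /* when the result was obtained */
    int used;
};

static struct sketch_entry sketch_cache[SKETCH_SIZE];
unsigned long sketch_hits, sketch_misses;

/* Pick a slot by hashing the path; a colliding path overwrites the slot. */
static unsigned sketch_slot(const char *path)
{
    unsigned h = 5381;
    while (*path)
        h = h * 33 + (unsigned char)*path++;
    return h % SKETCH_SIZE;
}

/* Same contract as stat(), but serves a recent cached result if any. */
int sketch_stat(const char *path, struct stat *out)
{
    size_t len = strlen(path);
    time_t now = time(NULL);
    struct sketch_entry *e;
    int rc;

    if (len + 1 > SKETCH_PATHLEN)
        return stat(path, out);          /* path too long to cache */

    e = &sketch_cache[sketch_slot(path)];
    if (e->used && strcmp(e->path, path) == 0) {
        if (now - e->fetched <= SKETCH_TIME) {
            sketch_hits++;
            *out = e->st;
            return 0;
        }
        e->used = 0;                     /* too old: restat below */
    }

    sketch_misses++;
    rc = stat(path, out);
    if (rc == 0) {                       /* remember successful results */
        memcpy(e->path, path, len + 1);
        e->st = *out;
        e->fetched = now;
        e->used = 1;
    }
    return rc;
}
```

Calling sketch_stat() twice on the same short path within the time limit costs one real stat() call; the second call is answered from the cache and counted in sketch_hits. The trade-off named above is visible here too: a file changed within the time limit is reported with its old status.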

Each Apache child process has its own stat cache so I recommend keeping it small. The total amount of memory the stat cache consumes is proportional to:

STAT_CACHE_SIZE * STAT_CACHE_PATHLEN * #-of-Apache-processes
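As a worked example of this product, the following C helper computes the bytes needed for cached path names alone. The function name is hypothetical, and the figure is only a lower bound, since each entry also holds the stat() result and some bookkeeping whose sizes the text does not give.

```c
#include <stddef.h>

/* Bytes used for cached path names across all Apache child processes.
 * Real usage is higher: each entry also stores a stat() result. */
size_t stat_cache_path_bytes(size_t cache_size, size_t pathlen,
                             size_t nprocs)
{
    return cache_size * pathlen * nprocs;
}
```

With the default STAT_CACHE_SIZE of 16 and STAT_CACHE_PATHLEN of 128, eight child processes would need 16 * 128 * 8 = 16384 bytes for path names.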

Each Apache process's stat cache maintains informational counters to assist experienced users with tuning. The stat_cache_stats structure, currently visible only via a debugger, contains:

Only certain parts of Apache take advantage of the stat cache. If you know of other places where it is recommended and safe to use the stat cache (ap_stat_cache()) instead of regular stat() -- or if you have a better way to export the counters -- please contribute your changes.

Stat cache options:

USE_STAT_CACHE

Enables the stat cache.

Default: not defined, so the stat cache is disabled.

STAT_CACHE_SIZE

Sets the number of cache entries.

Default: 16.

STAT_CACHE_TIME

The maximum age in seconds of cached stat() results. When a cached result is too old the cache (lazily) restats the file.

Default: 30.

STAT_CACHE_PATHLEN

The maximum length in bytes of a cached path name, including the trailing null.

Default: 128.

Summary of options in this section:

USE_STAT_CACHE, STAT_CACHE_SIZE, STAT_CACHE_TIME, STAT_CACHE_PATHLEN

General Tuning

Always compile optimized.

Define SPEED_DAEMON which defines all of these other tokens:

Define the value of LOG_BUFSIZE to be some multiple of the system's page size, less a few dozen bytes or so for overhead.
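One way to arrive at such a value is sketched below in C. The helper names are hypothetical, and the number of pages and the overhead allowance are choices left to you; the text says only "a few dozen bytes or so".

```c
#include <stddef.h>
#include <unistd.h>

/* A whole number of pages, less an allowance for overhead.
 * Returns 0 if the allowance would consume the whole buffer. */
size_t log_bufsize_for(size_t page_size, size_t pages, size_t overhead)
{
    size_t total = page_size * pages;
    return overhead < total ? total - overhead : 0;
}

/* Same, using the running system's page size as reported by sysconf().
 * Returns 0 if the page size cannot be determined. */
size_t log_bufsize_for_system(size_t pages, size_t overhead)
{
    long page = sysconf(_SC_PAGESIZE);
    return page > 0 ? log_bufsize_for((size_t)page, pages, overhead) : 0;
}
```

For a 4096-byte page, four pages, and a 64-byte allowance, this gives 4 * 4096 - 64 = 16320, which could then be supplied as the value of LOG_BUFSIZE at compile time.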

Use only the Common Log Format.

Raise the limit on the number of response bytes written at a time by defining the value of MMAP_SEGMENT_SIZE to be larger than the largest cached file -- but not on Linux, where this actually slows things down!

Make the listen queue size as large as possible using the ListenBacklog directive.

Summary of options in this section:

SPEED_DAEMON, LOG_BUFSIZE, MMAP_SEGMENT_SIZE, ListenBacklog

Example

Here is the configuration I use to achieve maximum SPECweb96 performance on an SGI Origin200 server running Irix 6.5 with two processors, two gigabytes of memory, and four 100BaseT network interfaces. Without any patching or tuning Apache on this system handles 240 operations per second; with my patches and the following tuning it handles 2400*.

* These are actual SPECweb96 results and while I believe they are accurate and meaningful, they have not been submitted to SPEC for review or publication. The following is for illustrative purposes only and is not a SPECweb96 disclosure.

Direct I/O is neither needed nor used because the entire SPECweb96 file set fits in memory in this example. If the system had less memory I would add:

and delete the mmapfile directives for all class3_8 files:

and maybe class3_7, class3_6, and others too until the system performs as few disk reads as possible during the benchmark run.

The stat cache also is neither needed nor used because the QSC handles every request the benchmark measures and there are no stat()s to cache. But when using direct I/O or CGI the stat cache can help. I would add:

