I have contributed a bunch of patches to the Apache Project that together increase the performance of the Apache HTTP server by up to 900%. This document summarizes how to maximize the performance of Apache using my patches. The measures of performance in this document are the SPECweb96 and SPECweb99 benchmarks. SPECweb96 measures only the number of static HTTP/1.0 GET requests a web server can service per second. SPECweb99 adds HTTP/1.1, a dynamic component, and CGI, and measures the number of simultaneous connections a web server can handle. This paper assumes a working knowledge of Apache internals.
The advice that follows may or may not apply to your system and may increase or decrease your system's performance depending on factors beyond the scope of this document. Use this guide only in conjunction with a clear understanding of your performance goals and overriding needs.
The patches I contributed to the Apache Project modify Apache's source code and standard configuration files to speed up the processing of HTTP requests in various ways. Still, Apache must be configured, compiled, and tuned to achieve the best performance on a given system. The following sections describe the configurable options relevant for high performance.
The options fall into five categories by what they control: the QSC, Apache's child processes, direct I/O, the stat cache, and other.
The Quick Shortcut Cache (QSC) cuts out unnecessary processing for requests to cached static content. It sports a number of configurable options.
Two of these, QSC_HASH_SIZE and QSC_MAX_SIZE, limit the maximum size of the cache and almost certainly need to be increased from their default values to provide the maximum benefit for a given system. A third, QSC_HEADER_GRAIN (which normally uses the same value as CACHE_ALIGNMENT), regulates alignment and padding and can improve performance more than you might think when set correctly. See the QSC documentation for details on these options and for instructions on using the QSC.
The QSC can also manage the very large amounts of data that a 64-bit address space allows, although with a slight performance penalty.
As patched, Apache's standard run-time configuration file httpd.conf enables the QSC when the mmap_static module is installed. Leave the QSC on directive as is.
Summary of options in this section:
configure options:
  -DQSC_HASH_SIZE=something
  -DQSC_MAX_SIZE=something
  -DCACHE_ALIGNMENT=something or -DQSC_HEADER_GRAIN=something
httpd.conf directives:
  mmapfile filename (one for each file)
  QSC on
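Because every cached file needs its own mmapfile directive, generating the list is less error-prone than typing it. The following is a minimal Python sketch, assuming all static content lives under a single document root. The helper name mmapfile_directives is hypothetical and is not part of Apache or of the patches.

```python
import os

def mmapfile_directives(docroot):
    """Return one 'mmapfile <path>' line per regular file under docroot.

    Hypothetical helper for illustration; it only builds text for httpd.conf.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(docroot):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip sockets, broken links, etc.
                lines.append("mmapfile " + path)
    return lines
```

Printing the returned lines and appending them to httpd.conf gives one directive per file, which is what the QSC needs in order to cache the whole tree.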
Under heavy load, a system can spend most of its computing power creating, destroying, and scheduling Apache child processes. Eliminating all this overhead allows the system to spend more of its time processing HTTP requests but reduces Apache's ability to adapt to a changing load.
Apache's standard run-time configuration file httpd.conf sets no limit on the number of requests each child process handles before terminating itself and forcing the parent Apache process to fork a replacement. Leave the MaxRequestsPerChild 0 directive as is.
Choose a number of processes and force Apache always to use that number, never more or less. Set the MinSpareServers, MaxSpareServers, StartServers, and MaxClients options all to the same value and ignore the resulting warning:
server reached MaxClients setting, consider raising the MaxClients setting
The number you use is up to you but remember, sometimes having fewer processes runs faster than having more. Start with, say, two per network interface and increase from there as necessary if your content is predominantly static. You will need more if your content is significantly dynamic (e.g., lots of CGI).
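As a small illustration of the starting point suggested above, this Python sketch builds the four directives with one shared value. The function name and its per_interface parameter are hypothetical; the default of two processes per interface is only the suggested starting value for mostly static content.

```python
def fixed_pool_directives(num_interfaces, per_interface=2):
    """Return httpd.conf lines that pin the child-process count to one value.

    Hypothetical helper: the count is num_interfaces * per_interface, and all
    four directives receive that same number so the pool never grows or shrinks.
    """
    if num_interfaces < 1 or per_interface < 1:
        raise ValueError("counts must be positive")
    n = num_interfaces * per_interface
    names = ("MinSpareServers", "MaxSpareServers", "StartServers", "MaxClients")
    return ["%s %d" % (name, n) for name in names]
```

With four network interfaces this yields a value of 8 for every directive; raise per_interface if the content is significantly dynamic.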
Bind interrupts from network interface devices to specific processors (distributing them as evenly as possible across all the processors) and bind the Apache child processes listening to those interfaces to the same processors using the Listen directive. Force each child process to listen only to its designated interface -- and eliminate performance-sapping accept serialization -- by turning SingleListen on.
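The even distribution described above can be sketched in Python. This assumes the patched Listen directive takes a trailing processor number, as in "Listen 100.100.100.101:80 0" from the sample configuration later in this document. The function name is hypothetical, and binding the interrupts themselves is a separate, system-specific step that this sketch does not perform.

```python
def listen_directives(addresses, num_processors, port=80):
    """Return Listen lines that spread interfaces evenly over processors.

    Assumption: the directive form is 'Listen address:port processor'.
    Interfaces are assigned in contiguous blocks, so with four addresses and
    two processors the first two go to processor 0 and the last two to 1.
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")
    total = len(addresses)
    return [
        "Listen %s:%d %d" % (addr, port, i * num_processors // total)
        for i, addr in enumerate(addresses)
    ]
```

The interrupt binding for each interface must then use the same processor number that the corresponding Listen line names.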
Summary of options in this section:
configure options:
Direct I/O is the name for reading or writing data directly into and out of user space without using the system's buffer cache.
Normally Apache reads or memory-maps disk files containing content it serves. The system DMAs data from disk into a system buffer. If Apache uses read() the system copies the data into Apache's memory; if Apache uses mmap() it and the system share the system buffer's memory. Either way, the system maintains the cache of buffered data which it manages without guidance from Apache. In particular, the system can choose to invalidate any data in its buffer cache when it needs to insert new data into a full cache. If the system chooses poorly, it quickly becomes disk-bound re-reading data it just had in its cache that it evicted to make room for other data. This problem does not occur when the system has enough memory to store all the files Apache serves (at once), a configuration I recommend. But when memory is tight, direct I/O can help.
Direct I/O bypasses the system's buffer cache. When Apache uses direct I/O read() the system DMAs data from disk directly into Apache's memory. Judicious use can reduce thrashing of the buffer cache, but overuse can lead to similarly degraded performance because every read() causes disk I/O. (Memory mapping cannot be used with direct I/O.)
At this time Apache supports only SGI's Irix implementation of direct I/O.
When USE_DIRECT_IO is defined, Apache's default request handler will attempt to use direct I/O to serve files that are DIRECT_THRESHOLD bytes in size or larger (default 64 MB). I say "attempt" because direct I/O requires stringent data alignment which certain types of requests (in particular, data ranges) violate. If Apache cannot use direct I/O it falls back to using either mmap() or read(). Note that only the default handler tries direct I/O and that, in particular, the mmap_static module's handler does not. To make Apache try direct I/O you must not include an mmapfile directive for that file.
Apache limits its use of direct I/O to large files on the assumption that it serves small files more often than large ones, and because under low memory conditions a single large file blows away many small files in the system's buffer cache. This scheme increases the chance that small files will remain buffered, helping performance.
Two other parameters help control Apache's use of direct I/O. MAX_DIRECT_ALIGN sets the maximum alignment boundary permissible when using direct I/O, in bytes (default 4 KB). MAX_DIRECT_BUFSIZE sets the size of the buffer Apache uses when performing direct I/O, in bytes (default IOBUFSIZE). If the file's alignment restrictions exceed these parameters Apache will log a warning and not use direct I/O for that file. Both parameters must be powers of two greater than zero and can be changed at will.
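The rules in this section can be condensed into a small decision function. This Python sketch is a simplified model, not Apache's code: the names required_align and min_io_size stand for whatever alignment restrictions the file system reports, range requests are treated as always ineligible, and both function names are hypothetical.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and other values."""
    return n > 0 and (n & (n - 1)) == 0

def may_try_direct_io(file_size, required_align, min_io_size, *,
                      threshold, max_align, max_bufsize,
                      is_range_request=False):
    """Model of the decision to attempt direct I/O for one request.

    threshold, max_align and max_bufsize play the roles of DIRECT_THRESHOLD,
    MAX_DIRECT_ALIGN and MAX_DIRECT_BUFSIZE. Everything else is an assumption.
    """
    for value in (max_align, max_bufsize):
        if not is_power_of_two(value):
            raise ValueError("alignment and buffer size must be powers of two")
    if file_size < threshold:
        return False  # small files are left to the buffer cache
    if is_range_request:
        return False  # simplification: ranges break the alignment rules
    # The file's restrictions must fit within the configured limits.
    return required_align <= max_align and min_io_size <= max_bufsize
```

When the function returns False, the model corresponds to Apache falling back to mmap() or read().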
Summary of options in this section:
configure options:
  -DUSE_DIRECT_IO
  -DDIRECT_THRESHOLD=something
  -DMAX_DIRECT_ALIGN=something
  -DMAX_DIRECT_BUFSIZE=something
httpd.conf directives:
  No mmapfile directives for files on which you want Apache to try direct I/O
The Stat Cache caches the results of recent stat() system calls. Most requests not handled by the QSC cause Apache to stat() at least one file and possibly many depending on other configuration settings. This saps performance and is often unnecessary, as the same files or directories are queried repeatedly and usually return the same status. The stat cache amortizes the cost of these expensive file system operations over many requests and improves performance, at the cost of using more memory and adding delay between the time a file changes and the time Apache notices.
Briefly, when the stat cache is enabled (USE_STAT_CACHE is defined), it caches a certain number (STAT_CACHE_SIZE) of stat() results for a period of time (STAT_CACHE_TIME seconds), as long as the full path name of the file is short enough (STAT_CACHE_PATHLEN bytes including the null).
Each Apache child process has its own stat cache so I recommend keeping it small. The total amount of memory the stat cache consumes is proportional to:
STAT_CACHE_SIZE * STAT_CACHE_PATHLEN * #-of-Apache-processes
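A quick way to compare candidate settings is to evaluate that product directly. The Python sketch below does only this; the result is a relative figure, not a byte count, because the formula is a proportionality. The function name is hypothetical.

```python
def stat_cache_footprint(cache_size, pathlen, num_processes):
    """Relative memory cost: STAT_CACHE_SIZE * STAT_CACHE_PATHLEN * processes."""
    return cache_size * pathlen * num_processes

# Default size (16) and path length (128) with 12 child processes.
default_cost = stat_cache_footprint(16, 128, 12)
# The larger cache with shorter paths used in the SPECweb96 example.
tuned_cost = stat_cache_footprint(250, 48, 12)
```

Here default_cost is 24576 and tuned_cost is 144000, so the tuned settings cost a little under six times as much as the defaults for the same number of processes.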
Each Apache process's stat cache maintains informational counters to assist experienced users with tuning. The stat_cache_stats structure, currently visible only via a debugger, contains:
nhits - number of cache lookups that matched an existing entry
nmisses - number of cache lookups that found no existing entry
nrestats - number of cache hits with too-old results
nrejects - number of cache misses that could not be entered into the cache because the path names were too long
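To make the behaviour and the four counters concrete, here is a minimal Python model of such a cache. It is not the patch's implementation. The eviction policy (drop the oldest entry), the choice to count a too-old hit in both nhits and nrestats, and the injectable stat and clock parameters are all assumptions made for illustration.

```python
import os
import time
from collections import OrderedDict

class StatCache:
    """Toy model of a per-process stat cache with tuning counters."""

    def __init__(self, max_age, size=16, pathlen=128,
                 stat=os.stat, clock=time.monotonic):
        self.max_age = max_age      # role of STAT_CACHE_TIME (seconds)
        self.size = size            # role of STAT_CACHE_SIZE
        self.pathlen = pathlen      # role of STAT_CACHE_PATHLEN (with null)
        self.stat = stat
        self.clock = clock
        self.entries = OrderedDict()  # path -> (time stored, stat result)
        self.nhits = self.nmisses = self.nrestats = self.nrejects = 0

    def lookup(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None:
            self.nhits += 1
            stored_at, result = entry
            if now - stored_at <= self.max_age:
                return result
            # The entry matched but is too old: stat again and refresh it.
            self.nrestats += 1
            result = self.stat(path)
            self.entries[path] = (now, result)
            return result
        self.nmisses += 1
        result = self.stat(path)
        if len(path.encode()) + 1 > self.pathlen:
            self.nrejects += 1      # path plus trailing null does not fit
            return result
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # assumed policy: evict oldest
        self.entries[path] = (now, result)
        return result
```

A high nrejects count in a model like this suggests raising the path length limit, while a high nrestats count relative to nhits suggests that the maximum age is too short for the workload.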
Only certain parts of Apache take advantage of the stat cache. If you know of other places where it is recommended and safe to use the stat cache (ap_stat_cache()) instead of regular stat() -- or if you have a better way to export the counters -- please contribute your changes.
Stat cache options:
USE_STAT_CACHE | Enables the stat cache. Default: not defined, so the stat cache is disabled.
STAT_CACHE_SIZE | Sets the number of cache entries. Default: 16.
STAT_CACHE_TIME | The maximum age in seconds of cached stat() results.
STAT_CACHE_PATHLEN | The maximum length in bytes of a cached path name, including the trailing null. Default: 128.
Summary of options in this section:
configure options:
  -DUSE_STAT_CACHE or -DSPEED_DAEMON
  -DSTAT_CACHE_SIZE=something
  -DSTAT_CACHE_TIME=something
  -DSTAT_CACHE_PATHLEN=something
Always compile optimized.
Define SPEED_DAEMON which defines all of these other tokens:
USE_QSC - enables the QSC
FAST_TIME - speeds up time-keeping
BUFFERED_LOGS - buffers server log entries
NO_GRACEFUL - eliminates signal overhead but disables graceful restarts
USE_QUICK_LOG - accelerates logging in common-log format
USE_STAT_CACHE - enables the stat cache
Define the value of LOG_BUFSIZE to be some multiple of the system's page size, less a few dozen bytes or so for overhead.
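The arithmetic is simple enough to sketch in Python. The overhead of 36 bytes used as the default below is an assumption inferred from the example value 65500 (which is 65536 minus 36) that appears later in this document; the function name is hypothetical.

```python
import mmap

def log_bufsize(pages, overhead=36, page_size=mmap.PAGESIZE):
    """Return a LOG_BUFSIZE value: a whole number of pages minus overhead.

    overhead=36 is an illustrative guess at "a few dozen bytes", not a value
    taken from the Apache source.
    """
    if pages < 1 or not 0 <= overhead < page_size:
        raise ValueError("need at least one page and a small overhead")
    return pages * page_size - overhead
```

For example, sixteen 4096-byte pages give 65500, which would be passed to configure as -DLOG_BUFSIZE=65500.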
Use only the Common Log Format.
Raise the limit on the number of response bytes written at a time by defining the value of MMAP_SEGMENT_SIZE to be larger than the largest cached file -- but not on Linux, where this actually slows things down!
Make the listen queue size as large as possible using the ListenBacklog directive.
Summary of options in this section:
configure options:
  -O
  -DSPEED_DAEMON
  -DLOG_BUFSIZE=something
  -DMMAP_SEGMENT_SIZE=something
httpd.conf directives:
  ListenBacklog something
Here is the configuration I use to achieve maximum SPECweb96 performance on an SGI Origin200 server running Irix 6.5 with two processors, two gigabytes of memory, and four 100BaseT network interfaces. Without any patching or tuning Apache on this system handles 240 operations per second; with my patches and the following tuning it handles 2400*.
* These are actual SPECweb96 results and while I believe they are accurate and meaningful, they have not been submitted to SPEC for review or publication. The following is for illustrative purposes only and is not a SPECweb96 disclosure.
configure options:
--enable-module=mmap_static --enable-module=info
-O2
-DSPEED_DAEMON
-DLOG_BUFSIZE=65500
-DMMAP_SEGMENT_SIZE=1048576
-DQSC_HASH_SIZE=32768
-DQSC_MAX_SIZE=5000000
httpd.conf:
ServerType standalone
ServerName something
ServerAdmin root
ServerRoot /a/apache
ServerSignature Off
PidFile /a/apache/logs/httpd.pid
ScoreBoardFile /a/apache/logs/httpd.scoreboard
Timeout 300
KeepAlive On
MaxKeepAliveRequests 0
KeepAliveTimeout 15
MinSpareServers 12
MaxSpareServers 12
StartServers 12
MaxClients 12
MaxRequestsPerChild 0
User apache
Group apache
Port 80
Listen 100.100.100.101:80 0
Listen 100.100.100.102:80 0
Listen 100.100.100.103:80 1
Listen 100.100.100.104:80 1
ListenBacklog 1000
SingleListen on
QSC on
ClearModuleList
AddModule mod_mmap_static.c
AddModule mod_log_config.c
DocumentRoot "/a/htdocs"
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
UseCanonicalName Off
DefaultType text/plain
HostnameLookups Off
ErrorLog /b/logs/errors-apache
LogLevel warn
LogFormat "%a %l %u %t \"%r\" %>s %b" quick
CustomLog /b/logs/access-apache quick
AddModule mod_mime.c
AddModule mod_status.c
AddModule mod_info.c
<Location /server-status>
SetHandler server-status
</Location>
<Location /server-info>
SetHandler server-info
</Location>
mmapfile /a/htdocs/spec/file_set/dir0/class0_0
mmapfile /a/htdocs/spec/file_set/dir0/class0_1
mmapfile /a/htdocs/spec/file_set/dir0/class0_2
mmapfile /a/htdocs/spec/file_set/dir0/class0_3
...
mmapfile /a/htdocs/spec/file_set/dir223/class3_5
mmapfile /a/htdocs/spec/file_set/dir223/class3_6
mmapfile /a/htdocs/spec/file_set/dir223/class3_7
mmapfile /a/htdocs/spec/file_set/dir223/class3_8
AddModuleInfo mod_mmap_static.c "Configured for 2500 SPECweb96 ops/sec"
Direct I/O is neither needed nor used because the entire SPECweb96 file set fits in memory in this example. If the system had less memory I would add:
-DUSE_DIRECT_IO
-DDIRECT_THRESHOLD=800000
-DMAX_DIRECT_BUFSIZE=65536
and delete the mmapfile directives for all class3_8 files:
# mmapfile /a/htdocs/spec/file_set/dir0/class3_8
...
# mmapfile /a/htdocs/spec/file_set/dir223/class3_8
and maybe class3_7, class3_6, and others too until the system performs as few disk reads as possible during the benchmark run.
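Commenting out directives by hand across hundreds of directories is tedious, so the selection can be scripted. This Python sketch assumes you already have a mapping from file path to size in bytes; it comments out the mmapfile line for every file at or above the direct I/O threshold. The function name and the sizes in the usage example are hypothetical, not real SPECweb96 file sizes.

```python
def mmapfile_lines(sizes, direct_threshold):
    """Return mmapfile lines, commenting out files left to direct I/O.

    sizes maps path -> size in bytes. Files whose size is at least
    direct_threshold (the DIRECT_THRESHOLD value) get a leading '# ' so the
    default handler, rather than mmap_static, serves them.
    """
    lines = []
    for path in sorted(sizes):
        prefix = "# " if sizes[path] >= direct_threshold else ""
        lines.append(prefix + "mmapfile " + path)
    return lines

# Made-up sizes for two files in one directory of the file set.
example = mmapfile_lines(
    {
        "/a/htdocs/spec/file_set/dir0/class3_7": 700000,
        "/a/htdocs/spec/file_set/dir0/class3_8": 900000,
    },
    direct_threshold=800000,
)
```

Lowering direct_threshold and regenerating the list is one way to iterate toward the fewest disk reads during a benchmark run.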
The stat cache also is neither needed nor used because the QSC handles every request the benchmark measures and there are no stat()s to cache. But when using direct I/O or CGI the stat cache can help. I would add:
(USE_STAT_CACHE is already defined by SPEED_DAEMON)
-DSTAT_CACHE_SIZE=250
-DSTAT_CACHE_TIME=30000
-DSTAT_CACHE_PATHLEN=48
See also: O_DIRECT in SGI's open(2) man page.