The primary bottleneck in most dynamic web-based applications is the retrieval of data from the database. While it is relatively inexpensive to add more front-end servers to scale the serving of pages and images and the processing of content, it is an expensive and complex ordeal to scale the database. By taking advantage of data caching, most web applications can reduce latency times and scale farther with fewer machines.
JCS is a front-tier cache that can be configured to maintain consistency across multiple servers by using a centralized remote server or by lateral distribution of cache updates. Other caches, like the Javlin EJB data cache, are basically in-memory databases that sit between your EJBs and your database. Rather than trying to speed up your slow EJBs, you can avoid most of the network traffic and complexity by implementing JCS front-tier caching: centralize your EJB or JDBC data access into local managers and perform the caching there.
The data used by most web applications varies in volatility, from completely static to changing on every request. Anything with some degree of stability can be cached. Prime candidates for caching range from list data for stable dropdowns and user information to discrete, infrequently changing records and stable search results that can be sorted in memory.
Since JCS is distributed and allows updates and invalidations to be broadcast to multiple listeners, frequently changing items can easily be cached and kept in sync through your data access layer. Data that must be 100% up to date, say an account balance prior to a transfer, should be retrieved directly from the database. If your application allows both viewing and editing of data, the data for the view pages can be cached, but the edit pages should, in most cases, pull the data directly from the database.
Let's say that you have an e-commerce book store. Each book has a related set of information that you must present to the user. Let's say that 70% of your hits during a particular day are for the same 1,000 popular items that you advertise on key pages of your site, but users are still actively browsing your catalog of over a million books. You cannot possibly cache your entire database, but you could dramatically decrease the load on your database by caching the 1,000 or so most popular items.
For the sake of simplicity let's ignore tie-ins and user-profile based suggestions (also good candidates for caching) and focus on the core of the book detail page.
A simple way to cache the core book information would be to create a value object for book data that contains the information needed to build the display page. This value object could hold data from multiple related tables or a book subtype table, but let's say that you have a simple table called BOOK that looks something like this:
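The original schema is not reproduced here; a minimal, hypothetical BOOK table with illustrative column names and types might be:

```sql
-- Hypothetical BOOK table; column names and types are illustrative only.
CREATE TABLE BOOK (
    BOOK_ID     INT PRIMARY KEY,
    TITLE       VARCHAR(255),
    AUTHOR      VARCHAR(255),
    ISBN        VARCHAR(20),
    PRICE       DECIMAL(8,2),
    CREATE_DATE DATE
);
```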
We could create a value object for this table called BookVObj that has variables with the same names as the table columns. It might look like this:
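A sketch of such a value object, assuming illustrative column names like title, author, and isbn. The class must be Serializable so JCS can spool it to disk or distribute it to remote and lateral caches:

```java
import java.io.Serializable;
import java.util.Date;

// Value object mirroring the BOOK columns. Serializable so that JCS
// can spool it to disk or send it to remote/lateral auxiliaries.
class BookVObj implements Serializable {
    public int bookId = 0;
    public String title;
    public String author;
    public String isbn;
    public String price;
    public Date createDate;

    public BookVObj() {
    }
}
```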
Then we can create a manager called BookVObjManager to store and retrieve BookVObjs. All access to core book data should go through this class, including inserts and updates, to keep the caching simple. Let's make BookVObjManager a singleton that gets a JCS access object during initialization. The start of the class might look like:
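A sketch of the singleton skeleton, assuming the classic org.apache.jcs package names (they differ in newer commons-jcs releases, so check your version):

```java
import org.apache.jcs.JCS;

// Singleton manager that owns the JCS region for book value objects.
public class BookVObjManager {

    private static BookVObjManager instance;
    private static JCS bookCache;

    private BookVObjManager() {
        try {
            // Obtain (and lazily initialize) the "bookCache" region.
            bookCache = JCS.getInstance("bookCache");
        } catch (Exception e) {
            // Region could not be initialized; log it and run uncached.
        }
    }

    public static synchronized BookVObjManager getInstance() {
        if (instance == null) {
            instance = new BookVObjManager();
        }
        return instance;
    }
}
```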
To get a BookVObj we will need some access methods in the manager. We should be able to get a non-cached version when necessary, say before allowing an administrator to edit the book data. The methods might look like:
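A hedged sketch of those two methods; getBookVObjFromDatabase is a placeholder for your real JDBC or EJB lookup, not part of JCS:

```java
// Cached access: check the region first, fall back to the database.
public BookVObj getBookVObj(int id) {
    BookVObj vObj = (BookVObj) bookCache.get(String.valueOf(id));
    if (vObj == null) {
        vObj = getNonCachedBookVObj(id);
    }
    return vObj;
}

// Non-cached access, e.g. before an administrator edits book data;
// it also refreshes the cache with the fresh copy.
public BookVObj getNonCachedBookVObj(int id) {
    BookVObj vObj = getBookVObjFromDatabase(id);
    try {
        bookCache.put(String.valueOf(id), vObj);
    } catch (Exception e) {
        // Cache put failed; the caller still gets the fresh object.
    }
    return vObj;
}
```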
We will also need a method to insert and update book data. To keep the caching in one place, this should be the primary way core book data is created. The method might look like:
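A sketch of that write path; insertOrUpdateBook stands in for your real persistence call and is not a JCS API:

```java
// Primary write path: persist to the database, then refresh the cache
// so that subsequent reads see the new state.
public void storeBookVObj(BookVObj vObj) {
    try {
        // Placeholder for the real database insert/update.
        insertOrUpdateBook(vObj);
        bookCache.put(String.valueOf(vObj.bookId), vObj);
    } catch (Exception e) {
        // Log it: the database write or the cache update failed.
    }
}
```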
As elements are placed in the cache via put, it is possible to specify custom attributes for them, such as their maximum lifetime in the cache or whether they can be spooled to disk. It is also possible (and easier) to define these attributes in the configuration file, as demonstrated later. We now have the basic infrastructure for caching the book data.
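As a sketch, overriding attributes for a single element in the classic org.apache.jcs API looks roughly like this (the method names are from that older API and should be checked against your JCS version):

```java
// Override attributes for one element instead of relying on the
// per-region defaults from the configuration file.
IElementAttributes attributes = bookCache.getDefaultElementAttributes();
attributes.setMaxLifeSeconds(7200); // expire this element after two hours
attributes.setIsSpool(true);        // allow it to be written to disk
bookCache.put(String.valueOf(vObj.bookId), vObj, attributes);
```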
The first step in creating a cache region is to determine the makeup of the memory cache. For the book store example, I would create a region that can store a bit more than the minimum number of items I want to keep in memory, so that the core items are always readily available. I would set the maximum memory size to 1200 items. In addition, I might want all objects in this cache region to expire after 7200 seconds. This can be configured in the element attributes on a default or per-region basis, as illustrated in the configuration file below.
For most cache regions you will want to use a disk cache if the data takes more than about 0.5 milliseconds to create. The indexed disk cache is the most efficient disk caching auxiliary, and it is recommended for normal usage.
The next step is to select an appropriate distribution layer. If you have a back-end server running an application server or scripts, or you are running multiple web server VMs on one machine, you might want to use the centralized remote cache. The lateral cache would also work, but since each lateral cache binds to a port, you would have to configure each VM's lateral cache to listen on a different port of that machine.
If your environment is very flat, say a few load-balanced web servers and a database machine, or one web server with multiple VMs and a database machine, then the lateral cache will probably make more sense. The TCP lateral cache is recommended.
For the book store configuration I will set up a region for the bookCache that uses the LRU memory cache, the indexed disk auxiliary cache, and the remote cache. The configuration file might look like this:
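A configuration along these lines; the class names follow the classic org.apache.jcs packages (newer commons-jcs releases use different package names), and the disk path and remote server address are placeholders to adjust:

```
# DEFAULT CACHE REGION
jcs.default=DC,RFailover
jcs.default.cacheattributes=org.apache.jcs.engine.CompositeCacheAttributes
jcs.default.cacheattributes.MaxObjects=1200
jcs.default.cacheattributes.MemoryCacheName=org.apache.jcs.engine.memory.lru.LRUMemoryCache
jcs.default.elementattributes=org.apache.jcs.engine.ElementAttributes
jcs.default.elementattributes.IsEternal=false
jcs.default.elementattributes.MaxLifeSeconds=7200

# CACHE REGION FOR THE BOOK DATA
jcs.region.bookCache=DC,RFailover
jcs.region.bookCache.cacheattributes=org.apache.jcs.engine.CompositeCacheAttributes
jcs.region.bookCache.cacheattributes.MaxObjects=1200
jcs.region.bookCache.cacheattributes.MemoryCacheName=org.apache.jcs.engine.memory.lru.LRUMemoryCache

# AUXILIARY: INDEXED DISK CACHE
jcs.auxiliary.DC=org.apache.jcs.auxiliary.disk.indexed.IndexedDiskCacheFactory
jcs.auxiliary.DC.attributes=org.apache.jcs.auxiliary.disk.indexed.IndexedDiskCacheAttributes
jcs.auxiliary.DC.attributes.DiskPath=/usr/opt/bookstore/raf

# AUXILIARY: REMOTE CACHE CLIENT
jcs.auxiliary.RFailover=org.apache.jcs.auxiliary.remote.RemoteCacheFactory
jcs.auxiliary.RFailover.attributes=org.apache.jcs.auxiliary.remote.RemoteCacheAttributes
jcs.auxiliary.RFailover.attributes.FailoverServers=scriptserver:1102
```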
I've set up the default cache settings in the above file to approximate the bookCache settings. Other, non-preconfigured cache regions will use the default settings. You only have to configure the auxiliary caches once. For most caches you will not need to pre-configure your regions unless the size of the elements varies radically. We could easily put several hundred thousand BookVObjs in memory; the limit of 1200 is very conservative and would be more appropriate for a large data structure.
To get running with the book store example, I will also need to start up the remote cache server on the scriptserver machine. The remote cache documentation describes the configuration.
I now have a basic caching system implemented for my book data. Performance should improve immediately.