In Lire, a DLF Analyser is a plugin that can extract or derive data from other DLF data. The idea is that these analyses do not depend on the underlying log format; they can be computed using only the data normalised in the DLF schema.
For example, an analyser could assign a category based on the URL that was visited (such as assigning the 'Public' or 'Private' category). This categorising operation doesn't depend on the log format but only on the presence of the requested_page field in the schema. This would be an example of a special kind of analyser, a Lire DLF Categoriser: a simpler analyser that creates new fields based on a single DLF record.
The doc/examples directory in the source distribution contains the complete code for this categoriser.
There is a more generic kind of analyser that creates data in another DLF stream based on arbitrary queries on the source DLF schema. An example of this kind is an analyser that constructs session summaries from the www requests. It reads the DLF records of the www DLF schema and creates www-user_session DLF records from them.
Writing an analyser is similar to writing a DLF converter, so consult Chapter 2, Writing a New DLF Converter for the details concerning registration and the use of configuration.
The simplest form of analyser is the categoriser. In this section, we will show how to write a categoriser that uses regular expressions to assign a category to each www requested page.
A categoriser writes DLF in an extended schema. An extended schema is an extension of a base schema. If you are familiar with SQL, you can see it as an inner join with the main schema. That is, each record in the main schema will have the extension fields of the extended schema.
In our case the extended schema is very simple: it only adds one category field to the www schema.
Defining an extended schema is identical to writing a DLF Schema, with the exception that we use a different top-level element. You should consult Chapter 3, Writing a DLF Schema for all the details. Here is the extended schema that our categoriser will use:
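The join analogy can be pictured with plain Perl hashes. This is only a sketch of the idea, not of how Lire stores DLF records: the field values are made up, and client_host is assumed here purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical record from the base 'www' schema (values are invented).
my %www_record = (
    requested_page => '/private/report.html',
    client_host    => '192.0.2.10',
);

# Hypothetical extension fields that a categoriser would compute for it.
my %extension = ( category => 'Private' );

# Conceptually, the extended record is the base record plus the extension
# fields, much like the row produced by a one-to-one join.
my %extended_record = ( %www_record, %extension );

foreach my $field ( sort keys %extended_record ) {
    print "$field = $extended_record{$field}\n";
}
```

Every base record ends up with exactly one set of extension fields, which is why the extended schema only has to declare the fields it adds.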
    <?xml version="1.0"?>
    <!DOCTYPE lire:extended-schema PUBLIC
      "-//LogReport.ORG//DTD Lire DLF Schema Markup Language V1.1//EN"
      "http://www.logreport.org/LDSML/1.1/ldsml.dtd">
    <lire:extended-schema id="www-category" base-schema="www"
                          xmlns:lire="http://www.logreport.org/LDSML/">
      <lire:title>Category Extended Schema for WWW service</lire:title>
      <lire:description>
        <para>This is an extended schema for the WWW service which adds
        a category field based on the regexp matched by the
        requested_page.</para>
      </lire:description>

      <lire:field name="category" type="string" label="Category">
        <lire:description>
          <para>This field contains the page category.</para>
        </lire:description>
      </lire:field>
    </lire:extended-schema>
The difference with a regular DLF schema is that it starts with the extended-schema tag, whose base-schema attribute names the DLF schema or derived DLF schema that is extended.
Like a DLF Converter, the categoriser is an object deriving from a base class which defines the categoriser interface. In the categoriser case, that base class is Lire::DlfCategoriser. The categoriser also has to provide some meta-information to the framework. Here is the code for all of this:
    package MyAnalysers::PageCategoriser;

    use base qw/Lire::DlfCategoriser/;

    sub new {
        return bless {}, shift;
    }

    sub name {
        return 'page-categoriser';
    }

    sub title {
        return "A page categoriser";
    }

    sub description {
        return "<para>A categoriser that assigns categories based on a map of regular expressions to categories.</para>";
    }

    sub src_schema {
        return "www";
    }

    sub dst_schema {
        return "www-category";
    }
The methods that differ from the DLF converter case are src_schema, which specifies the schema to which fields are added, and dst_schema, which names the schema specifying the fields that will be added.
Our categoriser will assign categories based on a mapping from regular expressions to category names. To be useful, this mapping should be configurable. Like all plugins in Lire, DLF categorisers can use the Lire Configuration Specification Markup Language to define the configuration data they use (see Chapter 8, The Lire Report Configuration Specification Markup Language for the full details). The convention is that if there is a parameter named yourname_properties, it is considered the configuration specification for the plugin yourname. As a result, a small button will appear in the Lire user interface so that the user can configure your plugin's data.
In our categoriser case, we will define a list of records which will enable the user to define many pairs of regular expression and category name:
    <?xml version="1.0"?>
    <!DOCTYPE lrcsml:config-spec PUBLIC
      "-//LogReport.ORG//DTD Lire Report Configuration Specification Markup Language V1.0//EN"
      "http://www.logreport.org/LRCSML/1.1/lrcsml.dtd">
    <lrcsml:config-spec xmlns:lrcsml="http://www.logreport.org/LRCSML/"
                        xmlns:lrcml="http://www.logreport.org/LRCML/">
      <lrcsml:list name="page-categoriser_properties">
        <lrcsml:summary>Page Categoriser Configuration</lrcsml:summary>
        <lrcsml:description>
          <para>This is a list of regexps that will be applied in this
          order, along with the category that should be assigned when the
          regexp matches.</para>
        </lrcsml:description>

        <lrcsml:record name="regex2category">
          <lrcsml:summary>The Regexp-Category Association</lrcsml:summary>

          <lrcsml:string name="regex">
            <lrcsml:summary>Regex</lrcsml:summary>
            <lrcsml:description>
              <para>The regular expression to test.</para>
            </lrcsml:description>
          </lrcsml:string>

          <lrcsml:string name="category">
            <lrcsml:summary>Category</lrcsml:summary>
            <lrcsml:description>
              <para>This field contains the category that should be
              assigned.</para>
            </lrcsml:description>
          </lrcsml:string>
        </lrcsml:record>

        <lrcml:param name="page-categoriser_properties">
          <lrcml:param name="regex2category">
            <lrcml:param name="regex">.*</lrcml:param>
            <lrcml:param name="category">Unknown</lrcml:param>
          </lrcml:param>
        </lrcml:param>
      </lrcsml:list>
    </lrcsml:config-spec>
This specification also sets a default list containing one catch-all regex with the category 'Unknown'. The user could add other entries before it. An alternative implementation could define a field specifying the default category to assign when no regular expression matches.
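The alternative design, a separate default category instead of a catch-all regex, could be sketched in plain Perl as follows. The pick_category helper, its arguments and the sample patterns are hypothetical; none of them are part of the Lire API or of the example shipped in doc/examples.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the category of the first matching regex,
# or $default when none of the patterns match.
sub pick_category {
    my ( $map, $default, $page ) = @_;

    foreach my $pair ( @$map ) {
        my ( $regex, $category ) = @$pair;
        return $category if $page =~ $regex;
    }
    return $default;
}

# Sample mapping (invented for illustration), tried in order.
my @map = (
    [ qr{^/private/}, 'Private' ],
    [ qr{^/public/},  'Public' ],
);

print pick_category( \@map, 'Unknown', '/private/a.html' ), "\n";
print pick_category( \@map, 'Unknown', '/index.html' ), "\n";
```

With this approach the user cannot accidentally shadow later entries by placing a catch-all pattern too early in the list, at the cost of one more configuration field.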
Two methods are needed to implement the categoriser. The first is an initialisation method called initialise. This method receives as parameter the configuration data entered by the user.
In our case, we will compile the regular expressions for faster processing later on:
    sub initialise {
        my ( $self, $config ) = @_;

        foreach my $map ( @$config ) {
            $map->[0] = qr/$map->[0]/;
        }
        $self->{'categories'} = $config;

        return;
    }
The categorising is done in the categorise method. This method receives as parameter the DLF record to which the extended fields should be added. This DLF record is a hash reference containing one key for each of the fields defined in the source DLF schema. We simply assign the extended fields by adding new keys to the hash reference:
    sub categorise {
        my ( $self, $dlf ) = @_;

        foreach my $map ( @{$self->{'categories'}} ) {
            if ( $dlf->{'requested_page'} =~ /$map->[0]/ ) {
                $dlf->{'category'} = $map->[1];
                return;
            }
        }

        return;
    }
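To try the two methods outside the framework, a stand-in class can be driven by hand. This sketch makes several assumptions: Demo::PageCategoriser is a hypothetical package that does not inherit from Lire::DlfCategoriser, the configuration is passed as an array reference of [regex, category] pairs (as the indexing in the methods above suggests), and the patterns are compiled into a copy rather than in place.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the categoriser, without the Lire base class,
# so that the two methods can be exercised outside the framework.
package Demo::PageCategoriser;

sub new { return bless {}, shift; }

sub initialise {
    my ( $self, $config ) = @_;

    # Compile each pattern once; unlike the version above, this builds a
    # new list and leaves the caller's configuration untouched.
    $self->{'categories'} = [ map { [ qr/$_->[0]/, $_->[1] ] } @$config ];
    return;
}

sub categorise {
    my ( $self, $dlf ) = @_;

    # The first matching pattern wins, so the order of the list matters.
    foreach my $map ( @{ $self->{'categories'} } ) {
        if ( $dlf->{'requested_page'} =~ $map->[0] ) {
            $dlf->{'category'} = $map->[1];
            return;
        }
    }
    return;
}

package main;

my $categoriser = Demo::PageCategoriser->new();
$categoriser->initialise( [
    [ '^/private/', 'Private' ],
    [ '.*',         'Unknown' ],    # catch-all entry, kept last
] );

my $dlf = { requested_page => '/private/stats.html' };
$categoriser->categorise( $dlf );
print "$dlf->{'requested_page'} => $dlf->{'category'}\n";
```

Inside Lire, the framework is what calls initialise and categorise; the manual calls here only illustrate the contract between the two methods.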
That's all. As with the DLF converter, you'll need to register this analyser with the Lire::PluginManager (see the section called “Registering Your DLF Converter with the Lire Framework” for more information).