Class ContentTemplate

  • All Implemented Interfaces:
    TemplateInterface
    Direct Known Subclasses:
    ExpContentTemplate

    public class ContentTemplate
    extends Template
    Template class for extracting content out of remote HTML pages. This class is used by the TemplateHandler to extract the "content" out of HTML documents for later integration with a look-and-feel template using one or more of SetTemplate, BSLTemplate, or ReplaceFilter. The plan is to snag the title and the content, and put them into request properties. The resultant processed output will be discarded. The following properties are gathered:
    title
    The document title
    all
    The entire content
    bodyArgs
    The attributes to the body tag, if any
    content
    The body, delimited by <content> ... </content>. The text inside multiple <content> ... </content> pairs is concatenated together.
    script
    All "<script>"..."</script>" tags found in the document head
    scriptSrcs
    A white-space delimited list of all "src" attributes found in "script" tags.
    style
    All "<style>"..."</style>" tags found in the document head
    meta-[name]
    Every meta tag "name" and "content"
    link-[rel]
    Every link tag "rel" and "href"
    user-agent
    The origin user agent
    referer
    The user agent referrer (if any)
    last-modified
    The document last modified time (if any) in standard format
    content-length
    The document content length, as fetched from the origin server
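
The "content" property described above can be illustrated with a minimal sketch. It is not the class's implementation: ContentTemplate works through the tag callbacks listed under Method Detail, whereas this stand-in uses regular expressions, and the names ContentSketch and extractContent are hypothetical. It only shows the documented outcome: text from multiple <content> pairs is concatenated, and the whole body is used when no <content> tags are present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of the library.
final class ContentSketch {
    // One <content> ... </content> pair; DOTALL lets the text span lines.
    private static final Pattern CONTENT =
        Pattern.compile("<content\\b[^>]*>(.*?)</content\\s*>",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // The document body, used only as a fallback.
    private static final Pattern BODY =
        Pattern.compile("<body\\b[^>]*>(.*?)</body\\s*>",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the concatenated content sections, or the body if there are none. */
    static String extractContent(String html) {
        StringBuilder sb = new StringBuilder();
        boolean found = false;
        Matcher m = CONTENT.matcher(html);
        while (m.find()) {
            found = true;
            sb.append(m.group(1));
        }
        if (found) {
            return sb.toString();
        }
        Matcher b = BODY.matcher(html);
        return b.find() ? b.group(1) : "";
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>nav</h1><content>A</content>"
                    + "<hr><content>B</content></body></html>";
        System.out.println(extractContent(page));                 // prints "AB"
        System.out.println(extractContent("<body>plain</body>")); // prints "plain"
    }
}
```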
    Properties:
    prepend
    Prepend this string to the property names defined above that are populated by this template (defaults to "").
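
As a rough illustration of how "prepend" affects the resulting property names, the sketch below copies a few gathered values into a java.util.Properties object under prefixed names. The class PrependSketch, the method store, the prefix "page." and the sample values are all hypothetical; only the naming rule (prefix plus property name, with "meta-[name]" and "link-[rel]" forms) comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of the library.
final class PrependSketch {
    /** Copies each gathered value into props under prepend + name. */
    static void store(Properties props, String prepend, Map<String, String> gathered) {
        for (Map.Entry<String, String> e : gathered.entrySet()) {
            props.setProperty(prepend + e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // Sample values, invented for the example.
        Map<String, String> gathered = new LinkedHashMap<>();
        gathered.put("title", "Home");
        gathered.put("meta-keywords", "brazil, templates");
        gathered.put("link-stylesheet", "/site.css");

        Properties props = new Properties();
        store(props, "page.", gathered);
        System.out.println(props.getProperty("page.title"));           // prints "Home"
        System.out.println(props.getProperty("page.link-stylesheet")); // prints "/site.css"
    }
}
```

With the default empty prefix, the same values would be stored as "title", "meta-keywords", and "link-stylesheet".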
    Version:
    2.2
    Author:
    Stephen Uhler
    • Constructor Detail

      • ContentTemplate

        public ContentTemplate()
    • Method Detail

      • tag_title

        public void tag_title(RewriteContext hr)
        Toss everything up to and including this entity.
      • tag_slash_title

        public void tag_slash_title(RewriteContext hr)
        Gather up the title. No tags are allowed between <title> ... </title>.
      • tag_script

        public void tag_script(RewriteContext hr)
        Append all "script" code while in the head section. If the script has a "src" attribute, we'll put the "src" in a variable so the template can deal with it (them?). For now, ignore it.
      • tag_style

        public void tag_style(RewriteContext hr)
        Append all "style" code while in the head section.
      • tag_slash_head

        public void tag_slash_head(RewriteContext hr)
        Mark end of head section. All "script" content in the "body" is left alone.
      • tag_content

        public void tag_content(RewriteContext hr)
        Toss everything up to and including here, but turn on content accumulation.
      • tag_body

        public void tag_body(RewriteContext hr)
        Grab the "body" attributes, and toss all output to this point.
      • tag_slash_content

        public void tag_slash_content(RewriteContext hr)
        Save the content gathered so far, and turn off content accumulation.
      • tag_slash_body

        public void tag_slash_body(RewriteContext hr)
        If no content tags are present, use the entire "body" instead.
      • tag_meta

        public void tag_meta(RewriteContext hr)
        Extract data out of meta tags into the properties. For "http-equiv" tags, set the corresponding HTTP response header.
      • tag_link

        public void tag_link(RewriteContext hr)
        Extract data out of link tags into the properties. Prefix the "rel" attribute with "link-" to use as the property name.
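
The method names above suggest a naming convention: tag_X for an opening <X> tag and tag_slash_X for the closing </X> tag. The sketch below assumes that callbacks are selected by name in this way and shows one possible reflection-based dispatch. It is an illustration only: DispatchSketch, DemoTemplate, methodNameFor and dispatch are hypothetical, and the demo callbacks take no arguments, whereas the real methods receive a RewriteContext.

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical illustration of name-based tag dispatch, not the library's code.
final class DispatchSketch {
    /** Maps "title" to "tag_title" and "/title" to "tag_slash_title". */
    static String methodNameFor(String tag) {
        String t = tag.toLowerCase(Locale.ROOT);
        return t.startsWith("/") ? "tag_slash_" + t.substring(1) : "tag_" + t;
    }

    /** Stand-in for a template; it only records which callbacks ran. */
    public static final class DemoTemplate {
        final StringBuilder log = new StringBuilder();
        public void tag_title() { log.append("open;"); }
        public void tag_slash_title() { log.append("close;"); }
    }

    /** Invokes the matching public no-argument callback; returns false if there is none. */
    static boolean dispatch(Object template, String tag) {
        try {
            Method m = template.getClass().getMethod(methodNameFor(tag));
            m.invoke(template);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // no callback defined for this tag
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DemoTemplate t = new DemoTemplate();
        dispatch(t, "title");
        dispatch(t, "/title");
        System.out.println(t.log);            // prints "open;close;"
        System.out.println(dispatch(t, "p")); // prints "false"
    }
}
```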