Package translate :: Package storage :: Module html

Module html

Module for parsing HTML files for translation.

Classes
  htmlunit
A unit of translatable/localisable HTML content
  htmlfile
  POHTMLParser
Functions
 
strip_html(text)
Strip unnecessary html from the text.
 
normalize_html(text)
Remove double spaces from HTML snippets
 
safe_escape(html)
Escape &, < and >
Variables
  strip_html_re = re.compile(r'(?sx)^<(?P<tag>[^\s\?>]+)(?:(?:[^...
  normalize_re = re.compile(r'\s\s+')
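
The two smaller helpers listed above can be illustrated with a short sketch. It reuses the documented `normalize_re` pattern. The replacement string (a single space) and the function names `collapse_whitespace` and `escape_markup` are assumptions made for illustration, and `html.escape` from the standard library stands in for whatever `safe_escape` does internally, which this page does not describe.

```python
import html
import re

# Pattern as documented for this module: two or more whitespace characters.
normalize_re = re.compile(r'\s\s+')


def collapse_whitespace(text):
    """Hypothetical stand-in for normalize_html.

    Assumption: each run of two or more whitespace characters is
    replaced by one space. A lone space or newline is left untouched
    because the pattern requires at least two characters.
    """
    return normalize_re.sub(' ', text)


def escape_markup(text):
    """Hypothetical stand-in for safe_escape: escape &, < and >.

    quote=False leaves single and double quotes unescaped.
    """
    return html.escape(text, quote=False)


print(collapse_whitespace('<p>Hello   world\n\n  again</p>'))
print(escape_markup('a < b & c > d'))
```

Note that the pattern matches any whitespace, not only spaces, so runs of newlines and tabs are collapsed as well.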

Imports: re, name2codepoint, HTMLParser, base, ParseError


Function Details

strip_html(text)

Strip unnecessary html from the text.

An HTML tag is deemed unnecessary if it fully encloses the translatable text, e.g. '<a href="index.html">Home Page</a>'.

HTML tags that occur within the normal flow of text are not removed, e.g. 'This is a link to the <a href="index.html">Home Page</a>.'
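
The rule above can be sketched with the `strip_html_re` pattern documented for this module. This is a minimal illustration, not the module's actual source. The helper name `strip_enclosing_tags` and the loop that peels off nested wrappers are assumptions.

```python
import re

# Pattern copied from the strip_html_re value documented for this module.
# Group 1 ("tag") is the tag name; group 2 is the enclosed content.
STRIP_HTML_RE = re.compile(
    r'(?sx)^<(?P<tag>[^\s\?>]+)(?:(?:[^>]|(?:<\?.*?\?>))*[^\?>])?>(.*)</\1>$'
)


def strip_enclosing_tags(text):
    """Hypothetical helper: drop tags that wrap the whole string.

    Assumption: wrappers are removed repeatedly until the text no
    longer starts and ends with a matching tag pair.
    """
    text = text.strip()
    match = STRIP_HTML_RE.match(text)
    while match:
        # Keep only what sits between the opening and closing tag.
        text = match.group(2).strip()
        match = STRIP_HTML_RE.match(text)
    return text


print(strip_enclosing_tags('<a href="index.html">Home Page</a>'))
print(strip_enclosing_tags(
    'This is a link to the <a href="index.html">Home Page</a>.'))
```

Because the pattern is anchored at both ends with `^<` and `</\1>$`, text that merely contains a tag somewhere in the middle does not match and is returned unchanged.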


Variables Details

strip_html_re

Value:
re.compile(r'(?sx)^<(?P<tag>[^\s\?>]+)(?:(?:[^>]|(?:<\?.*?\?>))*[^\?>])?>(.*)</\1>$')