PO Filter test

The following are descriptions of the tests available in pofilter with some details about what type of errors they are useful to test for and the limitations of each test.

You can always run:

pofilter -l

to get a list of the current tests available in your installation.

If you have an idea for a new test then please help us to write it.

Test Classification

Some tests are more important than others so we have classified them to help you determine which to run first.

Test Description

accelerators

checks whether accelerators are consistent between the two strings.

Make sure you use the relevant option (--mozilla, --kde, etc.) so that pofilter knows which type of accelerator it is looking for. The test will pick up accelerators that are missing and ones that shouldn’t be there.
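As a rough illustration of the idea, the sketch below counts accelerator markers in both strings and compares the counts. It is not pofilter's own code: the function names are hypothetical, and the "&" default marker is an assumption chosen for the example, since the marker really depends on the option you pass.

```python
def count_accelerators(text: str, marker: str = "&") -> int:
    """Count markers that are directly followed by a letter or digit."""
    return sum(
        1
        for i in range(len(text) - 1)
        if text[i] == marker and text[i + 1].isalnum()
    )


def accelerators_consistent(source: str, target: str, marker: str = "&") -> bool:
    """True when both strings carry the same number of accelerators."""
    return count_accelerators(source, marker) == count_accelerators(target, marker)


# "Save & Quit" has no accelerator: the marker is followed by a space.
print(count_accelerators("&File"), count_accelerators("Save & Quit"))
print(accelerators_consistent("&File", "Lêer"))  # accelerator lost in translation
```

Requiring a letter or digit after the marker keeps a literal "&" between words from being counted.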

acronyms

checks that acronyms that appear are unchanged

If the acronym URL appears in the original, this test will check that it also appears in the translation. Translating acronyms is a language decision, but many languages leave them unchanged. In that case this test is useful for tracking down translated acronyms and correcting them.

blank

checks whether a translation is totally blank

This checks whether a translation has inadvertently been entered as blank, i.e. as spaces only. This is different from untranslated, which is completely empty. The test is useful because a string translated as " " will appear to most tools as if it is translated.
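The difference between blank and untranslated can be shown with a minimal sketch. The two function names are hypothetical and this is only an illustration of the distinction, not how pofilter implements it.

```python
def is_untranslated(source: str, target: str) -> bool:
    """The target is completely empty although the source has content."""
    return bool(source.strip()) and target == ""


def is_blank(source: str, target: str) -> bool:
    """The target is non-empty but holds nothing except whitespace."""
    return bool(source.strip()) and target != "" and target.strip() == ""


for target in ("", " ", "Oop"):
    print(repr(target), is_untranslated("Open", target), is_blank("Open", target))
```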

brackets

checks that the number of brackets in both strings match

If ([{ or }]) appear in the original this will check that the same number appear in the translation.
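A simple way to picture this check is to count each bracket character in both strings and report the ones whose counts differ. The sketch below does exactly that; the helper name is hypothetical and the real test may differ in detail.

```python
BRACKETS = "([{}])"


def bracket_mismatches(source: str, target: str) -> dict:
    """Map each bracket whose count differs to (count in source, count in target)."""
    return {
        b: (source.count(b), target.count(b))
        for b in BRACKETS
        if source.count(b) != target.count(b)
    }


# The closing bracket was dropped in the translation.
print(bracket_mismatches("Save (as)", "Stoor (as"))
```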

compendiumconflicts

checks for Gettext compendium conflicts (#-#-#-#-#)

When you use msgcat to create a PO compendium it will insert #-#-#-#-# into entries that are not consistent. If the compendium is used later in a message merge, these conflicts will appear in your translations. This test quickly extracts them for correction.

doublequoting

checks whether doublequoting is consistent between the two strings

Checks on double quotes " to ensure that you have the same number in both the original and the translated string.

doublespacing

checks for bad double-spaces by comparing to original

This will identify cases where you have [space][space] in your translation when it is not in the original, or where it appears in the original but not in your translation. Some of these are spurious, and how you correct them depends on the conventions of your language.

doublewords

checks for repeated words in the translation

Words that have been repeated in a translation will be highlighted by this test, e.g. “the the”, “a a”. These are generally typos that need correcting. Some languages may have valid repeated words in their structure. In that case either ignore those instances or switch this test off using the --excludefilters option.
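Repeated words are easy to find with a backreference in a regular expression. The following is a minimal sketch under that assumption; the pattern and function name are illustrative and not taken from pofilter.

```python
import re

# A word, some whitespace, then the same word again as a whole word.
_REPEATED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(target: str) -> list:
    """Return the first word of every immediately repeated pair."""
    return [match.group(1) for match in _REPEATED.finditer(target)]


print(repeated_words("Open the the file"))
print(repeated_words("Open the theory file"))
```

The trailing word boundary stops “the theory” from being reported as a repetition.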

emails

checks to see that emails are not translated

Generally you should not translate email addresses. This check looks to see that email addresses, e.g. info@example.com, are unchanged in the translation. In some cases you should of course translate the address, but generally you shouldn’t.

endpunc

checks whether punctuation at the end of the strings match

This will ensure that the ending of your translation has the same punctuation as the original, e.g. if it ends in :[space] then so should yours. It is useful for ensuring that you have ellipses [...] in all your translations. You may pick up some errors in the original; feel free to keep your translation and notify the programmers. In some languages, characters such as ? and ! are always preceded by a space, e.g. [space]?, so do what your language customs dictate. Other false positives appear when a change in word order moves characters such as “) to the end of the sentence. Do not change these: your language’s word order takes precedence.

Note that if you are tempted to leave out a [full-stop] or [colon], or to add a [full-stop] to a sentence, the original punctuation was often chosen for a reason, e.g. a list where full stops make it look cluttered. So initially match the English, and make changes once the program is being used.
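One way to think about this check is to extract the trailing run of punctuation and whitespace from each string and compare the two. The sketch below assumes a small, fixed set of punctuation characters and hypothetical function names; the real test knows about more characters and language conventions.

```python
import re

# Trailing run of whitespace and a few common punctuation marks (assumed set).
_ENDING = re.compile(r"[\s.,:;!?]*$")


def ending(text: str) -> str:
    """Return the punctuation/whitespace run that ends the string."""
    return _ENDING.search(text).group(0)


def endpunc_matches(source: str, target: str) -> bool:
    return ending(source) == ending(target)


print(repr(ending("Name: ")), repr(ending("Loading...")))
print(endpunc_matches("Loading...", "Laai"))  # ellipsis missing
```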

endwhitespace

checks whether whitespace at the end of the strings matches

Operates the same as endpunc but is only concerned with whitespace.

escapes

checks whether escaping is consistent between the two strings

Checks escapes such as \n and \uNNNN to ensure that if they exist in the original, they also exist in the translation.
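As an illustration, the sketch below looks for a few escape sequences in the original and reports those that are absent from the translation. The set of escapes recognised here (\n, \r, \t and \uNNNN) and the function name are assumptions for the example. The strings hold the escapes as literal text, as they appear in a PO file.

```python
import re

# A backslash followed by n, r, t or by u and four hex digits (assumed set).
_ESCAPE = re.compile(r"\\(?:u[0-9A-Fa-f]{4}|[nrt])")


def missing_escapes(source: str, target: str) -> list:
    """Return escapes found in the source that do not occur in the target."""
    return [esc for esc in _ESCAPE.findall(source) if esc not in target]


# Raw strings keep the backslashes as literal characters.
print(missing_escapes(r"Copyright \u00a9 2006\n", r"Kopiereg \u00a9 2006"))
```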

filepaths

checks that file paths have not been translated

Checks that paths such as /home/user1 have not been translated. Generally you do not translate a file path unless it is being used as an example.

functions

checks to see that function names are not translated

Checks that function names, e.g. rgb() or getEntity.Name(), are not translated.

isfuzzy

check if the po element has been marked fuzzy

If a message is marked fuzzy in the PO file then it is extracted. Note that this is different from the --fuzzy and --nofuzzy options, which specify whether tests should be performed against messages marked fuzzy.

isreview

check if the po element has been marked review

If you make use of the non-Gettext ‘review’ flag, there are two ways of marking a message. The older method is:

#, review[ - reason for review]

The newer method is:

# (review) reason for review
# (pofilter) testname: explanation for translator

The first method is being phased out in favour of the second because the Gettext tools do not preserve the “#, review” markers when they work on your files. Using “# (review)” ensures that the messages are not altered and will make it through the Gettext washing machine.

If a message is marked for review in the PO file, it will be extracted. Note that this is different from the --review and --noreview options, which specify whether tests should be performed against messages marked for review.
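To make the “# (review)” convention concrete, here is a minimal sketch that scans the comment lines of one PO entry for such a marker and returns the reason. The function name is hypothetical and the sketch works on plain lists of strings rather than on a parsed PO file.

```python
REVIEW_PREFIX = "# (review)"


def review_reason(comment_lines):
    """Return the reason text of the first review marker, or None if absent."""
    for line in comment_lines:
        if line.startswith(REVIEW_PREFIX):
            return line[len(REVIEW_PREFIX):].strip()
    return None


entry_comments = [
    "# (review) check terminology",
    "#: src/main.c:12",
]
print(review_reason(entry_comments))
```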

kdecomments

checks to ensure that no KDE style comments appear in the translation

KDE style translator comments appear in PO files as “_: comment\n”. New translators often translate the comment. This test tries to identify instances where the comment has been translated.

long

checks whether a translation is much longer than the original string

musttranslatewords

checks that words configured as definitely translatable don’t appear in the translation

If, for instance, you decide that in your language ‘OK’ must be translated, this test will flag any occurrences of ‘OK’ in the translation if it appeared in the source string. You must specify a file containing all of the must-translate words using --musttranslatefile.

notranslatewords

checks that words configured as untranslatable appear in the translation too

Many brand names should not be translated. This test allows you to easily make sure that words like Word, Excel, Impress, Calc, etc. are not translated. You must specify a file containing all of the no-translate words using --notranslatefile.

numbers

checks whether numbers of various forms are consistent between the two strings

You will see some errors where you have either written the number in full or converted it to digits in your translation. Changes in order will also trigger this error.
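The behaviour described above, including the sensitivity to order, can be mimicked by extracting the numbers from each string and comparing the resulting lists. This is only a sketch with a deliberately simple number pattern and a hypothetical function name.

```python
import re

# Digits, optionally with "." or "," separated groups such as 1,000 or 3.14.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_match(source: str, target: str) -> bool:
    """Compare the numbers in both strings, in order of appearance."""
    return _NUMBER.findall(source) == _NUMBER.findall(target)


print(numbers_match("Page 3 of 10", "Bladsy 3 van 10"))
print(numbers_match("3 files", "drie lêers"))         # number written in full
print(numbers_match("Page 3 of 10", "10 bladsye, nr. 3"))  # order changed
```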

puncspacing

checks for bad spacing after punctuation

In the case of [full-stop][space] in the original, this test checks that your translation does not remove the space. It also checks for [comma], [colon], etc.

purepunc

checks that strings that are purely punctuation are not changed

This extracts strings like “+” or “-” as these usually should not be changed.

sentencecount

checks that the number of sentences in both strings match

Counts the number of full stops to see that the sentence count is the same in the original and the translated string.

short

checks whether a translation is much shorter than the original string

simplecaps

checks the capitalisation of two strings isn’t wildly different

This will pick up many false positives, so don’t be a slave to it. It is useful for identifying translations that do not start with a capital when they should, or those that do when they shouldn’t. It will also highlight sentences that have extra capitals. Depending on the capitalisation convention of your language, you might want to change these to Title Case or change them all to normal sentence case.

singlequoting

checks whether singlequoting is consistent between the two strings

The same as doublequoting but checks for the ’ character. Because this is used in words like it’s, user’s, etc., this can cause spurious errors, as your language might not use such a system. If a quote appears at the end of a sentence in the translation, i.e. ‘[full-stop], this might not be detected properly by the check.

spellcheck

checks words that don’t pass a spell check

This test will check for misspelled words in your translation. The test first checks for misspelled words in the English and adds those to an exclusion list. The advantage of this is that many words that are specific to the application will not raise errors, e.g. program names, brand names, function names, etc.

The checker works with PyEnchant; failing that, it works with jToolkit’s spelling module. You will need to have one of these installed for the checker to work, plus of course a spell checker for your language. This test will only work if you have specified the --language option.

The pofilter error that is created lists the misspelled word plus all the suggestions returned from the spell checker, so it is easy for you to identify the word and select a suggestion.
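The exclusion-list idea can be sketched without a real spell checker. In the example below plain Python sets stand in for the English and target-language dictionaries. The sets, the function name and the word pattern are all assumptions for illustration, and no PyEnchant calls are used.

```python
import re

_WORD = re.compile(r"[^\W\d_]+")  # runs of letters, including accented ones


def misspelled(source, target, source_words, target_words):
    """Return target words unknown to the target dictionary, skipping any
    word that the source dictionary also rejects in the original."""
    excluded = {
        w.lower() for w in _WORD.findall(source) if w.lower() not in source_words
    }
    return [
        w
        for w in _WORD.findall(target)
        if w.lower() not in target_words and w.lower() not in excluded
    ]


english = {"open", "the", "file", "in"}
afrikaans = {"maak", "die", "lêer", "oop", "in"}
# "Pootle" fails the English check, so it is excluded; "leer" is reported.
print(misspelled("Open the file in Pootle", "Maak die leer in Pootle oop",
                 english, afrikaans))
```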

startcaps

checks that the message starts with the correct capitalisation

After stripping whitespace and common punctuation characters, it checks that the first remaining character is correctly capitalised. So if the original starts with a capital and the translation does not, an error is raised. This check does not yet consider locale information to determine what characters are considered to be capitals in your locale.
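The steps described above can be sketched as follows. The set of stripped characters (ASCII whitespace and punctuation) and the function name are assumptions, and Python's own Unicode rules decide what counts as a capital here.

```python
import string

# Characters removed from the start before comparing (assumed set).
_LEADING = string.whitespace + string.punctuation


def startcaps_ok(source: str, target: str) -> bool:
    """True when both strings start with the same kind of capitalisation."""
    src = source.lstrip(_LEADING)
    tgt = target.lstrip(_LEADING)
    if not src or not tgt:
        return True  # nothing left to compare
    return src[0].isupper() == tgt[0].isupper()


print(startcaps_ok('"Save" the file', '"Stoor" die lêer'))
print(startcaps_ok("Save", "stoor"))
```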

startpunc

checks whether punctuation at the beginning of the strings match

Operates as endpunc but you will probably see fewer errors.

startwhitespace

checks whether whitespace at the beginning of the strings matches

As in endwhitespace but you will see fewer errors.

unchanged

checks whether a translation is basically identical to the original string

This checks whether the translation is just a copy of the English original. Many times this is what you want, but sometimes you will detect words that should have been translated.

untranslated

checks whether a string has been translated at all

This check is really only useful if you want to extract untranslated strings so that they can be translated independently of the main work.

urls

checks to see that URLs are not translated

This checks only basic URLs, not all URIs. Generally you don’t want to translate URLs unless they are example URLs. If the URL is for configuration information then you need to talk to the developers about placing config information in PO files. It shouldn’t really be there unless it is very clearly marked; such information should go into a configuration file.

validchars

checks that only characters specified as valid appear in the translation

Often during character conversion to and from UTF-8 you get some strange characters appearing in your translation. This test presents a simple way to try and identify such errors.

This test will only run if you specify the --validcharsfile command line option. This file contains all the characters that are valid in your language. You must use UTF-8 encoding for the characters in the file.

If the test finds any characters not in your valid characters file then the test will print the character together with its Unicode value.
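A minimal sketch of the same idea: compare each character of the translation against a set of valid characters and report the offenders with their Unicode values. The function name is hypothetical, and the valid set is built inline here rather than read from a file.

```python
def invalid_characters(target: str, valid_chars) -> list:
    """Return (character, code point) pairs for characters not in valid_chars."""
    found = []
    for ch in target:
        if ch not in valid_chars and ch not in found:
            found.append(ch)
    return [(ch, "U+%04X" % ord(ch)) for ch in found]


# In practice the set could come from a UTF-8 file, for example:
#   valid = set(open(path, encoding="utf-8").read())
valid = set("abcdefghijklmnopqrstuvwxyzê ")
print(invalid_characters("lêer Ã", valid))
```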

variables

checks whether variables of various forms are consistent between the two strings

This checks to make sure that variables that appear in the original also appear in the translation. Make sure you use the --kde, --openoffice, etc. flags, as these define what variables will be searched for. It does not at the moment cope with variables that use the reordering syntax of Gettext PO files.
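Which variable styles are searched for depends on the flags, so the sketch below simply assumes printf-style %s and %d variables for the sake of illustration. The pattern and function name are hypothetical and, like the test described above, the sketch ignores reordering syntax.

```python
import re

# Assumed variable style for this example only: %s and %d.
_VARIABLE = re.compile(r"%[sd]")


def missing_variables(source: str, target: str) -> list:
    """Return variables from the source that do not occur in the target."""
    return [var for var in _VARIABLE.findall(source) if var not in target]


print(missing_variables("Copied %d files to %s", "%d lêers gekopieer"))
```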

xmltags

checks that XML/HTML tags have not been translated

This check finds the number of tags in the source string and checks that the same number are in the translation. If the counts don’t match then either a tag is missing or it was mistakenly translated by the translator, both of which are errors.

The check ignores tags, or things that look like tags, that cover the whole string, e.g. “<Error>”, but will produce false positives for things like “An <Error> occurred”, as here “Error” should be translated. It will also detect a translated alt attribute in e.g. <img src=bob.png alt="blah"> as an error when in fact it is correct.
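The counting approach, together with the exception for strings that consist of a single tag-like token, can be sketched like this. The pattern is a deliberately naive stand-in for real tag detection and the function name is hypothetical.

```python
import re

_TAG = re.compile(r"<[^>]+>")  # naive: anything between < and >


def tag_counts_match(source: str, target: str) -> bool:
    """Compare tag counts, ignoring sources that are one tag-like token."""
    if _TAG.fullmatch(source.strip()):
        return True  # the whole string looks like a tag, so it is skipped
    return len(_TAG.findall(source)) == len(_TAG.findall(target))


print(tag_counts_match("<b>Bold</b> text", "<b>Vet</b> teks"))
print(tag_counts_match("<Error>", "<Fout>"))
# Reported as a mismatch even though translating "Error" is right here.
print(tag_counts_match("An <Error> occurred", "'n Fout het voorgekom"))
```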