JParsec FAQ


What is jparsec?

Jparsec is a recursive-descent parser combinator framework written for Java. The name "parsec" (parser combinator) is borrowed from the parsec library of the Haskell programming language.

Using jparsec, grammar can be written declaratively in a EBNF-like fashion. For example: Scanners.isChar('a').many() is equivalent to the EBNF notation "'a'*".


Why another parser framework?

There are excellent parser generator frameworks such as JavaCC, ANTLR already out there. Why re-inventing wheels?

Well, all those frameworks today are parser generators.

JParsec is a parser combinator framework where:


How do I use jparsec?

jparsec is very simple to use. All you have to do is to create a Parser object representing the grammar and run it using Parsers.runParser().

For example:

Parsers.runParser("aaa", Scanners.isChar('a').many(), 1, "test parser);


What is Parser?

Class Parser is the most important class in jparsec. It represents a grammar. Using the rich set of combinators provided in class Parser and Parsers, one can create sophisticated grammar by combining simpler Parser objects.
A Parser object can run at either a character level or at token level.

Scanners is the class where all the scanner objects reside.


How does jparsec handle left-recursion?

Left-recursion is hard to handle in recursive-descent parser. jparsec is no exception. You have to avoid left-recursion in your grammar. Nontheless, jparsec does provide a few combinators that handles some normal left-recursion for you. For example, a left-associative binary operator is naturally represented in a left-recursive notation in EBNF:

  ExprPlus ::= <ExprPlus> '+' <ExprMulDiv>
jparsec provides a infixl() combinator that handles left associative binary operator for you:
final Parser plus_op = Scanners.isChar('+')
  .seq(Parsers.retn(new Map2(){
    public Object map(Object o1, Object o2){
      return new Integer(((Integer)o1).intValue()+((Integer)o2).intValue());
    }
  })
); 
final Parser exprplus = Parsers.infixl(plus_op, exprmuldiv);

Does jparsec support operator precedence grammar?

Yes. jparsec has built-in mechanism for you to build operator precedence grammar for expressions.
final OperatorTable table = new OperatorTable()
  .infixl(plus_op, 10)
  .infixl(minus_op, 10)
  .infixl(mul_op, 20)
  .infixl(div_op, 20)
  .prefix(negate_op, 50);
final Parser expr = Expressions.buildExpressionParser(Lexers.integer(), table);

What is "Parsers.one()"

Parsers.one() does not consume any input and always succeed regardless of the input.


How is EBNF sequencing represented in jparsec?

The Parser.seq() function is used to represent the sequencing concept.

  A ::= B C
is represented in jparsec as:
Parser a = b.seq(c);
Sometimes the return value from the first Parser is still needed, and() can be used in this case:
Parser a = b.and(c, new Map2(){
  public Object map(Object o1, Object o2){
     ...
  }
});


How is EBNF alternation represented in jparsec?

Parser.plus() is used for alternating.

A ::= B | C
is represented in jparsec as:
Parser a = Parsers.plus(b, c);


How is kleen star (zero or more) represented in jparsec?

The many() combinator.

  A ::= B*
is represented in jparsec as:
Parser a = b.many();


How is "one-or-more" represented in jparsec?

The many1() combinator.

  A ::= B+
is represented in jparsec as:
Parser a = b.many1();


How is "N-or-more" represented in jparsec?

The many() combinator can optionally accept an int parameter that specifies the minimal number of occurrence.

  A ::= B B+
is represented in jparsec as:
Parser a = b.many(2);


How is "no more than N" represented in jparsec?

The some() combinator can be used to represent this concept.

  A ::= Epsilon | B | B B | B B B
is represented in jparsec as:
Parser a = b.some(3);


How is "N-or-more-but-no-more-than-M" represented in jparsec?

The some() combinator can optinally accept another int parameter that specifies the minimal number of occurrence.

  A ::= B B | B B B
is represented in jparsec as:
Parser a = b.some(2, 3);


How is "repeat N times" represented in jparsec?

The repeat() combinator can be used to represent this concept.

  A ::= B B B
is represented in jparsec as:
Parser a = b.repeat(3);


How is "optional" represented in jparsec?

The optional() combinator can be used to represent this concept.

  A ::= B?
is represented in jparsec as:
Parser a = b.optional();


How do I say "the string cannot be like ..."?

The not() combinator can be used to represent this concept.

Parser a = b.not();


How do I do "lookahead"?

jparsec is LL(k). That means look-ahead is sometimes useful to avoid ambiguity. The Parser.lookahead() combinator can be used to do this task.

  Parser a = Parsers.plus(b.lookahead(2), c);


What about terminals?

class Scanners provides terminal parsers that scan a CharSequence object;

class Lexers provides terminal parsers that scan a CharSequence object and return a value corresponding to the recognized character sequence.

For example: Scanners.isChar('a'), Scanners.isString("abc"), Scanners.isLineComment(), Lexers.stringLiteral(), Lexers.integer(), Lexers.decimal() etc.


What is Pattern?

Similar to a scanner, a Pattern object is also responsible for scanning a character sequence based on an expected string pattern.

It is different than a scanner in that no error reporting is available. The pattern either matches or mismatches.

A javax.util.regex.Pattern can be adapted to Pattern.

Pattern can be used to create sophisticated terminal scanner.


How is "EOF" represented in jparsec?

Parsers.eof() can be used for this purpose. It only succeeds when the end of the input is encountered.

It can be used to ensure that the entire input matches a grammar.

For example, for input "abc", Scanners.isChar('a') will succeed, but Scanners.isChar('a').seq(Parsers.eof()) will fail because it expects no input after character 'a'.


What is "Monad"?

Theoretically, "Monad" is a functor borrowed from the Category Theory.
Without understanding the mathematical background of it, today's functional programmers are using monad for various complex tasks.

See All About Monads for further understanding about monad.

Intuitively, it is useful to think of a monad as a strategy for combining computations into more complex computations.

Talking about parsec, the Parsers.retn() and Parser.bind() together constitute a Monad.


What is "combinator"?

Combinators are various functions that combine computation objects to create more complex computations.

Talking about parsec, many functions defined in Parsers, Scanners and Parser class combine Parser objects to create more complex Parser objects, therefore, are combinators.


What is "fail"?

"fail" is an optional monad operation that indicates a failure.

Parsers.fail() creates a Parser object that simply report failure regardless of the input.

Different from Parsers.zero(), fail allows you to specify the error message.


What is "retn"?

Parsers.retn() is the "return" operation of Monad.

This Parser object ignore the input and always return an object as its result.


What is "zero"?

Parsers.zero() is similar to "fail" except it does not allow specification of the error message.

"zero" and "plus" together with the monad operations constitute the "MonadPlus" in Haskell,
which, represents monad computations that may fail in one computation and try an alternative computation.


What is "plus"?

Parsers.plus() creates a Parser object that will try an 2nd Parser object when the 1st Parser object fails without any input consumption.

"plus" and "zero" together with the monad operations constitute the "MonadPlus" in Haskell,
which, represents monad computations that may fail in one computation and try an alternative computation.


What is "map"?

"map" is a combinator function that creates a Parser object that transforms the result of another Parser object.

For example, the following Parser uses "map" combinator to transform the result of parser1 from a string to integer.

Parser return_int = some_parser_returning_string.map(
  new Map(){
    public Object map(Object str_result){
      return new Integer(Integer.parseInt((String)str_result));
    }
  }
);


What is "bind"?

"bind" is the most important operation for any Monad.

It is responsible for combining subsequent computations.

For example, the following Parser uses "bind" combinator to dynamically determine the next Parser object to run:

Parser return_int = some_parser_returning_string.bind(
  new Binder(){
    public Parser bind(Object str_result){
      if(str_result instanceof String){
        return new Integer(Integer.parseInt((String)str_result));
      }
      else{
        return Parsers.fail("string result expected");
      }
    }
  }
);