Skip to content

Getting started

David Hagen gave an introductory talk on Parsita at SciPy 2021. The documentation is more thorough, but the talk is useful for anyone looking for a quick walk-through.

Defining a parser

Parsita parsers are written inside the bodies of classes that inherit from TextParsers. (If you are parsing something other than a string, such a stream of tokens, there is the GeneralParsers class.) Such classes are not meant to be instantiated. They are used purely as a kind of souped-up context manager.

The metaclass of TextParsers takes a whitespace named argument, which is a regular expression, or more commonly, a string that can be parsed into a regular expression. This becomes the definition of whitespace for all Parsers defined in this class body, making it so that whitespace as defined does not need to be explicitly captured and discarded. The default value is r'\s*'—all whitespace including newlines. With whitespace=None, no whitespace is ignored.

from parsita import *

class NumericListParsers(TextParsers, whitespace=r'[ ]*'):
    integer_list = '[' >> repsep(reg('(+-)?[0-9]+') > int, ',') << ']'

Pretty much every function in Parsita returns an object with type Parser. Various operators, like & and |, are defined so that Parsers can be combined to make new, more complex, parsers, hence the name "parser combinators".

Invoking a parser

Parser.parse

The only method of note on a Parser is the parse method. The parse method takes a str as input and returns an instance of the Result class, which has two subclasses Success and Failure. The standard way to test if a result is a Success or Failure is to use isinstance(result, Success). If Success, the parsed value can be obtained with result.value. If Failure, the error message can be obtained with result.message.

from parsita import *

class NumericListParsers(TextParsers, whitespace=r'[ ]*'):
    integer_list = '[' >> repsep(reg('(+-)?[0-9]+') > int, ',') << ']'

result = NumericListParsers.integer_list.parse('[1, 1, 2, 3, 5]')

if isinstance(result, Success):
    python_list = result.value
else:
    raise RuntimeError(result.message)

Result.or_die

Alternatively, result.or_die() returns the value if it is a Success and raises a ParseError exception with the message if it is a Failure. It is common to apply this immediately after the call to parse when an exception on failure is desired.

from parsita import *

class NumericListParsers(TextParsers, whitespace=r'[ ]*'):
    integer_list = '[' >> repsep(reg('(+-)?[0-9]+') > int, ',') << ']'

python_list = NumericListParsers.integer_list.parse('[1, 1, 2, 3, 5]').or_die()