If you were keeping track of our FDSSL project, you might have noticed that we
are moving to Rust for language implementation. We came upon this decision after
reviewing the libraries available in both Haskell and Rust, and determining that
Rust has better WASM support as well as better GUI programming support. Haskell
is a great language, and I really enjoyed writing parser logic in it. Moving
over to Rust has its challenges, namely getting back into the groove of Rust
programming, and learning the nom parser library.
With Rust being an imperative-functional programming mix, it only makes sense
that the most efficient parser libraries are parser combinators! nom is very
powerful, and I appreciate its tooling after having thrown myself at it for a
few weeks. The biggest change coming from Haskell parsec to Rust nom is the lack
of Monads. nom accomplishes the parser combinator design through functions and
generics. This means we can’t accomplish the same easy to use parser combinator
that just eats all the whitespace between tokens, but this is not a huge issue.
It just means leaving a few `space0` and `space1` around.
The way we have laid out our parser is in a functional style. We use combinators
to combine the smaller parsers, and to transform the output of the parser into
the AST of our language. This means building a parser for a token, then using
`map` to transform the parsed token into a constructor of our AST. We also use
the `verify` combinator when we want a certain behavior from the parser without
needing to create a complex construction, e.g. in a tuple, we want 2+ elements
in the tuple, but the combinator to create this behavior would be complex.
For type annotations, we used a recursive parser to enable complex types for
variables. The type parser does not parse into the AST however, as we aren’t
checking for whether a type exists at this step. We are simply parsing the
structure of the type, whether it be a base type, a tuple, or a function type.
This means we are allowing Functions as first class types!
One issue I encountered while writing the type annotation parser is that it
would often recurse infinitely, resulting in stack overflows. I have a function
`parse_type` which uses the `alt` branching parser. This attempts multiple
parsers in order to allow for different expressions. The `parse_type` function
alternates between the base type parser, the tuple type parser, and the function
type parser. The issue I was having is I ordered it as function, tuple, and base
parsers, which would result in calling the function type parser recursively as
the function type parser calls the type parser, which calls the function type
parser, and etc. We solved this issue by moving the base type parser to the top,
essentially creating a base case for the recursion in the parser.
These are just a few thoughts that I had while writing the parser. I hope that
if you read this that it helps you if you are writing your language as well!