
Re: [sc-dev] A tree-sitter code parser for SC



This is really cool, nice work!

I've also been (slowly!) writing a new SuperCollider parser, in my case for use as the frontend for Hadron, a re-implementation of sclang using LLVM to JIT. I use a lexer written using Ragel and a hand-coded parser in C++. I wonder if we could maybe chat and compare notes? Specifically:

- There are some interesting corner cases in the grammar that made for some design challenges. I'd be curious to understand how you approached solving some of these. For instance, unary negation (in combination with arbitrary binary operations) made me rewrite a medium-size chunk of my parser, particularly because I lex in an independent pass.

- What are your thoughts about LSP? I think someone on sc-dev mentioned it to me, and I thought it might be a good way to deepen the usefulness of a sclang IDE. I hadn't heard about tree-sitter before; it seems interesting as well.

- I share your intuition that better error reporting would strongly increase the usability of sclang. I wonder, could we standardize on this? I also had some ideas/hopes/dreams of someday even soliciting volunteers to help translate the error messages into other languages.

- Testing could be another area where we could collaborate. Besides unit test coverage, my integration test plan was to build a large corpus of open source sclang example code by scraping GitHub repositories, then run my parser on it and compare the resulting parse tree against the sclang parse tree.

- I'd like to hear more about how tree-sitter-supercollider is fast. Like, what design choices did you make to support speed, and how have you measured it? Is it fast relative to the sclang parser? The sclang parser is built using Bison, and while I went a different direction with my design my suspicion is that it's pretty tough to write a hand-coded parser that will be as fast as a Bison-generated one.
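
To make the unary negation point above concrete, here is a minimal sketch in Python (not Hadron's C++ and not the sclang grammar). It assumes the lexer always emits `-` as its own token and leaves the unary/binary decision to the parser; the names `lex` and `parse` are hypothetical. Binary operators are folded strictly left to right with no precedence, which is how sclang evaluates them.

```python
import re

# Hypothetical token pattern: a run of digits, or any single character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def lex(src):
    """Independent lexing pass: '-' is never folded into a number here."""
    tokens = []
    for number, op in TOKEN.findall(src):
        if number:
            tokens.append(("num", int(number)))
        elif op.strip():  # skip stray whitespace matched by '.'
            tokens.append(("op", op))
    return tokens

def parse(tokens):
    """Parse operands and binary operators left to right, no precedence."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = tokens[pos]
        pos += 1
        if kind == "op" and value == "-":
            # A '-' where an operand is expected must be unary negation.
            return ("neg", operand())
        if kind == "num":
            return value
        raise SyntaxError(f"unexpected token {value!r}")

    tree = operand()
    while pos < len(tokens):
        kind, op = tokens[pos]
        if kind != "op":
            raise SyntaxError(f"expected operator, got {op!r}")
        pos += 1
        # A '-' in operator position is binary; fold to the left.
        tree = (op, tree, operand())
    return tree
```

With this split, `parse(lex("1 - -2 * 3"))` yields `("*", ("-", 1, ("neg", 2)), 3)`: the first `-` is binary because it sits in operator position, the second is unary because it sits in operand position. A lexer that instead tries to emit negative literals has to guess at that context without seeing the parse state.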

I've been wanting to build a private fork of sclang instrumented with Perfetto, so traces could be automatically collected of both Hadron and sclang, for comparisons. Given that I'm working on a JIT compiler the objective of Hadron is speed. Typical speedups when moving from an interpreted bytecode model to a JIT model are more than one order of magnitude. On ARM architectures, particularly, a work project I was involved in saw a speedup on benchmarks of over 1000x when moving from a JavaScript interpreter to JIT. Parsing, however, is typically not the slowest part of compilation, and JITing with LLVM may result in an overall *slower* compile time. I think as a syntax highlighter speed is probably really important. But hyper-optimized code is often less readable and maintainable than standard code, because of all the special cases and extra tooling involved.

The sclang parser also does some transformations to the parse tree while constructing it, to aid in an optimization tree pass that happens right after parsing. I'm thinking specifically of the DropNode work, which allows the current parser to drop expressions that are dead code *before even compiling them*. So for example in the block ( 1; 2; 3; ) the first two expressions will be parsed but not compiled. This is elegant and cool, IMHO, but what it means is that the parse tree produced by sclang is not a *canonical* parse tree. I assume if tree-sitter-supercollider is intended for syntax highlighting then the objective is to produce a canonical tree, which makes sense. (Although, it might be quite interesting to use an editor that was syntax highlighting "dead" code!)
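
As a rough illustration of the idea (a Python sketch, not the actual DropNode code in sclang), assume a block is a list of expression nodes, that bare literals have no side effects, and that the value of a block is its last expression. The function name and node representation below are hypothetical.

```python
def is_literal(expr):
    """Assumption: plain Python numbers and strings stand in for literal nodes."""
    return isinstance(expr, (int, float, str))

def drop_dead_expressions(block):
    """Return only the expressions of a block that are worth compiling.

    A literal that is not the last expression can neither have a side
    effect nor become the block's value, so it is dropped. Anything else
    (here: tuples standing in for calls) is kept in its original order.
    """
    if not block:
        return []
    *body, last = block
    live = [expr for expr in body if not is_literal(expr)]
    return live + [last]
```

For the block ( 1; 2; 3; ) this gives `drop_dead_expressions([1, 2, 3]) == [3]`, while a call such as `("call", "postln", [1])` earlier in the block would be kept. Doing this as a separate pass over a canonical tree, rather than during parsing, keeps the dropped nodes available to tools like a syntax highlighter.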

I also wanted to produce a canonical tree, for ease of testing and readability of the code. It had been my hope that a hand-coded parser would be a more inclusive design decision because it removes the Bison language requirement from potential contributors. Optimization steps could happen in subsequent passes on the tree, making them easier to test and hopefully easier to understand and maintain. But all of this means to me that the comparison between my parser and the sclang parser is not going to be strictly apples-to-apples. And I'm not so sure that speed matters in parsing as I suspect that the overwhelming time sink for Hadron during compilation is going to be during LLVM bytecode generation and optimization. So I was going to wait until I had a measurement of the e2e compilation before I started to optimize the parser further.

Anyway, lots I'd love to discuss, as you can tell! Do you ever dip into the Slack? Or if you're on Discord I have set up a server to talk about Scintillator and Hadron, although right now there's nobody there but me and once Josh P who very kindly dipped in. But that channel is here.

Cheers!

\L

On Wed, Jan 27, 2021 at 6:02 AM <mail@xxxxxxxxxxxxxxxxx> wrote:
Hello all

Just wanted to let you know I started work on a grammar for
SuperCollider mapping out the language for use with the tree-sitter code
parser (https://tree-sitter.github.io).

It's in an early experimental stage but most of the language has been
mapped in the grammar (phew!) so it can handle simple code examples now.
I mostly made this to be used in neovim where I do all my coding (and
have become addicted to using tree-sitter for C++ and Lua projects) but
it should be possible to implement in scide as well if interested.

There are basically three features in tree-sitter that inspired me to do
this:

- Scoped syntax highlighting - this makes it possible to easily see if a
variable is local, an environment variable, class variable, builtin or
an argument and thus makes code a heck of a lot easier to read and
understand.

- Very precise syntax error messages. Because tree-sitter structures
code in node trees, once the whole grammar is done, it should be easy
to get super precise syntax errors. As it is now, if you omit a ;
it tells you exactly where in the code it expected it to be (because
that one node in the tree failed to parse)

- It's fast

You can see a screenshot, some examples and follow/help with the
progress here for now:

https://github.com/madskjeldgaard/tree-sitter-supercollider

Best!


_______________________________________________
sc-dev mailing list

info (subscription, etc.): http://www.birmingham.ac.uk/facilities/ea-studios/research/supercollider/mailinglist.aspx
archive: https://listarc.bham.ac.uk/marchives/sc-dev/
search: https://listarc.bham.ac.uk/lists/sc-dev/search/