
Re: [sc-dev] A tree-sitter code parser for SC



I actually just added basic support for unary (negative) operators as well as some other basic stuff.

On 31/01/2021 16.00, mail@xxxxxxxxxxxxxxxxx wrote:




On 29/01/2021 16.49, luke.nihlen@xxxxxxxxx wrote:
This is really cool, nice work!

I've also been (slowly!) writing a new SuperCollider parser, in my case for use as the frontend for Hadron, a re-implementation of sclang using LLVM to JIT. I use a lexer written using Ragel and a hand-coded parser in C++. I wonder if we could maybe chat and compare notes? Specifically:

Sure!

- There are some interesting corner cases in the grammar that made for some design challenges. I'd be curious to understand how you approached solving some of these. For instance, unary negation (in combination with arbitrary binary operation) made me rewrite a medium-size chunk of my parser, particularly because I lex in an independent pass.

I haven't dealt much with unary negatives yet, but tree-sitter has built-in support for this via left- or right-associative precedence, which has helped me solve a lot of fuzzy problems.

check this out:

https://tree-sitter.github.io/tree-sitter/creating-parsers#using-precedence
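To make the unary negation problem concrete, here is a minimal sketch in Python of a hand-coded parser that lexes in an independent pass and then decides in the parser whether a "-" is unary or binary. It is not taken from Hadron or from tree-sitter-supercollider. The names tokenize and parse and the tuple node shapes are made up for illustration, and it only covers integers, parentheses and four operators. It assumes sclang's rule that binary operators are applied strictly left to right with no precedence between them.

```python
import re

# Illustrative token set only: integers, four operators, parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    """Lex in an independent pass: '-' is always emitted as its own token."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parse tokens into nested tuples, binary operators left to right."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "-":
            # A '-' in operand position is unary and binds to one operand.
            return ("neg", operand())
        if tok == "(":
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("int", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    def expression():
        left = operand()
        # A '-' in operator position is binary; no precedence levels.
        while peek() in ("+", "-", "*", "/"):
            op = advance()
            left = (op, left, operand())
        return left

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this sketch, parse(tokenize("3 - -2")) keeps the first "-" as a binary operator and turns the second into a "neg" node, because the decision depends only on whether the parser is currently expecting an operand or an operator.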


I currently parse unary method calls like 1.round as <literal><instance_method> (compared to SinOsc.ar, which becomes something like <class><class_method>). See the examples here if you want to see how granular this can become: https://github.com/madskjeldgaard/tree-sitter-supercollider#parsing-examples



- What are your thoughts about LSP? I think someone on sc-dev had mentioned this to me, I had thought this might be a good way to deepen the usefulness of a sclang IDE. I hadn't heard about tree sitter before, this seems interesting as well.

Yes, my dream is to reach the LSP stage at some point too. I use a combination of an LSP server (clang) and tree-sitter when writing C++, for example, and I think it makes for a very comfortable and helpful toolset.


- I share your intuition that better error reporting would strongly increase the usability of sclang. I wonder, could we standardize on this? I also had some ideas/hopes/dreams of someday even soliciting volunteers to help translate the error messages into other languages.

- Testing could be another area we could potentially collaborate. Besides unit test coverage my integration test plan was to build a large corpus of open source sclang example code by scraping GitHub repositories, then run my parser on it and compare the resulting parse tree against the sclang parse tree.

Yeah! Tree-sitter has unit testing built in, and I try to write at least one test for each rule I add. A tree-sitter test must produce exactly the parse tree that the test expects, so there is no tolerance there, which is good because it picks up on the smallest changes.
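As a rough illustration of what "exactly the parsed tree" means, here is a small Python sketch of an exact-match tree comparison. It is not tree-sitter's test runner. The helper names and the node names (source_file, binary_expression, number) are hypothetical, and it assumes that layout whitespace in the expected S-expression is the only thing that may differ.

```python
def to_sexp(node):
    """Render a (kind, *children) tuple as an S-expression string."""
    kind, *children = node
    if not children:
        return f"({kind})"
    return f"({kind} " + " ".join(to_sexp(child) for child in children) + ")"


def normalize(sexp):
    """Collapse whitespace so only layout differences are ignored."""
    return " ".join(sexp.replace("(", " ( ").replace(")", " ) ").split())


def matches(actual_tree, expected_sexp):
    """True only if the tree has exactly the expected shape and node names."""
    return normalize(to_sexp(actual_tree)) == normalize(expected_sexp)


# Hypothetical tree for something like "1 + 2".
tree = ("source_file", ("binary_expression", ("number",), ("number",)))

expected = """
(source_file
  (binary_expression (number) (number)))
"""
```

Here matches(tree, expected) is true, but dropping or renaming a single node in either side makes it false, which is the zero-tolerance behaviour described above.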


- I'd like to hear more about how tree-sitter-supercollider is fast. Like, what design choices did you make to support speed, and how have you measured it? Is it fast relative to the sclang parser? The sclang parser is built using Bison, and while I went a different direction with my design my suspicion is that it's pretty tough to write a hand-coded parser that will be as fast as a Bison-generated one.

Yeah this is probably true. I actually haven't done any benchmarking, this is just me buying in to the marketing hype :)

But tree-sitter parses each piece of code as a node, so when it has to reparse something, instead of reparsing the whole document it only reparses that particular node (for example the right side of a binary expression in sc or something like that).
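The idea of reparsing only the affected node can be sketched in a few lines of Python. This is a simplified model and not tree-sitter's actual incremental algorithm: it only finds the deepest node whose span contains an edit, which is the part of the tree that would need to be reparsed. The Node class, the smallest_covering function and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: int  # offset of the first character
    end: int    # offset one past the last character
    children: list = field(default_factory=list)


def smallest_covering(node, edit_start, edit_end):
    """Return the deepest node whose span contains the whole edit."""
    for child in node.children:
        if child.start <= edit_start and edit_end <= child.end:
            return smallest_covering(child, edit_start, edit_end)
    return node


# Hand-built tree for the text "a = 1 + 22;" (offsets 0 to 11).
tree = Node("source_file", 0, 11, [
    Node("assignment", 0, 10, [
        Node("variable", 0, 1),
        Node("binary_expression", 4, 10, [
            Node("number", 4, 5),
            Node("number", 8, 10),
        ]),
    ]),
])

# Editing "22" only touches the right-hand number node.
right_side = smallest_covering(tree, 8, 10)
# Editing the "+" touches the whole binary expression.
operator_edit = smallest_covering(tree, 6, 7)
```

In this model an edit to the right-hand operand leaves the variable, the left operand and everything outside the expression untouched.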


I've been wanting to build a private fork of sclang instrumented with Perfetto, so traces could be automatically collected of both Hadron and sclang, for comparisons. Given that I'm working on a JIT compiler the objective of Hadron is speed. Typical speedups when moving from an interpreted bytecode model to a JIT model are more than one order of magnitude. On ARM architectures, particularly, a work project I was involved in saw a speedup on benchmarks of over 1000x when moving from a JavaScript interpreter to JIT. Parsing, however, is typically not the slowest part of compilation, and JITing with LLVM may result in an overall *slower* compile time. I think as a syntax highlighter speed is probably really important. But hyper-optimized code is often less readable and maintainable than standard code, because of all the special cases and extra tooling involved.

The sclang parser also does some transformations to the parse tree while constructing it, to aid in an optimization tree pass that happens right after parsing. I'm thinking specifically of the DropNode work, which allows the current parser to drop expressions that are dead code *before even compiling them*. So for example in the block ( 1; 2; 3; ) the first two expressions will be parsed but not compiled. This is elegant and cool, IMHO, but what it means is that the parse tree produced by sclang is not a *canonical* parse tree. I assume if tree-sitter-supercollider is intended for syntax highlighting then the objective is to produce a canonical tree, which makes sense. (Although, it might be quite interesting to use an editor that was syntax highlighting "dead" code!)

Yeah, this is actually an interesting problem! Should the parser be opinionated enough to say "hey, those first two statements in (1;2;3;) actually don't do anything"? On a related note, I am also considering whether the syntax highlighter should somehow emphasize that the last of those three is actually a return statement equal to ^3. I think that would be helpful, especially for newbies, to visually see what gets returned.
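A highlighter could get both pieces of information from a separate pass over a canonical tree. Here is a minimal Python sketch of such a pass. It is not how DropNode works in sclang. The functions is_literal and annotate_block are hypothetical, expressions are represented as plain strings, and only bare integer literals are treated as droppable so that expressions with side effects stay "live".

```python
def is_literal(expr):
    """Hypothetical purity check: only bare integer literals count."""
    return expr.strip().isdigit()


def annotate_block(expressions):
    """Tag each expression in a block as 'dead', 'live' or 'return'."""
    annotated = []
    last_index = len(expressions) - 1
    for index, expr in enumerate(expressions):
        if index == last_index:
            # The last expression is the block's implicit return value.
            annotated.append((expr, "return"))
        elif is_literal(expr):
            # A literal whose value is discarded has no effect.
            annotated.append((expr, "dead"))
        else:
            annotated.append((expr, "live"))
    return annotated


# The block ( 1; 2; 3; ) from the discussion above.
tags = annotate_block(["1", "2", "3"])
```

For ( 1; 2; 3; ) this tags the first two expressions as dead and the third as the return value, which an editor could then dim or emphasize without the parser itself dropping anything.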


I also wanted to produce a canonical tree, for ease of testing and readability of the code. It had been my hope that a hand-coded parser would be a more inclusive design decision because it removes the Bison language requirement from potential contributors. Optimization steps could happen in subsequent passes on the tree, making them easier to test and hopefully easier to understand and maintain. But all of this means to me that the comparison between my parser and the sclang parser is not going to be strictly apples-to-apples. And I'm not so sure that speed matters in parsing as I suspect that the overwhelming time sink for Hadron during compilation is going to be during LLVM bytecode generation and optimization. So I was going to wait until I had a measurement of the e2e compilation before I started to optimize the parser further.

Anyway, lots I'd love to discuss, as you can tell! Do you ever dip into the Slack? Or if you're on Discord I have set up a server to talk about Scintillator and Hadron, although right now there's nobody there but me and once Josh P who very kindly dipped in. But that channel is here.

Cheers!

\L

On Wed, Jan 27, 2021 at 6:02 AM <mail@xxxxxxxxxxxxxxxxx> wrote:
Hello all

Just wanted to let you know I started work on a grammar for
SuperCollider mapping out the language for use with the tree-sitter code
parser (https://tree-sitter.github.io).

It's in an early experimental stage but most of the language has been
mapped in the grammar (phew!) so it can handle simple code examples now.
I mostly made this to be used in neovim where I do all my coding (and
have become addicted to using tree-sitter for c++ and lua projects) but
it should be possible to implement in scide as well if anyone is interested.

There are basically three features in tree-sitter that inspired me to do
this:

- Scoped syntax highlighting - this makes it possible to easily see if a
variable is local, an environment variable, class variable, builtin or
an argument and thus makes code a heck of a lot easier to read and
understand.

- Very precise syntax error messages. Because tree-sitter structures
code in node trees, once the whole grammar is done, it should be very
easy to get super precise syntax errors. As it is now, if you omit a ;
it tells you exactly where in the code it expected it to be (because
that one node in the tree failed to parse).

- It's fast

You can see a screenshot, some examples and follow/help with the
progress here for now:

https://github.com/madskjeldgaard/tree-sitter-supercollider

Best!


_______________________________________________
sc-dev mailing list

info (subscription, etc.): http://www.birmingham.ac.uk/facilities/ea-studios/research/supercollider/mailinglist.aspx
archive: https://listarc.bham.ac.uk/marchives/sc-dev/
search: https://listarc.bham.ac.uk/lists/sc-dev/search/