
> My use case is parsing TypeScript type definitions.

> Edit: unfortunately, this seems to be a JNI bridge to the Rust code in SWC; I was hoping for a JVM-native solution.

Maybe tree-sitter's[0] TypeScript support[1] could do the trick?

> Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.

HTH

0 - https://tree-sitter.github.io/tree-sitter/

1 - https://github.com/tree-sitter/tree-sitter-typescript



I get the sense that tree-sitter is meant more for syntax highlighting than for real parsing. I raised this issue[0] a while ago, and I don't think anyone is really interested in it. (I'm not interested enough in tree-sitter myself to find out whether it can be fundamentally solved; solving it would involve making almost every production rule in the grammar parametric over two booleans.)

Admittedly, I haven't tested the TypeScript tree-sitter grammar, but I'd be surprised if the issue is fixed there. I've put together a sample file[1] that demonstrates various cases of these context dependencies. If I remember correctly, Sublime's highlighter handled these cases best of the editors I tried, though it still failed on some of the ones at the bottom that involve multi-line function expressions within object literal keys. GitHub gists use tree-sitter, so you'll notice that the "REGEX" and "DIVISION" blocks are sometimes coloured inconsistently, whereas a correct parser would always give each of them the same colour. It isn't demonstrated here, but inserting a multi-line comment into a file that is parsed incorrectly will throw the entire thing off.

[0] https://github.com/tree-sitter/tree-sitter-javascript/issues...

[1] https://gist.github.com/Maxdamantus/a11b41675fcde25ffc9b7ef0...
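To make the regex-versus-division ambiguity concrete: a common shortcut in hand-written lexers is to decide what a "/" means by looking only at the previous significant token. The following is a minimal Java sketch of that heuristic under stated assumptions; the class SlashHeuristic, the method slashStartsRegex, and the token list are hypothetical and not taken from tree-sitter or any other real lexer. It illustrates why the shortcut cannot be fully correct: after ")" or "}", the right answer depends on grammatical context that a token-level lexer does not have.

```java
import java.util.Set;

// Hypothetical sketch: guesses whether '/' begins a regex literal or a
// division operator using only the previous significant token.
final class SlashHeuristic {
    // Illustrative, incomplete list of tokens after which '/' is treated as
    // the start of a regex literal. Not taken from any real lexer.
    private static final Set<String> REGEX_PRECEDERS = Set.of(
            "(", ",", "=", ":", "[", "!", "&", "&&", "|", "||", "?",
            "{", "}", ";", "+", "-", "*", "%", "<", ">",
            "==", "===", "!=", "!==", "=>",
            "return", "typeof", "case", "do", "else", "in",
            "instanceof", "new", "delete", "void", "throw");

    // prevToken is null at the start of the input.
    static boolean slashStartsRegex(String prevToken) {
        if (prevToken == null) {
            return true; // nothing precedes it, so it cannot be a division
        }
        return REGEX_PRECEDERS.contains(prevToken);
    }

    public static void main(String[] args) {
        // a / b: after an identifier, '/' is division. Correct.
        System.out.println(slashStartsRegex("a"));   // false
        // x = /ab+c/: after '=', '/' starts a regex. Correct.
        System.out.println(slashStartsRegex("="));   // true
        // After ')' the guess is division, which is right for
        // "(a + b) / 2" but wrong for "if (ok) /re/.test(s)".
        System.out.println(slashStartsRegex(")"));   // false
        // After '}' the guess is regex, which is right for
        // "{} /re/.test(s)" (empty block) but wrong for
        // "x = {} / 2" (object literal divided by 2).
        System.out.println(slashStartsRegex("}"));   // true
    }
}
```

Each fixed answer for ")" or "}" is wrong for one of the two inputs in the comments. Resolving them requires knowing whether the parser is at the start of a statement or inside an expression, which is context a parser has and a previous-token lexer does not.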


Tree-sitter has a C runtime, and grammars often include a C scanner as well.


If the tree-sitter runtime requirements are prohibitive and a pure JVM solution (IOW, no JNI) is mandatory, then I'd recommend using ANTLR[0] with the relevant grammar from its grammars-v4 repository[1] as a starting point. See here[2] for more details.

0 - https://www.antlr.org/

1 - https://github.com/antlr/grammars-v4/blob/master/javascript/...

2 - https://github.com/antlr/grammars-v4/wiki


I agree that the ANTLR TypeScript grammar is underwhelming, and I also agree that not having a well-defined project owner to interact with makes things difficult. My main use case is parsing TS type definitions, and the grammar from the grammars-v4 repo doesn't parse anything starting with 'declare', which is pretty fundamental.


I'm the maintainer of swc4j. I used ANTLR for many years but was deeply disappointed by it, so I created swc4j.

Why isn't ANTLR the right tool? I hope the following blog post I wrote explains it.

https://blog.caoccao.com/hello-swc4j-goodbye-antlr-f9a63e45a...


Right, that would be another native bridge solution.


See my peer comment for another option. :-)



