I’m working on an academic project where I need to build a code converter between different programming languages.
The first step is to create a syntax reader to identify structures, variables, functions, types, etc.
My question is: which approach do you think works best for this kind of parsing? The options I'm considering are:
- Regex – fast for simple patterns but can become messy with more complex syntax.
- Lexer + Parser – scanning the source character by character to produce tokens, then analyzing the token stream against a grammar.
- AST (Abstract Syntax Tree) – having the parser build a syntax tree, so that conversion becomes a tree-to-tree (or tree-to-text) transformation.
- Other methods – maybe a library, technique, or algorithm you’ve used for parsing.
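
For context, this is roughly what I mean by the lexer + parser + AST route. It is only a minimal sketch in Python for a hypothetical toy language (a single assignment with arithmetic), not a real converter. The names `tokenize`, `Parser`, and `to_lisp`, the tuple-based AST shape, and the Lisp-style output are all my own assumptions for illustration.

```python
import re

# Hypothetical token spec for a toy language of the form: name = expression
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Lexer: turn source text into a list of (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is dropped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser that builds a tuple-based AST."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, text):
        kind, actual = self.advance()
        if actual != text:
            raise SyntaxError(f"expected {text!r}, got {actual!r}")

    def parse_assignment(self):
        # assignment := IDENT "=" expr
        kind, name = self.advance()
        if kind != "IDENT":
            raise SyntaxError(f"expected a variable name, got {name!r}")
        self.expect("=")
        value = self.parse_expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return ("assign", name, value)

    def parse_expr(self):
        # expr := term (("+" | "-") term)*
        node = self.parse_term()
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            node = ("binop", op, node, self.parse_term())
        return node

    def parse_term(self):
        # term := factor (("*" | "/") factor)*
        node = self.parse_factor()
        while self.peek()[1] in ("*", "/"):
            op = self.advance()[1]
            node = ("binop", op, node, self.parse_factor())
        return node

    def parse_factor(self):
        # factor := NUMBER | IDENT | "(" expr ")"
        kind, text = self.advance()
        if kind == "NUMBER":
            return ("num", text)
        if kind == "IDENT":
            return ("var", text)
        if text == "(":
            node = self.parse_expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def to_lisp(node):
    """Converter: walk the AST and emit Lisp-style prefix syntax."""
    tag = node[0]
    if tag in ("num", "var"):
        return node[1]
    if tag == "binop":
        _, op, left, right = node
        return f"({op} {to_lisp(left)} {to_lisp(right)})"
    if tag == "assign":
        _, name, value = node
        return f"(setq {name} {to_lisp(value)})"
    raise ValueError(f"unknown node type {tag!r}")
```

With this sketch, `to_lisp(Parser(tokenize("x = 1 + y * 2")).parse_assignment())` gives `(setq x (+ 1 (* y 2)))`. The point is that once the tree exists, adding another target language should only need another function like `to_lisp`, without touching the lexer or parser.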
This is part of a university research project, so the focus is not only on making it work, but also on exploring and comparing different approaches.
What techniques have you used for this kind of task?
Would you recommend starting simple with regex, or going straight to something more structured like a lexer/tokenizer?
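
To make the trade-off concrete, here is a minimal sketch of the regex-only option in Python. The pattern `FUNC_RE` and the helper `find_functions` are hypothetical names I made up, and the pattern only targets Python-style `def` headers. It handles the simple case, but it breaks as soon as a parameter list contains nested parentheses, which is the kind of messiness I'm worried about.

```python
import re

# Hypothetical pattern: "def", a name, then everything up to the first ")"
FUNC_RE = re.compile(r"def\s+(\w+)\s*\(([^)]*)\)")


def find_functions(source):
    """Return (name, [parameters]) for every function header the regex finds."""
    results = []
    for match in FUNC_RE.finditer(source):
        params = [p.strip() for p in match.group(2).split(",") if p.strip()]
        results.append((match.group(1), params))
    return results


simple = "def add(a, b):\n    return a + b\n"
tricky = "def scale(v, f=max(1, 2)):\n    return v * f\n"

if __name__ == "__main__":
    print(find_functions(simple))  # [('add', ['a', 'b'])]
    # Nested parentheses cut the match short and the comma split goes wrong:
    print(find_functions(tricky))  # [('scale', ['v', 'f=max(1', '2'])]
```

The second result is wrong because a regular expression of this kind cannot track balanced parentheses, whereas a parser handles nesting naturally.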
If you have any practical code examples, they would be very welcome.
Thanks in advance!