Source language lexer

PBJim
Enthusiast
Posts: 293
Joined: Fri Jan 19, 2024 11:56 pm

Source language lexer

Post by PBJim »

I'm researching source code lexers in general, or more precisely lexical analysis. I know some forum members occasionally work on the IDE and editor and might be able to answer this question: does the IDE editor use a lexer (distinct from a parser) to determine syntax colouring? If so, could you point me to where it lives in the GitHub repo? Thanks.
moricode
Enthusiast
Posts: 162
Joined: Thu May 25, 2023 3:55 am

Re: Source language lexer

Post by moricode »

PureBasic is not a popular language like C, C++, ASM, Pascal, QuickBasic or others.

To become widely accepted or more popular, a language should publish its specification and encourage alternative compiler implementations.

Even Liberty BASIC has an alternative compiler/interpreter.

Does any master here intend to create an alternative, compatible compiler for the base language that complies with the PB specification?

It would not be necessary to implement all the library functions, since most of them are taken from open source projects such as sqlite, zip, png ....

This would boost its popularity.
Sicro
Enthusiast
Posts: 559
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany

Re: Source language lexer

Post by Sicro »

The PureBasic IDE uses Scintilla for the token highlighting in the editor, and it comes with a lexer:
https://www.scintilla.org/ScintillaDoc.html#Lexer
This code sets up the lexer:
https://github.com/fantaisie-software/p ... ighting.pb
This code is also used for highlighting:
https://github.com/fantaisie-software/p ... gEngine.pb
This code is used for autocomplete, variableviewer, procedurebrowser etc.:
https://github.com/fantaisie-software/p ... eParser.pb

I have written two lexers for the PureBasic programming language.
Here is my own regex engine, which compiles several regexes into an NFA or a very fast DFA:
https://github.com/SicroAtGit/RegEx-Engine
The project focuses on making the regex engine suitable for creating lexers. The above DFA-based PureBasic lexer also uses the DFA generated by this regex engine. The project's code examples also include a simple lexer.
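
To show why matching several regexes at once suits lexing, here is a minimal sketch in Python rather than PureBasic. It is not the RegEx-Engine API: it uses Python's standard `re` module, which is a backtracking engine and not a compiled DFA, and only imitates the idea of one matcher built from several token regexes. The token names and the small rule set are assumptions made up for the illustration.

```python
import re

# Hypothetical token rules for a small PureBasic-like subset (illustration only).
# Order matters: earlier rules win when several could match at the same position.
TOKEN_RULES = [
    ("COMMENT", r";[^\n]*"),
    ("STRING", r'"[^"\n]*"'),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[-+*/=<>(),.]"),
    ("SPACE", r"\s+"),
]

# Combine all rules into one alternation; each rule becomes a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(source):
    """Split source into (token_type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No rule matched: emit one unknown character and keep going.
            tokens.append(("UNKNOWN", source[pos]))
            pos += 1
            continue
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize('a = 10 ; note')` yields an identifier, an operator, a number and a comment token. A DFA-based engine reaches the same result without backtracking, which is where the speed advantage comes from.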
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
wro
User
Posts: 14
Joined: Tue Mar 23, 2021 1:19 pm

Re: Source language lexer

Post by wro »

Yeah, most modern IDEs do use a lexer for syntax highlighting. It's usually separate from the parser, since highlighting doesn't need the full understanding of the code that parsing does. For example, in some editors you'll find a lexer that just looks for keywords, strings, comments, etc., and assigns colours based on those. As for the GitHub repo, it really depends on the IDE, but in general look for files or modules related to syntax highlighting or tokenization; it's likely there.
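
A rough sketch of such a highlighting lexer, written in Python for brevity. This is not how any particular IDE does it: the keyword set, the style names and the function name `highlight_spans` are all made up for the example. It only assumes that `;` starts a comment and `"` delimits a string, as in PureBasic.

```python
# Hypothetical keyword set; a real highlighter would load the full list.
KEYWORDS = {"if", "else", "endif", "for", "next", "procedure", "endprocedure"}


def highlight_spans(line):
    """Return (start, end, style) spans for one line of source text."""
    spans = []
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if ch == ";":
            # A comment runs to the end of the line.
            spans.append((i, n, "comment"))
            break
        if ch == '"':
            # A string runs to the closing quote, or to end of line if unterminated.
            end = line.find('"', i + 1)
            end = n if end == -1 else end + 1
            spans.append((i, end, "string"))
            i = end
        elif ch.isalpha() or ch == "_":
            # Scan a whole word, then decide whether it is a keyword.
            start = i
            while i < n and (line[i].isalnum() or line[i] == "_"):
                i += 1
            word = line[start:i]
            style = "keyword" if word.lower() in KEYWORDS else "identifier"
            spans.append((start, i, style))
        else:
            # Everything else keeps the default colour.
            i += 1
    return spans
```

Note that no parsing happens here: the function never checks whether an `If` has a matching `EndIf`, it only classifies runs of characters, which is all the colouring needs.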
PBJim
Enthusiast
Posts: 293
Joined: Fri Jan 19, 2024 11:56 pm

Re: Source language lexer

Post by PBJim »

Sicro wrote: Fri Mar 28, 2025 6:29 pm
The PureBasic IDE uses Scintilla for the token highlighting in the editor, and it comes with a lexer:
Thanks very much Sicro, that's just what I was looking for. In fact, I had been reading at least one of those sections of IDE code yesterday, looking for references to what might define it as a 'lexer' but didn't realise I was looking at it already.

Okay, I have a lot of reading up to do now. It's an interesting area, I can see.
Last edited by PBJim on Sat Mar 29, 2025 12:38 pm, edited 1 time in total.
PBJim
Enthusiast
Posts: 293
Joined: Fri Jan 19, 2024 11:56 pm

Re: Source language lexer

Post by PBJim »

wro wrote: Fri Mar 28, 2025 8:36 pm
Yeah, most modern IDEs do use a lexer for syntax highlighting.
Thanks for the comments Wro, interesting to read. I'm getting into this now. :D
BlameTroi
New User
Posts: 5
Joined: Sat May 24, 2025 6:31 pm
Location: USA

Re: Source language lexer

Post by BlameTroi »

Tangential: does anyone know of a Tree-sitter grammar for PureBasic?
DarkDragon
Addict
Posts: 2344
Joined: Mon Jun 02, 2003 9:16 am
Location: Germany

Re: Source language lexer

Post by DarkDragon »

JFlex & GrammarKit EBNF if you prefer this.
bye,
Daniel