I cannot say that I often find myself needing to write a lexer. Usually I will just use a user space module that somebody else has already written that is a lexer, or contains one, as is the case with marked.js. However, there might come a time now and then when I will want to write my own lexer, and one reason to do so would be to develop my own language. I might want to write one for my own compiler, interpreter, or any method that applies some kind of custom domain specific language.
A lexer is an important part of lexical analysis. Say I want to work out some code that makes sense of the English language. The first part of such a process would be to break the text into an array of objects where each object is for a word, or another element of the language such as a period. This array of objects can be thought of as an array of tokens, and each token object would contain useful data about the token such as the index at which it appears in the text, the word itself, whether it is a noun or verb, and so forth.
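To make the idea of a token array a little more concrete, here is a minimal sketch of breaking English text into token objects. The function name `tokenizeEnglish` and the `type`, `value`, and `index` properties are my own assumptions for illustration, and the sketch does not attempt to work out whether a word is a noun or a verb.

```javascript
// Hypothetical helper: split English text into simple token objects.
// The property names (type, value, index) are assumptions for this sketch.
const tokenizeEnglish = (text) => {
    const tokens = [];
    // match either a run of letters and apostrophes (a word)
    // or a single punctuation mark
    const pattern = /[A-Za-z']+|[.,!?;:]/g;
    let match;
    while ((match = pattern.exec(text)) !== null) {
        tokens.push({
            type: /^[A-Za-z']+$/.test(match[0]) ? 'word' : 'punctuation',
            value: match[0],
            index: match.index // position of the token in the source text
        });
    }
    return tokens;
};

console.log(tokenizeEnglish('The cat sat.'));
```

Each object records what was found and where, which is about all a later stage of analysis would need to get started.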
For a simple example of a nodejs lexer I first need a language to make a lexer for. For this section at least I will be making up my own simple domain specific language for a game prototype that involves orbs that are used as a way to fight monsters. In other words, some kind of tower defense game in which orbs are socketed into the towers.
I will not be getting into detail about the game really, if it ever even exists. For the sake of this post, all that matters is that I have some kind of language to write a lexer for, and my orb script language will serve that purpose.
The language thus far consists of lines of code that are terminated with a semicolon. Each line starts out with a keyword followed by a property and then a value. The semicolon can be used as a way to break the code down into lines, although it might be better to use line breaks. In any case there is a way to know if a certain statement is over or not. In addition, white space can be used as a way to split one of these lines or statements into tokens, or more precisely into a collection of what might be called lexemes.
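The two splitting steps described above can be sketched in a few lines. The source string here is a made up stand-in that only follows the keyword, property, value pattern; the `set` keyword and the property names are assumptions, not the actual orb script syntax.

```javascript
// Hypothetical orb script source; "set", "element" and "level" are made up for this sketch
const source = 'set element fire; set level 3;';

// split on semicolons into statements, dropping empty leftovers
const statements = source.split(';')
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0);

// split each statement on runs of white space into lexemes
const lexemes = statements.map((statement) => statement.split(/\s+/));

console.log(lexemes);
```

At this point each lexeme is still just a string. Working out what kind of token each one is comes next.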
I will want an array of objects that can be used to identify tokens in a code example string. Each object in this array will contain a regular expression property that will be used to find out if a given lexeme is a known keyword, operator, value, or other element of the language.
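One way such an array might look is sketched below. The token types, the `set` and `add` keywords, and the `classify` helper are all assumptions of mine for the sake of illustration.

```javascript
// Hypothetical token definitions; order matters because the first matching pattern wins,
// so keywords are listed before the more general identifier pattern
const tokenDefs = [
    { type: 'keyword', regex: /^(set|add)$/ },
    { type: 'number', regex: /^\d+(\.\d+)?$/ },
    { type: 'identifier', regex: /^[a-zA-Z_]\w*$/ }
];

// look up the type of a single lexeme, falling back to 'unknown'
const classify = (lexeme) => {
    const def = tokenDefs.find((d) => d.regex.test(lexeme));
    return def ? def.type : 'unknown';
};

console.log(classify('set'));
console.log(classify('42'));
console.log(classify('fire'));
console.log(classify('@'));
```

The patterns are anchored with `^` and `$` so that a lexeme has to match as a whole, and they avoid the global flag so that calling `test` more than once does not carry state between calls.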
I will also want a helper function that breaks down a line of code into token objects, and a main lexer method that is what will be exported by the module.
So something like this:
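What follows is a minimal sketch of such a module rather than a finished implementation. The token types, the `set` and `add` keywords, the `tokenizeLine` helper name, and the shape of the token objects are all assumptions of mine.

```javascript
// Hypothetical token definitions for the orb script sketch
const TOKEN_TYPES = [
    { type: 'keyword', regex: /^(set|add)$/ },
    { type: 'number', regex: /^\d+(\.\d+)?$/ },
    { type: 'identifier', regex: /^[a-zA-Z_]\w*$/ }
];

// helper: break a single statement into token objects
const tokenizeLine = (line, lineIndex) => {
    return line.trim().split(/\s+/).map((lexeme, tokenIndex) => {
        const def = TOKEN_TYPES.find((d) => d.regex.test(lexeme));
        return {
            type: def ? def.type : 'unknown',
            value: lexeme,
            line: lineIndex,   // which statement the token came from
            index: tokenIndex  // position of the token within that statement
        };
    });
};

// main lexer: split the source into statements on semicolons,
// then flatten the tokens of each statement into one array
const lexer = (source) => {
    return source.split(';')
        .filter((line) => line.trim().length > 0)
        .flatMap(tokenizeLine);
};

module.exports = lexer;

// usage example with a made up script
console.log(JSON.stringify(lexer('set element fire; set level 3;'), null, 2));
```

Lexemes that match none of the patterns are kept with a type of `unknown` rather than thrown out, so that a later stage can decide whether to report an error.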
The result is an array of token objects that is logged to the console as JSON.
So it would seem that my simple little lexer works as expected, thus far at least. My language is not much of a language at this point, but the basic idea of a nodejs lexer is there. Taking this to the next level would involve further developing the language to begin with, as well as this lexer. In addition, I would want to write some kind of function or other module that will take this array of tokens and do something useful with it, such as applying the orb script to some kind of orb object.