When it comes to Unicode blocks, the first few after Basic Latin have to do with Latin characters, such as Latin-1 Supplement and Latin Extended-A. These characters come up now and then in certain words from languages like Spanish and many other Latin-based languages besides English. If for some reason I want to convert these kinds of strings into strings that contain only characters from the ASCII range, I can use the lodash deburr method to make quick work of the task. The method converts Latin-1 Supplement and Latin Extended-A letters to basic Latin letters and removes combining diacritical marks, so each accented letter becomes its plain, unaccented form.
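To make the block names a little more concrete, here is a small sketch that reports which of these first few blocks a character falls into, based on its code point. The helper name `unicodeBlock` is hypothetical and not part of lodash; the ranges are Basic Latin (U+0000 to U+007F), Latin-1 Supplement (U+0080 to U+00FF), and Latin Extended-A (U+0100 to U+017F).

```js
// Hypothetical helper: report which of the first few Unicode blocks a character is in.
// Basic Latin: U+0000-U+007F, Latin-1 Supplement: U+0080-U+00FF,
// Latin Extended-A: U+0100-U+017F.
function unicodeBlock(char) {
  const cp = char.codePointAt(0);
  if (cp <= 0x7f) return 'Basic Latin';
  if (cp <= 0xff) return 'Latin-1 Supplement';
  if (cp <= 0x17f) return 'Latin Extended-A';
  return 'other';
}

console.log(unicodeBlock('n'));      // Basic Latin
console.log(unicodeBlock('\u00f1')); // Latin-1 Supplement (ñ, U+00F1)
console.log(unicodeBlock('\u0142')); // Latin Extended-A (ł, U+0142)
```

The deburr method is aimed at letters in the second and third of these blocks, turning them into letters from the first.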
The situation might be one where I am dealing with text that contains words like “Jalapeño”, and I would like to run that text through a method that will spit out “Jalapeno”. A basic example of the lodash deburr method just involves calling it and passing the raw text as the first argument; the returned result is the same text with the diacritical marks removed.
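With lodash loaded, that call is simply `_.deburr('Jalapeño')`, which returns `'Jalapeno'`. For a quick comparison without the lodash dependency, here is a minimal sketch that swaps in a different technique: `String.prototype.normalize('NFD')` splits a precomposed letter such as “ñ” into a plain “n” followed by a combining mark, and the combining marks are then stripped. The function name `simpleDeburr` is hypothetical, and this is not how lodash does it.

```js
// Not lodash's implementation: a native stand-in for simple cases.
// NFD turns "ñ" (U+00F1) into "n" + U+0303 (combining tilde),
// then every combining diacritical mark in U+0300-U+036F is removed.
function simpleDeburr(str) {
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(simpleDeburr('Jalape\u00f1o')); // Jalapeno
```

This shortcut only works for letters that have a canonical decomposition. Letters such as “ø”, “ł”, or “ß” pass through unchanged, which is one reason lodash relies on an explicit lookup table instead.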
The word given is Spanish, and I just want it in a form that only uses characters from the first Unicode block, Basic Latin.
To do this myself, I need to use the String.prototype.replace method, passing a pattern that matches the Latin characters as the first argument, and a function as the second argument that maps each matched character to the ASCII characters I want in its place. The process is a little involved, and the solution I arrived at came from digging deep into the lodash source code.
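Here is a minimal sketch of that replace-plus-function idea on its own, before bringing in anything from lodash. The lookup object `asciiFor` and the function `replaceLatin` are hypothetical and cover only a handful of Spanish letters; the point is that `replace` calls the function once per match and splices its return value back into the string.

```js
// Hypothetical, tiny lookup table: only a few Spanish letters are covered.
const asciiFor = {
  '\u00e1': 'a', // á
  '\u00e9': 'e', // é
  '\u00ed': 'i', // í
  '\u00f1': 'n', // ñ
  '\u00f3': 'o', // ó
  '\u00fa': 'u'  // ú
};

function replaceLatin(str) {
  // The global pattern matches only the letters present in the table above;
  // the replacer runs once for every match and returns its ASCII stand-in.
  return str.replace(/[\u00e1\u00e9\u00ed\u00f1\u00f3\u00fa]/g, (match) => asciiFor[match]);
}

console.log(replaceLatin('Jalape\u00f1o')); // Jalapeno
console.log(replaceLatin('canci\u00f3n'));  // cancion
```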
Getting the pattern was simple enough, since I just looked at the lodash source code rather than coming up with my own pattern that does the same thing a different way, which might not have been so simple. In any case, as I see it, matching the letters with a regular expression is the easy part. The hard part is replacing those letters with the characters I want in the ASCII range.
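The pattern in question is the `reLatin` regular expression from lodash’s deburr source. A small sketch that collects its matches makes the ranges easier to read: the gaps in the character class skip U+00D7 (×) and U+00F7 (÷), which sit in Latin-1 Supplement but are math symbols rather than letters. The helper `latinMatches` is hypothetical.

```js
// The Latin-letter pattern from lodash's deburr source.
// \xc0-\xd6, \xd8-\xf6, \xf8-\xff: Latin-1 Supplement letters, skipping × and ÷
// \u0100-\u017f: all of Latin Extended-A
const reLatin = /[\xc0-\xd6\xd8-\xf6\xf8-\xff\u0100-\u017f]/g;

// Hypothetical helper: return every character the pattern picks out
function latinMatches(str) {
  return str.match(reLatin) || [];
}

console.log(latinMatches('Jalape\u00f1o'));       // [ 'ñ' ]
console.log(latinMatches('2 \u00d7 3 \u00f7 6')); // [] since × and ÷ are not letters
```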
The full solution, when raiding the lodash source code, involves code from two other source files of interest. One is the deburrLetter file in the internal folder, and the other is the basePropertyOf file, also in the internal folder. Together these handle not just the matching but also the replacement.
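Putting the pieces together, here is a simplified sketch modeled on how lodash splits the work: `basePropertyOf` turns an object into a key-lookup function, `deburrLetter` is that lookup applied to a table of letter replacements, and `deburr` uses it as the replacer for `reLatin` before stripping combining marks. Several things here are assumptions for illustration: the `deburredLetters` table is a small hypothetical subset of lodash’s much larger map, the fallback to the original character exists only because the subset is incomplete (otherwise a missing entry would be inserted as the text `"undefined"`), and `reComboMark` covers only U+0300 to U+036F, while lodash’s own pattern includes a few more combining-mark ranges.

```js
// basePropertyOf: given an object, return a function that looks up a key on it
function basePropertyOf(object) {
  return function (key) {
    return object == null ? undefined : object[key];
  };
}

// Hypothetical SUBSET of a letter map in the style of lodash's deburredLetters
const deburredLetters = {
  '\u00c0': 'A', '\u00e0': 'a',   // À à
  '\u00c9': 'E', '\u00e9': 'e',   // É é
  '\u00d1': 'N', '\u00f1': 'n',   // Ñ ñ
  '\u00c6': 'Ae', '\u00e6': 'ae', // Æ æ
  '\u00df': 'ss',                 // ß
  '\u0141': 'L', '\u0142': 'l'    // Ł ł
};

// deburrLetter: lookup function built from the table
const deburrLetter = basePropertyOf(deburredLetters);

const reLatin = /[\xc0-\xd6\xd8-\xf6\xf8-\xff\u0100-\u017f]/g;
// Simplified: only the main combining diacritical marks block
const reComboMark = /[\u0300-\u036f]/g;

function deburr(str) {
  return String(str)
    // Replace matched letters; keep the character if the subset table lacks it
    .replace(reLatin, (ch) => {
      const plain = deburrLetter(ch);
      return plain === undefined ? ch : plain;
    })
    // Remove any standalone combining marks left in the text
    .replace(reComboMark, '');
}

console.log(deburr('Jalape\u00f1o')); // Jalapeno
console.log(deburr('stra\u00dfe'));   // strasse
```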