Removing accents from string

Jade_Heilmann · April 19, 2023, 7:06pm

Hello! I am trying to write a snippet that copies Canadian cities and provinces but changes the accented letters back to their regular letters. "à" to "a" for example.

replaceregex(input,"[àáâā]","a","g") will do it for just the a's, but how do I write a simple snippet for all of them (a's, e's, o's, etc.)?

AndyPage · April 19, 2023, 7:41pm

Hi Jade.

Maybe one of the TB team might chime in here but I don't think there is a "simple" snippet.

Something like this is the only way I know. It's not very elegant. Note: this is just rough pseudocode.

{=replaceregex(replaceregex(replaceregex(replaceregex(replaceregex(replaceregex(replaceregex(replaceregex(input, "[àáâãäåāăą]", "a", "g"), "[èéêëēėęě]", "e", "g"), "[ìíîïĩīį]", "i", "g"), "[òóôõöøōőŏ]", "o", "g"), "[ùúûüũūůűų]", "u", "g"), "[çćčĉ]", "c", "g"), "[šşś]", "s", "g"), "[ñńň]", "n", "g")}

In PHP you could do something like this:

<?php

function replaceAccentedLetters($input) {
  $accentedLetters = array(
    'à', 'á', 'â', 'ã', 'ä', 'å', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï',
    'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ'
  );
  $regularLetters = array(
    'a', 'a', 'a', 'a', 'a', 'a', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i',
    'n', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y'
  );

  foreach ($accentedLetters as $key => $value) {
    $input = str_replace($value, $regularLetters[$key], $input);
  }

  return $input;
}

?>

Here is an example of how to use the script:

$input = "Montréal";
$output = replaceAccentedLetters($input);

echo $output; // Montreal

As you can see, there are no simple solutions.

Dan_Barak1 · April 19, 2023, 8:25pm

@AndyPage, thanks for the great suggestion. An approach similar to yours can work well in Text Blaze:

{letters=["à":"a", "á":"a", "â":"a", "ã":"a", "ä":"a", "å":"a", "ā":"a", "ă":"a", "ą":"a", "ç":"c", "è":"e", "é":"e", "ê":"e", "ë":"e", "ì":"i", "í":"i", "î":"i", "ï":"i", "ñ":"n", "ò":"o", "ó":"o", "ô":"o", "õ":"o", "ö":"o", "ù":"u", "ú":"u", "û":"u", "ü":"u", "ý":"y", "ÿ":"y"]}
{word="catàáâãÿåúăąbat"}
{=join(map(split(word,""), (l)->letters[l] if includes(keys(letters), l) else l),"")}

@Jade_Heilmann - does this work?

AndyPage · April 19, 2023, 8:50pm

That's a very clever approach @Dan_Barak1 . Will have to use a similar approach on some of my snippets.

@Jade_Heilmann Here's a breakdown of the snippet:

1. letters is a dictionary that maps accented characters to their non-accented equivalents.
2. word is the test input string, a mix of accented and non-accented characters.
3. The split(word,"") function splits the input string into a list of individual characters.
4. The map() function iterates through each character in the list created in step 3. It checks whether the character is an accented one using the includes(keys(letters), l) condition. If it is, the character is replaced by its non-accented counterpart using letters[l]. If it is not, the character is left unchanged.
5. The join() function combines the individual characters back into a single string.

Given the input word="catàáâãÿåúăąbat", this snippet outputs the string cataaaayauaabat, with all the accented characters replaced by their non-accented equivalents.
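For readers more at home in a general-purpose language, the same split/map/join logic can be sketched in Python. This is only an analogue of the Text Blaze snippet, not Text Blaze code; the names `LETTERS` and `strip_accents` are made up for illustration.

```python
# Lookup table from accented characters to plain ones
# (same entries as the letters dictionary in the thread).
LETTERS = {
    "à": "a", "á": "a", "â": "a", "ã": "a", "ä": "a", "å": "a",
    "ā": "a", "ă": "a", "ą": "a", "ç": "c",
    "è": "e", "é": "e", "ê": "e", "ë": "e",
    "ì": "i", "í": "i", "î": "i", "ï": "i", "ñ": "n",
    "ò": "o", "ó": "o", "ô": "o", "õ": "o", "ö": "o",
    "ù": "u", "ú": "u", "û": "u", "ü": "u", "ý": "y", "ÿ": "y",
}


def strip_accents(word: str) -> str:
    """Swap each character found in LETTERS; leave every other character as-is."""
    # Iterating over the string plays the role of split(word, ""),
    # the generator is the map() step, and "".join() is join().
    return "".join(LETTERS.get(ch, ch) for ch in word)


print(strip_accents("catàáâãÿåúăąbat"))  # cataaaayauaabat
```

Like the Text Blaze dictionary, this table only lists lowercase letters, so a character such as "É" passes through unchanged unless it is added to the table.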

Jade_Heilmann · April 20, 2023, 12:55am

@AndyPage, I started writing out your same rough pseudocode and thought "No way, Text Blaze is too good for this, there's gotta be a simpler way!" I thought there'd be a built-in string function like proper(), but I couldn't find one anywhere. (And thanks also for the detailed explanation of Dan's solution.)

@Dan_Barak1 it does work, thank you. In situ it looks like this:

{letters=["à":"a", "á":"a", "â":"a", "ã":"a", "ä":"a", "å":"a", "ā":"a", "ă":"a", "ą":"a", "ç":"c", "è":"e", "é":"e", "ê":"e", "ë":"e", "ì":"i", "í":"i", "î":"i", "ï":"i", "ñ":"n", "ò":"o", "ó":"o", "ô":"o", "õ":"o", "ö":"o", "ù":"u", "ú":"u", "û":"u", "ü":"u", "ý":"y", "ÿ":"y"]}
{=proper(join(map(split(extractregex({clipboard},"City: (.+)"),""), (l)->letters[l] if includes(keys(letters), l) else l),"")); trim=yes}

Still not super elegant, but since I have to do this for multiple fields in the same snippet (city, country, name, etc.), I can define the letters dictionary once at the top and reference it in each formula. That is much cleaner in the end than the first option.
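The "define the table once, reuse it for every field" idea can be sketched in Python as well. This is an analogue rather than Text Blaze code: `clean_field`, the shortened `LETTERS` table, and the sample clipboard text are all invented for illustration, and `str.title()` is only a rough stand-in for proper().

```python
import re

# Shortened accent table for illustration; extend it as needed.
LETTERS = {"à": "a", "â": "a", "ç": "c", "è": "e", "é": "e",
           "ê": "e", "î": "i", "ô": "o", "ù": "u", "û": "u"}


def strip_accents(text: str) -> str:
    """Swap each character found in LETTERS; leave the rest unchanged."""
    return "".join(LETTERS.get(ch, ch) for ch in text)


def clean_field(clipboard: str, label: str) -> str:
    """Pull 'Label: value' out of the text, strip accents, then capitalize words."""
    match = re.search(re.escape(label) + r": (.+)", clipboard)
    if match is None:
        return ""  # label not present in the clipboard text
    return strip_accents(match.group(1).strip()).title()


# Made-up sample standing in for the clipboard contents.
sample = "Name: rené lévesque\nCity: trois-rivières\nProvince: québec"
for label in ("Name", "City", "Province"):
    print(label, "->", clean_field(sample, label))
```

Because the table lives in one place, adding a missing character fixes every field at once.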

I'll add it to the feature ideas

thank you both, very helpful