Handling Character Encoding Issues

Recently we started encountering an issue when copying strings of data from 3rd party sources and pasting them into formparagraph fields that would then insert into Salesforce. The issue is that the content copied sometimes contains characters encoded in UTF-8. For example, words like "don't" are showing up as "don‚Äôt". Phrases like "...your materials—take action..." are showing up as "...your materials‚Äîtake action...". Due to this, I started building out a command that includes several replace functions in it. The problem though, is that this will get quite long very quickly. So far, my command is as follows:

{formparagraph: name=student written comments; cols=60; rows=6}
Comments: {if: {=`student written comments`}<>""}{=replace(replace(replace(replace(replace({=`student written comments`},"â€{{^^%E2%84%A2^^}}","'"),"Äô","'"),"“","""),"”","""),"üòÉ","{{^^%F0%9F%98%84^^}}")}{else}none provided by student{endif}

Any thoughts on a better solution to this?

NOTE! Not sure why this pasted snippet is throwing an unexpected error. It works completely fine on my end.

Hey Brad,

Your snippet likely doesn't work because the forum stripped out those special characters when the snippet was posted.

To answer your question, instead of having nested replace you can add all the special characters together in a single regex, given that you always want to replace them with the same character '. The below should work. Please note that I used numbers instead of the actual characters for clarity.

The key is changing replace for replaceregex and using the regex OR, which allows you to combine multiple characters. Note as well the g for global, so that if the string runs for all occurrences.

{formparagraph: name=student written comments; cols=60; rows=6} Comments: {if: {=\`student written comments\`}<>""}{=replaceregex({=\`student written comments\`},"1|2|3","'", "g")}{else}none provided by student{endif}

Thank you @Obed_Parlapiano! That makes sense. However, I do not always want to replace with the same character. In my example snippet, there are apostrophies, double quotes, and an emoji. I have also run into scenarios where there are en dashes and em dashes. Here are more examples:

Garbled Correct Meaning
‚Äì – en dash (U+2013)
‚Äî — em dash (U+2014)
‚Äô ’ right single quote (U+2019)
‚Äò ‘ left single quote (U+2018)
‚Äù ” right double quote (U+201D)
… … ellipsis (U+2026)
– – en dash (another variation)
— — em dash (another variation)
‘ ‘ left single quote
“ “ left double quote
” ” right double quote
 stray non-breaking space
  space

Being that there are hundreds of potential character encoding issues, I was thinking perhaps there might be some way that when the user puts a string of text into the formparragraph field and presses insert, the snippet could reference out to a site that contains these characters and their conversions and bounced back any needed replacements at the point of insertion.

Hi Brad, generally, having to manually handle the character encodings should not be necessary.

I'm assuming the 3rd party source shows the characters correctly to the user. Can you let me know about the following:

  • Are the characters with invalid encoding part of your clipboard data itself? For example, if you use a Clipboard Viewer - like this one - does it show you the invalid characters, after you copy from the 3rd party source?
  • If not, then do the characters with invalid encoding first appear when you copy from the 3rd party source and paste into the formparagraph field? Or, do the characters with invalid encoding first appear inside Salesforce when you submit the form snippet?
  • Does the same issue happen when you copy from the 3rd party source and paste into any other textbox, such as, the textbox on the homepage of Text Blaze Dashboard?
  • Does the same issue happen when inserting the snippet into a website other than Salesforce?

It'd be great if you could share a video showing all these different cases. Thanks!

I did some deeper research into this and found what I believe is happening. The 3rd party I am referring to is Pendo. The data is a display of survey responses. Everything displays perfectly fine. However, when downloading it as a .csv file, many characters like quotes, apostrophies, dashes, etc, for some reason are not encoded correctly. What is odd though, is that its not ALL occurrences of these characters. I did some further research into resolution and several suggestions were to save the .csv file as a .xls file, but that did not chane anything.

I figured out that I can quickly fix the entire report by doing the following:

  1. Download the .csv file and open in.
  2. Go to the Data tab on the ribbon.
  3. In the Get & Transform Data Section, select "From Text/CSV".
  4. A file explorer opens.
  5. Locate the same .csv file and open it.
  6. Click "Load" at the bottom.
  7. A new tab is created and all data is automatically reformatted to UTF-8 which corrects the encoding errors.

Thanks for confirming Brad. It looks like this is an issue with the external software. I'll mark this thread as resolved for now, but feel free to continue the discussion.