Using extractregex to remove specific characters

Peter_Monterubio · June 28, 2020, 5:55pm

Attempting to use extractregex to remove specific characters from copied text. We utilize a token modifier and would like to remove the "B$_" from a string in the clipboard that would look like "B$_Number". Just can't figure out how to identify the first three characters.

Thanks so much!

scott · June 28, 2020, 6:13pm

"$" has a special meaning in regular expressions (it anchors the match to the end of the text), so you need to escape it by putting a backslash before it.

For example:

{sample_data="B$_1234"}

Just the number: {=extractregex(sample_data, "B\$_(.+)")}

If you are certain the number part is really a number, I would use:

Just the number: {=extractregex(sample_data, "B\$_(\d+)")}
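To see why the escape matters and how the two capture groups differ, here is a minimal sketch in Python rather than Text Blaze. It assumes these particular patterns behave the same way in both regex engines, and the "B$_1234abc" input is a made-up value for illustration.

```python
import re

sample_data = "B$_1234"

# Unescaped, "$" is an end-of-text anchor, so "B$_" cannot match here.
unescaped = re.search(r"B$_(.+)", sample_data)
print(unescaped)  # None

# Escaped, "\$" matches a literal dollar sign.
any_chars = re.search(r"B\$_(.+)", sample_data)
digits_only = re.search(r"B\$_(\d+)", sample_data)
print(any_chars.group(1))    # 1234
print(digits_only.group(1))  # 1234

# The two groups only differ once non-digits follow the number.
mixed = "B$_1234abc"  # hypothetical input
mixed_any = re.search(r"B\$_(.+)", mixed).group(1)
mixed_digits = re.search(r"B\$_(\d+)", mixed).group(1)
print(mixed_any)     # 1234abc
print(mixed_digits)  # 1234
```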

"\d" means match a digit, while "." means match any character.

scott · June 28, 2020, 6:19pm

As an aside, for something like this "extractregex" may not always be the best solution. It's very powerful but also complex. Text Blaze has a number of simpler methods to manipulate text, like "split()", which splits a string into a list based on some delimiter.

For example:

{sample_data="B$_1234"}

Just the number: {=split(sample_data, "_")[2]}
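The same idea as a rough Python sketch, for comparison only. Note the index: the snippet above uses [2] for the second item, while Python counts from 0, so the equivalent there is [1]. The second input, with an extra underscore, is a hypothetical value to show where splitting gets less convenient.

```python
sample_data = "B$_1234"

parts = sample_data.split("_")
print(parts)     # ['B$', '1234']

number = parts[1]  # second item; Python lists are 0-indexed
print(number)    # 1234

# Hypothetical token that itself contains the delimiter: it is split too,
# so the pieces after the prefix would need to be joined back together.
extra_parts = "B$_12_34".split("_")
print(extra_parts)  # ['B$', '12', '34']
```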

Peter_Monterubio · June 28, 2020, 7:30pm

Thank you so much! Works perfectly. That $ was messing me up.

Peter_Monterubio · December 21, 2020, 5:12pm

Hey there, I’m working on a similar problem to this but am having a little bit of an issue.

It seems that when this token is copied it often picks up a blank space at the end which breaks the url we’re inserting the result into. Do you know of a way to remove spaces if they are found on the clipboard or maybe out of the assembled url?

scott · December 21, 2020, 5:59pm

This is very solvable, but it depends a bit on what the valid characters are in a token.

For example, if only numeric characters were allowed, you could use:

{sample_data="B$_1234 "}

Just the number: {=extractregex(sample_data, "B\$_(\d+)")}

If alphanumeric characters were allowed, you could use:

{sample_data="B$_1a2B3c4 "}

Just the number: {=extractregex(sample_data, "B\$_(\w+)")}

If any characters were allowed and the string may or may not have a space at the end, you could try the following:

{sample_data="B$_1a2!B^3c4 "}

Just the number: {=extractregex(sample_data, "B\$_(.+?)(\s+|$)")}
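Here is a small Python sketch that runs the three patterns above against the same sample values. The helper first_group is a hypothetical stand-in for extractregex (it just returns the first capture group), and it assumes these patterns behave the same in Python's regex engine as they do in Text Blaze.

```python
import re

def first_group(pattern, text):
    """Hypothetical stand-in for extractregex: return capture group 1, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

# Digits only: the group stops at the first non-digit, so the space is left out.
numeric = first_group(r"B\$_(\d+)", "B$_1234 ")

# Letters, digits and underscore: the space is not a word character.
alnum = first_group(r"B\$_(\w+)", "B$_1a2B3c4 ")

# Any characters, lazily, up to the first whitespace or the end of the text.
anything = first_group(r"B\$_(.+?)(\s+|$)", "B$_1a2!B^3c4 ")
no_space = first_group(r"B\$_(.+?)(\s+|$)", "B$_1a2!B^3c4")

print(repr(numeric))   # '1234'
print(repr(alnum))     # '1a2B3c4'
print(repr(anything))  # '1a2!B^3c4'
print(repr(no_space))  # '1a2!B^3c4'
```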

Peter_Monterubio · December 21, 2020, 6:36pm

Would this not work when using {clipboard} in the extractregex?

I’m using {=extractregex({clipboard}, "B$_(.+)(\s+|$)")} and I still seem to get a space when I copy one at the end.

As a follow up, along with a space, sometimes users are selecting too much and grabbing a line break. I guess I can also just try to get people to be more careful in how they are copying these tokens.

scott · December 21, 2020, 6:50pm

Add a "?" after the "+" in the first parentheses like I have in the last example above.

The "?" means match as few characters as possible. If you don't have it, the capture group can expand to include the trailing space.
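A minimal Python sketch of the difference, assuming greedy and lazy quantifiers behave the same way in Text Blaze's regex engine. The clipboard value is a made-up example with one trailing space.

```python
import re

clipboard = "B$_1234 "  # hypothetical clipboard text with a trailing space

# Greedy ".+" grabs everything, including the space, and "$" then matches.
greedy = re.search(r"B\$_(.+)(\s+|$)", clipboard).group(1)

# Lazy ".+?" stops as soon as the whitespace alternative can match.
lazy = re.search(r"B\$_(.+?)(\s+|$)", clipboard).group(1)

print(repr(greedy))  # '1234 '
print(repr(lazy))    # '1234'
```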

You could also just try "extractregex(trim({clipboard}), ...)", which might be the simplest option, as it removes any whitespace at the start or end.

Let me know if that works!

Peter_Monterubio · December 21, 2020, 7:13pm

That would be perfect! I’m getting an error that it’s receiving an unexpected "." though.

Am I doing this right? {=extractregex(trim({clipboard}), ...)}

scott · December 21, 2020, 7:19pm

Sorry, the "..." was just a placeholder. You want something like this:

{=extractregex(trim({clipboard}), "B\$_(.+)")}
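For illustration, the trim-then-extract approach looks like this in Python, with str.strip() standing in for trim(). This is only a sketch: the clipboard value is hypothetical, and it assumes trim() also removes a trailing line break, as strip() does.

```python
import re

# Hypothetical clipboard text: stray space in front, space and line break after.
clipboard = " B$_1234 \n"

cleaned = clipboard.strip()  # removes whitespace at both ends only
token = re.search(r"B\$_(.+)", cleaned).group(1)

print(repr(cleaned))  # 'B$_1234'
print(repr(token))    # '1234'
```

Since the whitespace is gone before the regex runs, the simple greedy pattern is enough here.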

Peter_Monterubio · May 1, 2021, 6:38pm

I return to my struggles with regex expressions…

I’m attempting to extract specific strings on a page with a couple of complexities. The strings I’m trying to display in my snippet are formatted like #TEAM_SUSP_DEVICE, where the TEAM part varies. The trouble is that a page may have more than one note that begins with a #, and because the team varies it cannot be added to the search criteria. Is there a way to format regex to search for something that begins with # and ends with DEVICE? Ideally I’d then pass the resulting note hash value into an if statement to direct the snippet if the specified note is present.

scott · May 2, 2021, 8:37am

You could do something like this:

{text="... #SOME_TAG #TEAM_SUSP_DEVICE #OTHER_TAG ..."}
{if: testregex(text, "#[A-Z_]+_DEVICE")}
Found a device tag
{endif}
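The pattern can be checked outside Text Blaze with a short Python sketch. It assumes the character class and quantifier behave the same in both engines. The no_device string is a made-up input to show the negative case.

```python
import re

pattern = r"#[A-Z_]+_DEVICE"

text = "... #SOME_TAG #TEAM_SUSP_DEVICE #OTHER_TAG ..."
no_device = "... #SOME_TAG #OTHER_TAG ..."  # hypothetical page without the tag

# Equivalent of the test: is there any tag that starts with # and ends in _DEVICE?
found = re.search(pattern, text) is not None
missing = re.search(pattern, no_device) is not None

# The matched text itself, in case the tag is needed later in the snippet.
tag = re.search(pattern, text).group(0)

print(found)    # True
print(missing)  # False
print(tag)      # #TEAM_SUSP_DEVICE
```

Because [A-Z_] excludes spaces, the match cannot run from one tag into the next, which is what keeps #SOME_TAG from being picked up.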

Peter_Monterubio · May 2, 2021, 12:32pm

Thanks so much! I was completely overcomplicating things by loading the regex into its own value and then building my if statement off of that.

What we did end up getting to work was (SUSP_.+?)(?:\s|")") but I think what you provided will be much more reliable. Thanks again! You’re a life saver!