REGEX - How to extract information patterns from a piece of text

Cedric_Debono_Blaze · August 16, 2021, 9:21am

Hi all,

Sometimes, you'll have a piece of text (copied into your clipboard or even extracted using {urlload}), and you want to extract a piece of information from it that matches a specific pattern, maybe a phone number in the pattern XXX-XXXX.

Text Blaze allows you to do this with the help of "Regular Expressions", a.k.a. regex.

There's a video showing how to do this here:

But in this thread, I would like you to throw some examples at me so I can show you how it's done using your own use-cases.

Of course, please refrain from posting sensitive data here, this being a public forum.

So, come at me with everything you've got!

Diego_Suancha · August 21, 2021, 11:29pm

I think that this one can help a lot of people.

I personally use regex with urlload to find the issuer and type of a debit card from its BIN. (I have to admit that @scott helped me with the first version of this snippet.)

It's just a sample BIN, so no worries about personal information.

{formtext: name=Bin; default=419002}

{urlload: https://iin-bin.com/bin/{=Bin}.html; done=res -> ["issuer"= extractregex(res, " Payment System</th> (\w*)</td></tr>"), "bank" = extractregex(res, " Bank Issuer</th>(.*)</td>"), "type" = extractregex(res, " Card Type</th>(\w*)</td></tr>")]}

Issuer: {=issuer}
Bank: {=bank}
Type: {=type}

It works on Text Blaze, I promise
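For anyone who wants to experiment with the capture-group idea outside Text Blaze, here is a rough Python sketch. The HTML fragment is made up for illustration (the real page's markup may differ), and `extract_first_group` is a hypothetical helper that only approximates how extractregex is used in the snippet above: it returns the first capture group, or an empty string when nothing matches (that fallback is my assumption).

```python
import re

# Made-up HTML fragment for illustration only; the real page's markup may differ.
SAMPLE_HTML = (
    "<tr><th>Payment System</th><td>VISA</td></tr>"
    "<tr><th>Bank Issuer</th><td>Example Bank Ltd</td></tr>"
    "<tr><th>Card Type</th><td>DEBIT</td></tr>"
)


def extract_first_group(text, pattern):
    """Hypothetical helper: first capture group of the first match, or "" if no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""


card = {
    # \w* is enough for single-word values such as the payment system.
    "issuer": extract_first_group(SAMPLE_HTML, r"Payment System</th><td>(\w*)</td>"),
    # Bank names can contain spaces, so use a non-greedy "anything" group.
    "bank": extract_first_group(SAMPLE_HTML, r"Bank Issuer</th><td>(.*?)</td>"),
    "type": extract_first_group(SAMPLE_HTML, r"Card Type</th><td>(\w*)</td>"),
}

print(card)
```

The non-greedy `.*?` stops at the first closing tag. A greedy `.*` would keep going to the last `</td>` on the same line, which is worth keeping in mind if the page puts several cells on one line.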

Cedric_Debono_Blaze · August 23, 2021, 8:55am

Hi @Diego_Suancha - thanks for the cool example!

For the record, the reason it's not working here on the forum is that urlload and a few other commands are intentionally disabled here. But as Diego said, this will work fine if you import it into your dashboard. Additionally, you will also need to "connect" this particular domain to the folder in which you've added this snippet. If you don't do that, the urlload command will not be permitted to load any data from the site.

Diego_Suancha · August 29, 2021, 11:39pm

I spent the past few days thinking of a good challenge for this post. I came up with one that can be helpful to a lot of people.

Where I work, we need to use the customer's first name in the chats we send.

This name can be extracted easily with regex combined with {site}, or by using CSS selectors.

The issue is that since the name is chosen arbitrarily by the customer, it sometimes returns inappropriate or nonsensical names.

This is an old version of a snippet that we came up with.

{if: extractregex({site:text}, "Customer Name

(.+)") == "Null [Last Name Not Specified]" OR extractregex({site:text}, "Customer Name

(.+)") == "[Last Name Not Specified]"}{name="there"}{elseif: testregex({site:text}, "Customer Name

\w*\s\w*")}{CxName=extractregex({site:text}, "Customer Name

(\w*)")}{CxLast=extractregex({site:text}, "Customer Name

\w*\s(\w*)")}{if: {=len(extractregex(CxName, "\w*"))} == 1 or {=len(extractregex(CxName, "\w*"))} == 2 or CxName == "" or {=isnumber(CxName)}}{if: CxLast == "" or {=isnumber(CxLast)} or {=len(extractregex(CxLast, "\w*"))} == 1 or {=len(extractregex(CxLast, "\w*"))} == 2}{name="there"}{elseif: testregex(CxLast, "^\D")}{name=extractregex({=proper(CxLast)}, "\D*")}{elseif: testregex(CxLast, "^\d*\D*")}{name=proper(extractregex(CxLast, "^\d*(\D*)"))}{endif}{elseif: testregex(CxName, "^\D")}{name=extractregex({=proper(CxName)}, "\D*")}{elseif: testregex(CxName, "^\d*\D*")}{name=proper(extractregex(CxName, "^\d*(\D*)"))}{endif}{else}{name="there"}{endif}{=name}

I know that @Juan_Murillo used a "wrong words" database to check whether the extracted name is in it first. Maybe he can share it with us; he has worked on this problem a lot more than I have. Either way, I think this is a good example of using the regex functions.
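To make the "wrong words" idea a bit more concrete, here is a minimal Python sketch of a blocklist check. It is not Juan's actual database or logic: `BLOCKED_NAMES` and `safe_greeting_name` are hypothetical names, the word list is invented, and the rules are a simplification of the snippet above (reject very short or non-alphabetic names, otherwise capitalise).

```python
# Hypothetical blocklist; a real one would be maintained separately and be much longer.
BLOCKED_NAMES = {"null", "test", "asdf", "none"}


def safe_greeting_name(candidate, fallback="there"):
    """Return a cleaned-up first name, or the fallback if the name looks unusable."""
    cleaned = candidate.strip()
    # Reject empty, one- or two-character names and anything containing non-letters.
    if len(cleaned) < 3 or not cleaned.isalpha():
        return fallback
    # Reject names that appear in the blocklist (case-insensitive).
    if cleaned.lower() in BLOCKED_NAMES:
        return fallback
    return cleaned.capitalize()


for raw in ["diego", "Null", "ab", "1234"]:
    print(raw, "->", safe_greeting_name(raw))
```

Keeping the blocklist as a separate set means new entries can be added without touching any of the regex logic.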

Cedric_Debono_Blaze · August 30, 2021, 7:43am

@Diego_Suancha, ok let's tackle this one step at a time.

Could you provide me with a list of examples where the first name and last name are correct/incorrect, so I can test them with the snippet and look for ways to test more efficiently?

Also, I've modified your snippet slightly by creating a formparagraph at the beginning with the name "data" and {site:text} as a default value. This will allow us to preview what is being captured and test accordingly.

{formparagraph: name=data; default={site:text}}

{if: extractregex(data, "Customer Name

(.+)") == "Null [Last Name Not Specified]" OR extractregex(data, "Customer Name

(.+)") == "[Last Name Not Specified]"}{name="there"}{elseif: testregex(data, "Customer Name

\w*\s\w*")}{CxName=extractregex(data, "Customer Name

(\w*)")}{CxLast=extractregex(data, "Customer Name

\w*\s(\w*)")}{if: {=len(extractregex(CxName, "\w*"))} == 1 or {=len(extractregex(CxName, "\w*"))} == 2 or CxName == "" or {=isnumber(CxName)}}{if: CxLast == "" or {=isnumber(CxLast)} or {=len(extractregex(CxLast, "\w*"))} == 1 or {=len(extractregex(CxLast, "\w*"))} == 2}{name="there"}{elseif: testregex(CxLast, "^\D")}{name=extractregex({=proper(CxLast)}, "\D*")}{elseif: testregex(CxLast, "^\d*\D*")}{name=proper(extractregex(CxLast, "^\d*(\D*)"))}{endif}{elseif: testregex(CxName, "^\D")}{name=extractregex({=proper(CxName)}, "\D*")}{elseif: testregex(CxName, "^\d*\D*")}{name=proper(extractregex(CxName, "^\d*(\D*)"))}{endif}{else}{name="there"}{endif}{=name}

Cedric_Debono_Blaze · August 30, 2021, 8:03am

@Diego_Suancha,

Actually, try this out:

{formparagraph: name=data; rows=5; default=Customer Name

John Doe}

{if: NOT testregex(data, "Customer Name\s{2}([^0-9\W])")}{name="there"}{else}{name=extractregex(data, "Customer Name\s{2}(\w+)")}{endif}

Hello {if: len(name) == 1}there{else}{=name}{endif}.

It seems to work as intended. If I put anything but letters in the place where the first name should be, or the first name is only 1 character long, the greeting changes to "Hello there".
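If it helps to see the same two-step check outside Text Blaze, here is a rough Python equivalent. It assumes the page text contains "Customer Name" followed by exactly two whitespace characters (for example two newlines) and then the name; `greeting` is a hypothetical function name.

```python
import re


def greeting(data):
    """Build the greeting using the same two regex checks as the snippet above."""
    # [^0-9\W] means "a word character that is not a digit", so the name
    # must start with a letter (or an underscore) right after the label.
    if not re.search(r"Customer Name\s{2}[^0-9\W]", data):
        return "Hello there."
    # Capture the first run of word characters as the first name.
    name = re.search(r"Customer Name\s{2}(\w+)", data).group(1)
    # Single-character names are treated as unusable.
    if len(name) == 1:
        return "Hello there."
    return f"Hello {name}."


print(greeting("Customer Name\n\nJohn Doe"))
print(greeting("Customer Name\n\n123 Doe"))
print(greeting("Customer Name\n\nJ Doe"))
```

One thing to be aware of: the first check only looks at the first character of the name, so an input such as "J0hn" would still pass through as "J0hn". If that matters, the capture group could be narrowed to letters only.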

Try it out and let me know.

Once you're happy, all you need to do is replace the formparagraph command with the following:

{data={site:text}}

Carl_Arnold · June 2, 2023, 1:10pm

Certainly! I'm glad to help you with some examples of using regular expressions to extract information from text. Let's start with phone numbers in the pattern XXX-XXXX:

Example 1:
Suppose you have the following text: "Please contact me at 123-4567 or 987-6543."
To extract the phone numbers, you can use the regular expression \d{3}-\d{4}.

Here's how you can apply it using Python code:

import re

text = "Please contact me at 123-4567 or 987-6543."
pattern = r"\d{3}-\d{4}"
phone_numbers = re.findall(pattern, text)

print(phone_numbers)

The output will be:
['123-4567', '987-6543']

Example 2:
Let's say you have a longer text with multiple occurrences of phone numbers: "My office number is 555-1234, and my cell phone number is 999-8765. You can also reach me at 123-4567."
To extract all the phone numbers, you can use the same regular expression \d{3}-\d{4}.

Using the same Python code as before:
import re

text = "My office number is 555-1234, and my cell phone number is 999-8765. You can also reach me at 123-4567."
pattern = r"\d{3}-\d{4}"
phone_numbers = re.findall(pattern, text)

print(phone_numbers)

The output will be:
['555-1234', '999-8765', '123-4567']
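One caveat with the bare pattern \d{3}-\d{4} is that it can also match inside longer digit sequences. The sketch below (sample text invented for illustration) compares it with a stricter variant that uses lookarounds to require that no digit comes directly before or after the match.

```python
import re

# Invented sample text: the second number is an order reference, not a phone number.
text = "Call 555-1234 today. Order ref 1234-56789 is not a phone number."

# The bare pattern also picks up "234-5678" from inside the order reference.
loose = re.findall(r"\d{3}-\d{4}", text)

# (?<!\d) and (?!\d) reject matches that have a digit immediately before or after.
strict = re.findall(r"(?<!\d)\d{3}-\d{4}(?!\d)", text)

print(loose)   # ['555-1234', '234-5678']
print(strict)  # ['555-1234']
```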

Feel free to provide more examples or let me know if you have any specific patterns you'd like to extract information from!