What is the best way to extract information from a Gmail eMail?

Sam_Samui · August 31, 2024, 3:15am

Dear community, I have no clue how to extract information from a Gmail eMail to use it in Text Blaze. Since the position of the eMail text always changes, I can't use the website text picker. Unfortunately, I also don't understand regex very well, but I think I could fix the problem with it. Or is it better to first collect the data in a Data Blaze table and use it from there in Text Blaze? Or do I need an eMail scraper first to pull out the data and import it as a CSV file into Data Blaze?

For my logic, I need something that checks for a string, e.g. 'Lead Traveler Name:', and outputs the text after that string to Text Blaze.

I need the following information (marked with an arrow) from the Gmail eMail to use in Text Blaze:

[Screenshot: TB help]

What would be the best way to create a snippet for that?

Thankful for all suggestions!

scott · August 31, 2024, 7:04am

Hi, and welcome to the forum!

Regexes would be the best tool for that. For example, to get the Lead Traveler Name you would use something like this:

Email text (you would get this from the page directly using the site command or maybe the clipboard command):

{formparagraph: default=
text...
text...
Lead Traveler Name: John Smith
text...
text...
; name=email; rows=8
}

Lead traveler: {=extractregex(email, "Lead Traveler Name: (.*)")}

You can learn more about regexes here: Text Blaze | Formula Reference.
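For anyone who wants to test the pattern outside Text Blaze, here is a minimal Python sketch of the same capture-group idea. It assumes Python's `re` module treats this simple pattern the same way Text Blaze does; the `extract_first_group` helper and the sample text are illustrative and not part of Text Blaze.

```python
import re

# Sample email text, mirroring the form field above.
EMAIL = """text...
text...
Lead Traveler Name: John Smith
text...
text..."""

def extract_first_group(text, pattern):
    """Return the first capture group of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

# "." does not match a newline by default, so "(.*)" stops at the end of the line.
name = extract_first_group(EMAIL, r"Lead Traveler Name: (.*)")
print(name)  # John Smith
```

The parentheses mark the part of the match that is returned; the label in front of them only locates the right line.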

They can still be a bit tricky though. Our "AI Write" feature can help you build them.

For instance, I asked AI Write to do this:

Use a regex to get the traveler name from the following text:

text...
text...
Lead Traveler Name: John Smith
text...
text...

And it returned a snippet with the correct formula for the regex.

Let me know if this helps! If you want further help extracting specific items from this email please include the text of the email (instead of a screenshot) in your response with any sensitive info redacted.

Sam_Samui · September 5, 2024, 9:25am

Dear scott, thank you very much for the example.

I got almost everything to work. The only remaining problems are extracting the phone number and changing the date format.

I can extract the whole phone line, but not the phone number on its own:
{=extractregex({site: text; page=https://mail.google.com/*; select=ifneeded; selector=div > :nth-child(2) > tbody > tr > td > table > tbody > :nth-child(1)}, "Phone: (.*)")}

The result will be: (Alternate Phone)IT+39 21221219999 Send the customer a message.
How can I extract only the number after the '+' (including the '+')?

I can also extract the date and change the format, but I can't pull the correct date format into the place I need:
Date: {=extractregex({site: text; page=https://mail.google.com/*; select=ifneeded; selector=div > :nth-child(2) > tbody > tr > td > table > tbody > :nth-child(1)}, "Travel Date: (.*)")}

The result will be: Date: Thu, Aug 22, 2024
Date: {=datetimeformat(datetimeparse("Thu, Aug 22, 2024", "ddd, MMM D, YYYY"), "DD-MM-YYYY")}
The result will be: Date: 22-08-2024 (this is the format I need)
How can I integrate 'datetimeparse' with extractregex?

Thanks in advance for suggestions.

scott · September 5, 2024, 10:05am

Looks like you're making great progress!

These might solve the date and phone issues for you:

Email text (you would get this from the page directly using the site command or maybe the clipboard command):

{formparagraph: default=
text...
text...
Lead Traveler Name: John Smith
(Alternate Phone)IT+39 21221219999 Send the customer a message.
Date: Thu, Aug 22, 2024
text...
; name=email; rows=8
}

Lead traveler: {=extractregex(email, "Lead Traveler Name: (.*)")}
Phone Number: {=extractregex(email, "([+][0-9 ]*) ")}
Date: {=datetimeformat(datetimeparse(extractregex(email, "Date:(.*)"), "ddd, MMM D, YYYY"), "DD-MM-YYYY")}
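The same three steps can be tried in Python as a rough cross-check. This is a sketch under assumptions: Python's `strptime` codes `%a, %b %d, %Y` stand in for the Text Blaze format `ddd, MMM D, YYYY`, an English locale is assumed for the day and month names, and the `.strip()` call is needed on the Python side because `Date:(.*)` also captures the space after the colon.

```python
import re
from datetime import datetime

EMAIL = """Lead Traveler Name: John Smith
(Alternate Phone)IT+39 21221219999 Send the customer a message.
Date: Thu, Aug 22, 2024
text..."""

# Phone: a "+" followed by digits or spaces, ending just before a space.
phone = re.search(r"([+][0-9 ]*) ", EMAIL).group(1)

# Date: capture the rest of the line, then trim the space after "Date:".
raw_date = re.search(r"Date:(.*)", EMAIL).group(1).strip()

# Parse the captured text, then print it in day-month-year order.
formatted = datetime.strptime(raw_date, "%a, %b %d, %Y").strftime("%d-%m-%Y")

print(phone)      # +39 21221219999
print(formatted)  # 22-08-2024
```

Nesting the extraction inside the parse call, as the Date line above does, is the same idea written as a single expression.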

Sam_Samui · September 5, 2024, 11:18am

Dear scott, thank you so much for your fast feedback!

The Date-Problem is solved!

The phone number still doesn't work, though.

{=extractregex({site: text; page=https://mail.google.com/; select=ifneeded; selector=div > :nth-child(2) > tbody > tr > td > table > tbody > :nth-child(1)}, "Phone: ([+][0-9]) ")}

What am I doing wrong?

scott · September 5, 2024, 11:51am

You'll need to share the text you are trying to match against.

I noticed you changed the regex I sent you though for the number. Did the one I used not work?

Sam_Samui · September 5, 2024, 2:06pm

As you can see in the picture below, there is an 'Error - No match found' with your formula: {=extractregex({site: text; page=https://mail.google.com/; select=ifneeded; selector=div > :nth-child(2) > tbody > tr > td > table > tbody > :nth-child(1)}, "Phone: ([+][0-9]) ")}

When I use the first formula, it extracts the whole line after 'Phone:' from the eMail: {=extractregex({site: text; page=https://mail.google.com/*; select=ifneeded; selector=div > :nth-child(2) > tbody > tr > td > table > tbody > :nth-child(1)}, "Phone: (.*)")}

But I just need the phone number, like this: "+39 21221219999"

This is the original text from the eMail:
"Phone: (Alternate Phone)IT+39 21221219999 Send the customer a message."

I think it could be because there is no space before the '+' symbol?

scott · September 5, 2024, 2:41pm

The regex I sent you above for the phone number was

"([+][0-9 ]*) "

What you are sending me here uses:

"Phone: ([+][0-9]) "

Does my version work for you?
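A quick way to see the difference between the two patterns is to run both against the original line. The sketch below uses Python's `re` module as a stand-in for Text Blaze's regex engine, which is an assumption; the patterns are copied as they appear in this thread.

```python
import re

LINE = "Phone: (Alternate Phone)IT+39 21221219999 Send the customer a message."

# Anchored on "Phone: ": the next character must be "+", but it is "(".
anchored = re.search(r"Phone: ([+][0-9]) ", LINE)

# Unanchored: the search can start at the "+" wherever it appears in the line.
unanchored = re.search(r"([+][0-9 ]*) ", LINE)

print(anchored)             # None
print(unanchored.group(1))  # +39 21221219999
```

The missing space before the '+' is not the cause. The "Phone: " prefix forces the match to begin right after the label, where the text "(Alternate Phone)IT" sits.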

Note: on further reflection, I might improve my version to:

"([+][0-9 ]{5,}) "

This says the '+' must be followed by at least 5 characters that are digits or spaces (you can adjust this number to whatever you think is suitable).
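To illustrate why the minimum length helps, here is a small Python sketch. The input line with a stray '+' is hypothetical and does not come from the original email, and Python's `re` module is assumed to behave like Text Blaze's regex engine for these patterns.

```python
import re

# Hypothetical line with a stray "+" before the real phone number.
LINE = "Rating: 4+ stars, Phone: IT+39 21221219999 Send the customer a message."

# "*" allows zero digits, so the first "+" followed by a space already matches.
loose = re.search(r"([+][0-9 ]*) ", LINE).group(1)

# "{5,}" demands at least 5 digits or spaces, so the stray "+" is skipped.
strict = re.search(r"([+][0-9 ]{5,}) ", LINE).group(1)

print(repr(loose))  # '+'
print(strict)       # +39 21221219999
```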

Sam_Samui · September 6, 2024, 7:36am

I'm terribly sorry for my mistake, scott.

Your solution for the phone also works like a charm!

I should not work after work; it was already too late and my brain was over the ocean.

Finally, everything works like a charm. Case closed!

THANK YOU VERY MUCH for all your quick help, you are the best!!!

scott · September 6, 2024, 10:01am

I am glad it's working!