Extracting data with found matches as a list

cyberdave · October 30, 2020, 5:17am

Hi all, I am new to Text Blaze and am not a programmer, so if this is a super simple thing, my apologies in advance.

I am trying to use extractregex to go through some site.html to find multiple occurrences of text and then show only the lines where the text was matched.

For example, I have something like this:

USER: I have trouble with something <br>
<br>
BOT: Which of the following best describes your question? <br>
BOT: quick_replies message <br>
<br>
USER: I am having a problem <br>
<br>
BOT: Ok I can help with that

and I am trying to capture the text between USER: and <br> so that in my example, the output would be something like:

I have trouble with something
I am having a problem

I can find the matches in regex, but am unable to figure out how to use extractregex to put all the matching lines together as the result instead of just the match.

If anyone has a pointer or tip on this, or just wants to tell me to go read the manual, I would be very grateful.

Cedric_Debono_Blaze · October 30, 2020, 7:32am

Hi @cyberdave

Here's a small snippet to get you started:

Start point: {formtext: name=beginning} <<< This is the string of text where you want the snippet to START extracting content
End point: {formtext: name=end} <<< This is the string of text where you want the snippet to STOP extracting content
{section=extractregex({clipboard}, beginning & "([\D \S]+)" & end)}
{=section}

The snippet opens up a popup with a formtext to input the start string and end string.

If the strings are always going to be the same, you can do the following:

Start point: {beginning="I have trouble"} <<< This is the string of text where you want the snippet to START extracting content
End point: {end="I am having a problem"} <<< This is the string of text where you want the snippet to STOP extracting content
{section=extractregex({clipboard}, (beginning & "[\D \S]+" & end))}
{=section}
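
For readers who want to try the same idea outside Text Blaze, here is a minimal Python sketch of extracting the text between a start string and an end string. The function name `extract_between` is hypothetical, and the sketch deviates from the snippets above in two ways: it escapes the markers so they are treated as literal text, and it uses a lazy match so it stops at the first end string rather than the last.

```python
import re
from typing import Optional


def extract_between(text: str, beginning: str, end: str) -> Optional[str]:
    """Return the text between the first `beginning` and the next `end`.

    Hypothetical helper for illustration; not part of Text Blaze.
    """
    # re.escape makes the markers literal, so characters like "?" or "."
    # in them are not interpreted as regex syntax.
    # [\s\S]+? matches any character, including newlines, as few as possible.
    pattern = re.escape(beginning) + r"([\s\S]+?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None


sample = "USER: I have trouble with something <br>\n<br>\nBOT: Ok I can help <br>"
print(extract_between(sample, "USER:", "<br>"))
```

Like the snippets above, this returns only one stretch of text, which is why it does not by itself solve the "every matching line" part of the question.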

@scott is much better at this than I am, so I'm sure he can give you a better solution

scott · October 30, 2020, 8:52am

You can do something like the following. It uses splitregex() to create a list of the matches and the non-matches and then takes every even item.

What we really need to do to make this more elegant is to extend extractregex() to allow it to return multiple matches.

{msgs="USER: I have trouble with something 
 
BOT: Which of the following best describes your question? 
BOT: quick_replies message 
 
USER: I am having a problem 
 
BOT: Ok I can help with that"}

{=join(filter(splitregex(msgs, "USER:(.*) "), (item, index) -> iseven(index)), "\n")}
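
The split-then-keep-every-other-item trick above can be sketched in Python as well, in case it helps to see it step by step. This is only an illustration under a few assumptions: the function name `user_lines` is hypothetical, the sample text uses the <br> markers from the original question as the end of each message, and the match is lazy. Python's `re.split` also places captured groups between the non-matching pieces, but it counts from 0, so the captures sit at indexes 1, 3, 5 and so on; those are the same items that count as even when positions start at 1, which is assumed to be how `index` behaves in the snippet.

```python
import re

msgs = (
    "USER: I have trouble with something <br>\n"
    "<br>\n"
    "BOT: Which of the following best describes your question? <br>\n"
    "BOT: quick_replies message <br>\n"
    "<br>\n"
    "USER: I am having a problem <br>\n"
    "<br>\n"
    "BOT: Ok I can help with that"
)


def user_lines(log: str) -> list:
    """Return the text of every USER: message in the log (hypothetical helper)."""
    # With a capture group, re.split returns
    # [before, capture1, between, capture2, after, ...].
    parts = re.split(r"USER:(.*?)<br>", log)
    # Captures are at 0-based indexes 1, 3, 5, ...; strip the padding spaces.
    return [part.strip() for part in parts[1::2]]


print("\n".join(user_lines(msgs)))
```

In Python alone, `re.findall` with the same pattern would return the captures directly; the split version is shown because it mirrors the approach in the snippet.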