Regex Assistant

I would love for Text Blaze to have a regex assistant that functions similarly to the new Site Selector tool. There are a lot of online tools and resources for learning regex, but it is still quite overwhelming. Is this feasible?

Hey Brad, thanks for the nice suggestion!

Let's take a step back and think about this from the ground up as a "Text Matching Assistant". Most people don't know what a regex is. But their most common use case (probably) is: "Hey, I have a bunch of texts that all follow a structural pattern, can you give me a way to extract specific info from this structure?"

I think there is definitely some value in creating such a match assistant. The key challenges in developing this are:

  1. How do we surface this functionality naturally to users? It's clearly not a regular command; it's more like a wizard/setup helper. We need a way to surface the feature when the user needs it and otherwise keep it collapsed.
  2. How do we prompt the user to provide as many text examples as possible? The more examples we have, the less error-prone our generated pattern will be.
  3. Generating the actual regex is itself not an easy task :stuck_out_tongue: The generator's output should (a) favor simple regexes over complicated ones, and (b) overfit on the user's provided examples instead of underfitting/generalizing.
  4. There might be weird character encoding issues in non-English languages. The generator must take care of them.
  5. (This one's a long shot but...) In case the user's match assistant does not work during snippet insertion, we can give them a button to "refine the match", where they provide the new sample text and the regex is regenerated to accommodate it.
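To make points 2 and 3 a bit more concrete, here is a minimal Python sketch of one possible way to build a pattern from examples. It is not how Text Blaze does or would do it; the function names (`char_class`, `tokenize`, `infer_pattern`) are made up for illustration. It assumes every example should match in full, maps each character to a narrow class, and only widens the repeat counts as far as the examples require, so it deliberately overfits.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to the narrowest class this sketch knows about."""
    if ch.isascii() and ch.isdigit():
        return "[0-9]"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    # Anything else (punctuation, spaces, non-ASCII) is kept as a literal.
    return re.escape(ch)


def tokenize(text: str) -> list[tuple[str, int]]:
    """Collapse runs of the same class into (class, run_length) pairs."""
    return [(cls, len(list(run))) for cls, run in groupby(text, key=char_class)]


def quantify(cls: str, lengths: list[int]) -> str:
    """Emit the class with the tightest repeat count covering all examples."""
    lo, hi = min(lengths), max(lengths)
    if lo == hi:
        return cls if lo == 1 else f"{cls}{{{lo}}}"
    return f"{cls}{{{lo},{hi}}}"


def infer_pattern(examples: list[str]) -> str:
    """Build a simple, overfitted regex that fully matches every example."""
    if not examples:
        raise ValueError("at least one example is required")
    tokenized = [tokenize(example) for example in examples]
    shapes = {tuple(cls for cls, _ in tokens) for tokens in tokenized}
    if len(shapes) != 1:
        # The examples do not share one structure: fall back to a plain
        # alternation of the literal examples (the most overfitted answer).
        return "(?:" + "|".join(re.escape(e) for e in examples) + ")"
    columns = zip(*tokenized)
    return "".join(quantify(col[0][0], [n for _, n in col]) for col in columns)


pattern = infer_pattern(["INV-2024-001", "INV-2023-1187"])
print(pattern)
```

With those two sample strings the sketch produces `[A-Za-z]{3}\-[0-9]{4}\-[0-9]{3,4}`. Adding a third example with a longer prefix would only widen the `{3}` to a range, which is the "more examples, fewer errors" effect from point 2. Point 5 would then amount to calling `infer_pattern` again with the failing text appended to the example list.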

These are the ones off the top of my head. At least part of this seems doable, since some free tools already exist online. Part of the reason I'm laying out this whole list of challenges is to help everyone realise that this is a multi-stage problem and it will take a long time before it gets implemented :sweat_smile:

@Brad_Hedinger - on top of what Gaurang mentioned, there's a key point to keep in mind—regex is a very powerful and flexible tool, but also a very challenging one.

Even seasoned developers will sometimes have a hard time coming up with a regex pattern that works in all required scenarios while ALSO being lean.

Generating some regex pattern might be relatively simple, but I think generating one that strikes the best possible balance between those two goals (working in every required scenario and staying lean) would be very hard.

What you said is correct and I agree with you, but I'll add a contrasting perspective: generating a site selector can be very tricky in certain cases, even for a seasoned developer. Yet we did it anyway, and it works well in many cases :wink:

I'm hopeful we can come up with a decent enough solution to cover common use cases here as well. It's just that this project requires continuous bandwidth over several weeks, which is hard to find right now :frowning:

Thank you @Gaurang_Tandon and @Cedric_Debono_Blaze! Everything you say makes complete sense. Perhaps some inspiration lies in the link below. Also, I am a fan of not recreating something that already exists... just because. So maybe it's a better use of time to focus efforts on solutions that can readily serve the most TB users. In the meantime, here's a tool I came across that helps build regex:

Retool | RegEx Generator


Here's a little snippet I use to test how my regex will work.

Regex pattern: {formtext: name=regex}
Content: {formparagraph: name=content}

Output: {=extractregex(content, regex)}
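
For anyone who wants to try patterns outside Text Blaze, here is a rough Python equivalent of that tester. It is only a sketch: `extract_regex` is a made-up helper, and it assumes the behaviour of returning the first capture group when the pattern has one and the whole match otherwise. Python's regex dialect also differs from other engines in places, so a pattern that works here may still need tweaks elsewhere.

```python
import re
from typing import Optional


def extract_regex(content: str, pattern: str) -> Optional[str]:
    """Return the first capture group if the pattern defines one,
    otherwise the whole match; None when nothing matches."""
    match = re.search(pattern, content)
    if match is None:
        return None
    # match.re.groups is the number of capture groups in the pattern.
    return match.group(1) if match.re.groups else match.group(0)


print(extract_regex("Order #1234 shipped", r"#(\d+)"))
```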