How do I match the nth instance of a string with regex?

cedricdebono · October 3, 2019, 3:34pm

{text="
chapter one
chapter two
chapter three
chapter four
chapter five
chapter six
chapter seven
chapter eight
chapter nine
chapter ten"}

{=extractregex(text, "(chapter[\D\S]+)")}

In the above snippet, I would like the regex to start from the nth instance of the word "chapter". For example, starting from the 6th "chapter" would give me:

chapter six
chapter seven
chapter eight
chapter nine
chapter ten

Thanks

scott · October 3, 2019, 4:47pm

This is better handled as a list than with a regular expression.

First convert the string to a list via the split() function and then extract the relevant part of the list with the slice() function.

For example:

{text="chapter one
chapter two
chapter three
chapter four
chapter five
chapter six
chapter seven
chapter eight
chapter nine
chapter ten"}

{=slice(split(text, "\n"), 6)}

You can also join the resulting list back together into a string:

{=join(slice(split(text, "\n"), 6), "\n")}
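For readers who want to try the same split / slice / join idea outside a snippet, here is a minimal Python sketch. It is only an illustration of the approach, not Text Blaze code, and the variable names are made up for the example. Note that Python lists are 0-indexed, so the 6th line sits at index 5.

```python
# Illustration only: the split -> slice -> join idea in plain Python.
text = """chapter one
chapter two
chapter three
chapter four
chapter five
chapter six
chapter seven
chapter eight
chapter nine
chapter ten"""

# Split the string into a list of lines.
lines = text.split("\n")

# Keep everything from the 6th line onward (index 5 in 0-indexed Python).
tail = lines[5:]

# Join the remaining lines back into a single string.
result = "\n".join(tail)
print(result)
```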

cedricdebono · October 4, 2019, 5:10am

Ok, I should've clarified. These would be headings of a text. Between each of them, there would be paragraphs of text.

So:

{text="
chapter one
[bunch of text]

chapter two
[bunch of text]

chapter three
[bunch of text]

chapter four
[bunch of text]

chapter five
[bunch of text]

chapter six
[bunch of text]

chapter seven
[bunch of text]

chapter eight
[bunch of text]

chapter nine
[bunch of text]

chapter ten
[bunch of text]
"}
{=extractregex(text, "(chapter[\D\S]+)")}

I want to be able to extract the text from one specific chapter through another, e.g. from chapter 6 to chapter 8.

Any ideas?

scott · October 4, 2019, 8:36am

This is actually pretty ugly as a regular expression, but it can be done.

Regular expressions support a "{n}" syntax that matches the preceding expression exactly n times.

You'll also need the "(?:)" syntax, which defines a non-capturing group: it groups part of the pattern without returning its match from the function.

So in your example, to skip the first 5 chapters and capture the next 3 (chapters 6 to 8), we could do something like:

{text="
chapter one
[bunch of text]

chapter two
[bunch of text]

chapter three
[bunch of text]

chapter four
[bunch of text]

chapter five
[bunch of text]

chapter six
[bunch of text]

chapter seven
[bunch of text]

chapter eight
[bunch of text]

chapter nine
[bunch of text]

chapter ten
[bunch of text]
"}

{=extractregex(text, "(?:(?:chapter[\s\S]+?){5})((?:chapter[\s\S]+?){3})(chapter|$)")}
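To see what each part of that pattern does, here is a small Python sketch using the same regular expression with Python's `re` module. This is an assumption-laden illustration: it presumes the snippet's regex engine treats `{n}`, `(?:)` and lazy `+?` the same way Python does, and the sample text is rebuilt programmatically just to keep the example short.

```python
import re

# Rebuild the sample text: ten chapter headings, each followed by filler text.
names = ["one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]
text = "\n" + "".join(f"chapter {n}\n[bunch of text]\n\n" for n in names)

# Same pattern as in the post above:
#   (?:(?:chapter[\s\S]+?){5})   skip 5 chapters without capturing them
#   ((?:chapter[\s\S]+?){3})     capture the next 3 chapters as group 1
#   (chapter|$)                  stop at the next heading or the end of the text
pattern = r"(?:(?:chapter[\s\S]+?){5})((?:chapter[\s\S]+?){3})(chapter|$)"

match = re.search(pattern, text)
extracted = match.group(1)  # first capture group: chapters six to eight
print(extracted)
```

The lazy `+?` is what keeps each repetition to a single chapter: it grows only until the next "chapter" can match.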

cedricdebono · October 4, 2019, 8:59am

Great stuff. I've now made some additional changes so that I can use formtext fields to define where to start and end.

{text="
chapter one
[bunch of text]

chapter two
[bunch of text]

chapter three
[bunch of text]

chapter four
[bunch of text]

chapter five
[bunch of text]

chapter six
[bunch of text]

chapter seven
[bunch of text]

chapter eight
[bunch of text]

chapter nine
[bunch of text]

chapter ten
[bunch of text]
"}

Start at Chapter {formtext: name=start; default=1}
End at Chapter {formtext: name=end; default=10}

{n1=start-1 if start <> "" else 0}
{n2=end-n1 if end <> "" else 10-n1}

{=extractregex(text, "(?:(?:chapter[\s\S]+?){"&n1&"})((?:chapter[\s\S]+?){"&n2&"})(chapter|$)")}
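The same parameterised idea can be sketched in Python to check the arithmetic: skip `start - 1` chapters, then capture `end - (start - 1)` chapters. The function name `extract_chapters` and its defaults are hypothetical, chosen only for this illustration, and the sketch assumes Python's `re` behaves like the snippet's regex engine for this pattern.

```python
import re

def extract_chapters(text, start=1, end=10):
    """Return the text from chapter `start` through chapter `end` (1-based)."""
    n1 = start - 1   # number of chapters to skip
    n2 = end - n1    # number of chapters to capture
    unit = r"(?:chapter[\s\S]+?)"  # one heading plus the text that follows it
    pattern = ("(?:" + unit + "{" + str(n1) + "})"
               "(" + unit + "{" + str(n2) + "})"
               "(chapter|$)")
    match = re.search(pattern, text)
    # None means the requested range does not exist in the text.
    return match.group(1) if match else None

# Sample text: ten headings, each followed by filler text.
names = ["one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]
text = "\n" + "".join(f"chapter {n}\n[bunch of text]\n\n" for n in names)

middle = extract_chapters(text, 6, 8)   # chapters six to eight
everything = extract_chapters(text)     # defaults: chapters one to ten
print(middle)
```

With `start=6` and `end=8` this builds the same pattern as the fixed `{5}` / `{3}` version earlier in the thread.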

scott · October 4, 2019, 9:23am

nice!

cedricdebono · October 4, 2019, 9:24am

I learned from the best