Regex and grep to find and count unknown pattern

Zita_Daniel_Nad · February 10, 2023, 11:53am

I need to come up with a more elegant solution than my current one.

The goal is to find all logs which have a repeated pattern, and to list each pattern and how many times it is repeated in a log. The problem is that in order to use regex I need to know the pattern format, and I do not. Is there any workaround for this?

Currently I use TB to count all errors, then copy part of an error into that TB to count how many times this specific one occurs. I keep track of all these manually and then use -v to see how many are left. It is great to have TB to speed up the process, but I am wondering if this could be achieved in a better way.

Downloading the log files is not an option, so I depend on bash.
Using grep, I am looking for " E" followed by something indicating a pattern. This pattern can occur in several places. When looking manually I would search for anything between 1 and 4 words, very rarely more, and count occurrences to establish what is the most common error in the log or logs. I have to read the logs and identify the pattern manually. Then, using grep -v, I remove all the strings I have already searched for to see how many matching lines are still left, and repeat.
In the end I have a list of all occurrences with how many times they occur in a specific log file. Would be fabulous to have all this automated!

NB
"E" is usually preceded by a space, after the ":" the first few words are important. How do I compare one unknown pattern to the rest of the log in order to determine how many times it occurs?

Example 1
E IzatSvc_PassiveLocListener: E/Exiting with error virtual void izat_manager::IzatPassiveLocationListener
In the above example all words are useful.

Example 2
E/Cobra.HttpBuilder( 2404): java.net.ConnectException: Failed to connect
In the above example the error code is useful to check for this specific one, but then I would search omitting the numerals and everything following them to see if there are other similar error codes.

Example 3
E ExoPlayerImplInternal: Playback error

Example 4
E Cobra.HttpBuilder: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)

Gaurang_Tandon · February 10, 2023, 12:34pm

So this is a tough question ... IIUC, you seem to be looking for a similarity-based grouping, so that all the errors that are similar are grouped together. Then you would have a few distinct groups, and then you would count number of entries in each group.

I have still not understood fully from your examples, so I will ask: what is the similarity metric? In other words, if I gave you two error log lines lineA and lineB, can you write a formula (or describe in words) to deterministically declare if they are similar or not?

So, if you can do that, then we can write a Text Blaze function like so:

{is_similar=(lineA,lineB)->true if |your answer| else false}

and then we can try to count and group the similar log lines.

I hope that makes sense. Though I'm not sure if I fully understood your context haha.

Zita_Daniel_Nad · February 10, 2023, 3:57pm

I cannot precisely define the similarity metric, that is the problem.

Let's try this
The line will always start with " E" or " E/"
Return any words before ":" but exclude all "(" AND all ")" AND all "\d", then compare the next 4 words and count how many times each appears in a file.

Is this possible?

Gaurang_Tandon · February 10, 2023, 4:07pm

Sure, let's try:

{errors="E IzatSvc_PassiveLocListener: E/Exiting with error virtual void izat_manager::IzatPassiveLocationListener
E/Cobra.HttpBuilder( 2404): java.net.ConnectException: Failed to connect
E ExoPlayerImplInternal: Playback error
E Cobra.HttpBuilder: at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)"}
Unfiltered parts: {parts=extractregexall(errors, "E/?(.+?):")}{=parts}
Filtered parts: {parts2=map(parts, x -> replaceregex(x, "[\d()]", "", "g"))}{=parts2}

So now the similarity function would be:

Are index 1 and index 2 equal? {similarity_func=(a, b) -> parts2[a] == parts2[b]}{=similarity_func(1, 2)}
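
Since the logs can only be inspected from bash, the same grouping rule could also be sketched directly in the shell. This is a minimal sketch rather than a tested solution for real logs, and it rests on assumptions not confirmed in the thread: that an error line contains " E " or " E/" (or starts with "E " or "E/"), that the tag is everything up to the first ":" after that marker, and that the group key is the cleaned tag plus the first N words of the message (4 by default). The function name count_error_groups is made up for illustration.

```shell
# Group error lines by "tag + first N message words" and count each group.
# Assumed line shape: [anything] E<space or slash>TAG: message words...
# Reads log lines on stdin, prints "COUNT<TAB>KEY", most frequent first.
count_error_groups() {
  awk -v n="${1:-4}" '
    {
      # Prepend a space so a marker at the very start of the line also matches.
      line = " " $0
      if (!match(line, " E[ /][^:]*:")) next

      # Tag: text between the marker and the first colon.
      tag = substr(line, RSTART + 3, RLENGTH - 4)
      gsub(/[0-9()]/, "", tag)        # drop digits and parentheses
      gsub(/^ +| +$/, "", tag)        # trim leftover spaces

      # Message: everything after the colon, split into words.
      msg = substr(line, RSTART + RLENGTH)
      k = split(msg, w, " ")

      # Key: tag plus at most n message words (fewer is fine).
      key = tag ":"
      for (i = 1; i <= n && i <= k; i++) key = key " " w[i]
      count[key]++
    }
    END { for (key in count) printf "%d\t%s\n", count[key], key }
  ' | LC_ALL=C sort -k1,1nr -k2
}
```

It would be used as `count_error_groups 4 < some.log`, where some.log stands for whichever log file is being inspected. A message that itself contains " E " earlier in the line than the real marker would be misread, so the output is a starting point for review rather than a final tally.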

Zita_Daniel_Nad · February 16, 2023, 8:59am

Thank you for that.

Is it possible to modify the formula so that it compares strings starting with " E" and then the next 7 words (and does not return an error if there are fewer than 7 words)?

Gaurang_Tandon · February 16, 2023, 10:00am

Sure. Can you give a few example strings to explain what you mean?
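
Until example strings are available, one possible reading of the request can be sketched in the shell. The assumptions here are mine, not confirmed by the thread: the key is simply the first 7 whitespace-separated words after the " E " or " E/" marker (tag included, nothing stripped), and lines with fewer words are kept with whatever words they have. The function name count_by_first_words is hypothetical.

```shell
# Count error lines by the first N words that follow the " E" marker.
# Lines with fewer than N words are grouped by the words they do have.
# Reads log lines on stdin, prints "COUNT KEY", most frequent first.
count_by_first_words() {
  awk -v n="${1:-7}" '
    {
      # Prepend a space so a marker at the very start of the line also matches.
      line = " " $0
      pos = match(line, " E[ /]")
      if (!pos) next

      # Everything after the marker, split into words.
      rest = substr(line, pos + 3)
      k = split(rest, w, " ")
      if (k == 0) next

      # Join at most n words; the loop simply stops early on short lines.
      key = w[1]
      for (i = 2; i <= n && i <= k; i++) key = key " " w[i]
      print key
    }
  ' | LC_ALL=C sort | uniq -c | sed 's/^ *//' | LC_ALL=C sort -k1,1nr -k2
}
```

Calling `count_by_first_words 7 < some.log` (some.log being a placeholder) lists each distinct 7-word prefix with its count. The word limit is an argument, so trying 4 or 5 words to see which grouping is most useful only means changing the number.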