Skip to content

Latest commit

 

History

History
93 lines (67 loc) · 2.93 KB

regex.asc

File metadata and controls

93 lines (67 loc) · 2.93 KB

regex

ext_match

int ext_match(char *text, const char *re, void(*p_match)(int number, char *pos,int len), int(*p_match_char)(int number, char *match_char), regex_match *st_match)

match

int match(char *text, const char *re, text_match *st_match)

text matching engine

 little bit simpler version than match_ext.
 Consciusly named 'text matching', since the inherent logic
 is quite different to a regular expression machine.

 The engine matches nongreedy straight from left to right,
 so backtracking is minimized.
 It is a compromise between performance, size
 and capabilities.


 matches:

 * for every count of any char (nongreedy(!))
 + for 1 or more chars
 % for 1 or more chars, and fills in arg 3 (text_match)
 ? for 1 char
 @ matches the beginning of the text or endofline (\n)
   -> beginning of a line
 # for space, endofline, \t, \n, \f, \r, \v  or end of text (0)
 $ match end of text
 backslash: escape *,?,%,!,+,#,$ and backslash itself.
 ! : invert the matching of the next character or character class

 [xyz]: character classes, here x,y or z
   the characters are matched literally, also \,*,?,+,..
   it is not possible to match the closing bracket (])
   within a character class


 % : matches like a '+', and fills in argument 3,
 the text_match struct, when the pointer is non null.
 The matching is 'nongreedy'.


 returns: 1 on match, 0 on no match
 ( RE_MATCH / RE_NOMATCH )

 if the pointer (argument 3) st_match is nonnull,
 the supplied struct text_match will be set to the first matching '%' location;
 if there is no match, text_match.len will be set to 0.

 The struct is defined as:
 typedef struct _text_match { char* pos; int len; } text_match;

 examples:
 "*word*"  matches "words are true" or "true words are rare"
 "word*"   matches "words are true" and not "true words are rare"
 "word"    matches none of the above two texts (!)
 "*words%" extracts with % " are true" and " are rare"
           into text_match

 "Some\ntext\nwith\nlinebreaks\n\n"
 "*@%#*" matches with % "Some"
 "*@line%#*" matches % = "breaks"
 "*text\n%"  % = "with linebreaks\n\n"


 (memo) When the regex ist defined within C/cpp source code,
 a backslash has to be defined as double backslash.

 (note) - be careful when negating a following *, or ?.
  somehow - it is logical, but seems to me I overshoot a bit,
  and tapped into a logical paradox.
  Negating EVERYTHING translates to true.
  However, since truth is negated as,... well, there's a problem.

  (I'm not kidding here. Just don't do a regex with !* or !?.,
  or you might experience the meaning of full featured.
  Maybe I should say, it's not allowed?)

  A "!+" will translate into nongreedy matching of any char, however;
  "%!+" will match with % everything but the last char;
  while "%+" matches with % only the first char.
  !+ basically sets the greedyness of the left * or % higher.