grep and Other RegEx Functions
grep & grepl
grep stands for "globally search for a regular expression and print matching lines," just as in UNIX. The function uses a regular expression to search for a pattern in a character vector and returns the indices of the elements that match.
Additionally, the function grepl (derived from grep-logical) takes the same inputs but returns a logical vector, where TRUE indicates a match at that index and FALSE indicates no match.
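As a minimal sketch of the difference between the two return values, the snippet below runs both functions on the same input. The vector fruits and the pattern "apple" are made up for illustration and do not come from the original text.

```r
# Hypothetical example data; the names here are illustrative only.
fruits <- c("apple", "banana", "cherry", "pineapple")

# grep() returns the positions of the elements that match the pattern.
idx <- grep("apple", fruits)
print(idx)    # positions 1 and 4

# grepl() returns one logical value per element of the input.
hits <- grepl("apple", fruits)
print(hits)   # TRUE for elements 1 and 4, FALSE otherwise

# Either result can be used to subset the original vector.
print(fruits[idx])
print(fruits[hits])
```

Both subsetting calls give the same elements, so the choice mostly depends on whether positions or a logical mask is more convenient downstream.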
sub & gsub
Often, finding the indices of matches in your text isn't what you want; your goal is to change the text into a format that's easier to parse. For this, we have sub and gsub. These functions take a regular expression and a replacement expression, and apply the replacement to a string or a vector of strings.
The key difference is that sub replaces only the first match in each string, while gsub (derived from global-substitution) replaces all matches.
These functions work equally well on a single string or on a vector of strings. When given a vector, they repeat the substitution for each string independently; a single string is simply a vector of length one.
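The behavior described above can be sketched as follows. The input strings and patterns are hypothetical, chosen only to show first-match versus all-match replacement and vector-wise application.

```r
# Hypothetical input strings, used only for illustration.
dates <- c("2021-03-04", "1999-12-31")

# sub() replaces only the first match in each string.
first_only <- sub("-", "/", dates)
print(first_only)    # "2021/03-04" "1999/12-31"

# gsub() replaces every match in each string.
all_matches <- gsub("-", "/", dates)
print(all_matches)   # "2021/03/04" "1999/12/31"

# A single string works the same way, since it is a vector of length one.
single <- gsub("[0-9]", "#", "room 101")
print(single)        # "room ###"
```

Note that the result always has the same length as the input vector; strings without a match are returned unchanged.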
Resources
RStudio Cheat Sheets: 6th sheet down, titled "String manipulation with stringr cheatsheet"
Regular expressions are hard, even for some veteran programmers, because the rules and special characters differ subtly between programming languages. This cheat sheet is a great resource for RegEx basics (everything on the second page is useful) and for string manipulations that go beyond what sub and gsub offer.