
Filters
        Filters and the Unix philosophy
        grep
        Regular expressions
        egrep regular expressions
        Fun with regular expressions
        Other filters
        sed
Reading:
The Unix Programming Environment, Chapter 4 
Filters and the Unix philosophy
 - Unix has a philosophy of using small programs that each have a specific purpose
 - These programs are then combined to produce the result you want
 - By giving you a set of "building blocks," Unix lets you handle just about any situation
 - Many of these "building blocks" are "filters"
  - They take some input, do something to it, and produce some output
 - We'll cover a few of these in this section
 
grep
 - Generally speaking, grep searches for patterns in files
  - Or in stdin, if no files are given
 - The patterns belong to a class of patterns called regular expressions
  - The name grep comes from the ed command g/re/p: "globally search for a regular expression and print"
 - Variants of grep, called egrep and fgrep, are usually also available as grep -E and grep -F
  - egrep extends the regular expression syntax
  - fgrep does a "fast" search using fixed strings
 
 
 - Some of the most useful options:
  - grep -v prints lines that do not match the pattern
  - grep -i is case-insensitive
  - grep -n prints the line number before each matching line (and the file name, if more than one file is searched)
  - grep -f filename reads the patterns from a file (maybe only for fgrep and egrep on some systems)
  - grep -l prints only the names of the files that contain a match (very useful on command lines: sort `grep -l …` | …)
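A quick way to get a feel for these options is to run them against a tiny sample. The sketch below is only an illustration: the directory, the file names menu.txt and other.txt, and their contents are made up for the example.

```shell
# Build two small sample files in a temporary directory (names are invented).
dir=$(mktemp -d)
printf 'Apple pie\nbanana bread\napple tart\ncherry cake\n' > "$dir/menu.txt"
printf 'plain toast\n' > "$dir/other.txt"

# -i: ignore case, so both "Apple" and "apple" match.
grep_i=$(grep -i 'apple' "$dir/menu.txt")
# -v: keep only the lines that do NOT match.
grep_v=$(grep -v -i 'apple' "$dir/menu.txt")
# -n: prefix each matching line with its line number.
grep_n=$(grep -n 'apple' "$dir/menu.txt")
# -l: list only the names of the files that contain a match.
grep_l=$(cd "$dir" && grep -l 'apple' menu.txt other.txt)

printf '%s\n' "-i:" "$grep_i" "-v:" "$grep_v" "-n:" "$grep_n" "-l:" "$grep_l"
rm -r "$dir"
```

Because -l prints bare file names, its output can be dropped straight into another command line with backquotes or $(...).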
 
 
 
Regular expressions
 - Regular expressions are basically mini-algorithms that specify how to match text
  - Regular expressions look similar to shell patterns, but behave quite differently
 - The simplest regular expression is a single letter, which matches that letter
  - a matches a, abcde, or supercalifragilisticexpialidocious
 - A sequence of letters matches that sequence
  - cat matches cat, caterpillar, or scatological
 
 
 - The character . (a dot) matches any single character
 - The character * indicates zero or more occurrences of the preceding character
  - car* matches cat, carry, or carolina (the r is optional, so ca alone is enough)
  - ar*a matches sarah, saab, or marrrrrrrrrrrrra, but not marrrrrrrrrtha
 
 
 - ^ matches the beginning of a line
 - $ matches the end of a line
  - So ^$ matches an empty line
 - [...] matches any one of the characters listed, and ranges can be specified
  - [0-9] matches any digit
  - [0-9]* matches zero or more digits
 - [^...] matches any one character other than those listed, and ranges can be specified
  - [^0-9] matches any non-digit
 - Note that * doesn't match anything itself. It just modifies the meaning of the preceding character
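The examples above can be checked directly by feeding words to grep, one per line. This is a minimal sketch; match is a helper function invented for the example, not a standard command.

```shell
# match PATTERN WORD...: print each WORD that contains a match for PATTERN.
# (match is a made-up helper for this example.)
match() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -e "$pattern"
}

match 'car*' cat carry carolina dog    # prints cat, carry, carolina
match 'ar*a' sarah saab marrrtha       # prints sarah, saab
match 'c.t' cat cot ct                 # prints cat, cot
match '^[0-9][0-9]*$' 42 x7 2024       # prints 42, 2024
match '[^0-9]' 123 12a                 # prints 12a

# ^$ matches only empty lines; -c counts the matching lines.
empty_lines=$(printf 'first\n\nlast\n' | grep -c '^$')
echo "$empty_lines"                    # prints 1
```

Note that a pattern matches anywhere in the line unless it is anchored with ^ and $, which is why the all-digits test uses both anchors.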
 
 
egrep regular expressions
 - egrep (or grep -E) adds a few more operators
  - The character + matches one or more of the previous character
   - car+ matches car, carr, or carrrrr, but not ca
  - The character ? matches zero or one of the previous character
   - car?pet matches capet and carpet, but not ca or carrpet
 - (expression1|expression2) matches either expression1 or expression2
 - Note that ? and + don't match anything themselves. They just modify the meaning of the previous character
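The extended operators can be tried the same way with grep -E. In this sketch, ematch is a helper invented for the example, and the patterns are anchored with ^ and $ so that each word must match as a whole.

```shell
# ematch PATTERN WORD...: print each WORD matched by the extended PATTERN.
# (ematch is a made-up helper for this example.)
ematch() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -E -e "$pattern"
}

ematch '^car+$' ca car carr carrrrr          # prints car, carr, carrrrr
ematch '^car?pet$' capet carpet carrpet ca   # prints capet, carpet
ematch '^(cat|dog)$' cat dog bird            # prints cat, dog
```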
 
 
Fun with regular expressions
 - The book offers a couple of interesting regular expressions. If you understand them, you have a good grasp of regular expressions.
 - ^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$
  - Matches lines that contain each of the vowels a, e, i, o, u exactly once, in that order
 - ^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$
  - Matches lines made up only of lowercase letters in alphabetical order, each used at most once
 - The book offers a "thought exercise" in Exercise 4-2 on p. 105:
  - How would things be different if grep could match newlines?
  - (Perl makes this possible.)
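To see what the two expressions from the book select, they can be run over a short word list. This is only a sketch: the word list is made up for the example, and a real run would normally use a system dictionary file instead.

```shell
vowels='^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$'
sorted='^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$'

# A small, invented word list, one word per line.
words() {
    printf '%s\n' facetious abstemious education almost biopsy hello
}

# Words with the five vowels exactly once each, in order.
in_order=$(words | grep -e "$vowels")
# Words whose letters are in alphabetical order (? needs grep -E).
alphabetical=$(words | grep -E -e "$sorted")

echo "$in_order"        # prints facetious, abstemious
echo "$alphabetical"    # prints almost, biopsy
```

The second expression also matches an empty line, since every letter in it is optional.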
 
 
 
GNU Grep 3.0 Goodies
GNU Grep 3.0 adds character classes and other goodies that can save you time if you do a lot of grepping. (For now, focus on the baseline grep material above; I will announce later whether this is testable.)
Other filters
 - sort
  - Sorts the lines of its input
   - Case sensitive
   - Case insensitive
  - Can sort numerically
  - Can sort ascending or descending
  - Can sort based on part of the line
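Each of these sort behaviors corresponds to a standard option. A minimal sketch, using made-up sample data; LC_ALL=C is set here only so that the results do not depend on the locale.

```shell
# Sample data invented for the example: a name and a number per line.
fruit() {
    printf 'cherry 2\napple 10\nBanana 3\n'
}

# Default (case-sensitive) order: uppercase sorts before lowercase in the C locale.
by_bytes=$(fruit | LC_ALL=C sort)
# -f folds case, so the comparison is case-insensitive.
folded=$(fruit | LC_ALL=C sort -f)
# -k 2,2n sorts numerically on the second field only (part of the line).
numeric=$(fruit | LC_ALL=C sort -k 2,2n)
# Adding r reverses the order, giving a descending sort.
descending=$(fruit | LC_ALL=C sort -k 2,2nr)

printf '%s\n' "$by_bytes" "--" "$folded" "--" "$numeric" "--" "$descending"
```

Without the n modifier, 10 would sort before 2 and 3, because the comparison would be character by character.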
 
 
 
 - uniq
  - Note the spelling!
  - Discards duplicate lines (only adjacent ones, so sort the input first)
  - Can include a count of the number of times each line appears (-c)
  - Can print only the duplicated lines (-d), or only the unique lines (-u)
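The uniq behaviors listed above can be sketched on a short sample. The input is invented for the example and is sorted first, since uniq only compares neighboring lines.

```shell
# Made-up sample: a appears three times, b twice, c once.
letters() {
    printf 'a\nb\na\nc\nb\na\n' | sort
}

distinct=$(letters | uniq)       # one copy of each line: a, b, c
counted=$(letters | uniq -c)     # each line prefixed with its count
repeated=$(letters | uniq -d)    # only lines that occur more than once: a, b
singles=$(letters | uniq -u)     # only lines that occur exactly once: c

printf '%s\n' "$distinct" "--" "$counted" "--" "$repeated" "--" "$singles"
```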
 
 
 
 - comm
  - I've actually never used this one
  - diff and cmp are more commonly used, and more useful, I think
 
 
 
 - tr
  - Translates one set of characters into another
  - Can use ranges, just like character classes in regular expressions
  - Examples
   - Make something 31337 ("eleet")
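Two small tr examples, as a sketch: one using ranges, and one possible way to do the "eleet" translation. The particular letter-to-digit mapping is an assumption made for the example.

```shell
# Ranges: translate every lowercase letter to its uppercase counterpart.
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')
echo "$upper"    # prints HELLO WORLD

# Character-for-character mapping: e becomes 3, l becomes 1, t becomes 7.
leet=$(echo 'eleet' | tr 'elt' '317')
echo "$leet"     # prints 31337
```

tr pairs up the two sets position by position, so the first character of the first set is replaced by the first character of the second set, and so on.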
 
   
  
 
 
 - dd
  - Copies bits from one place to another
  - Can do various transformations on the data (ASCII <-> EBCDIC)
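A minimal sketch of both uses, reading from a pipe rather than a device. The sample text is made up, and conv=ucase stands in for the character-set conversions because its result is easy to read; dd's status messages go to stderr and are discarded here.

```shell
# Copy exactly 5 blocks of 1 byte each: the first five bytes of the input.
first5=$(printf 'hello world\n' | dd bs=1 count=5 2>/dev/null)
echo "$first5"     # prints hello

# conv= applies a transformation while copying; ucase maps lowercase to uppercase.
shouted=$(printf 'hello world\n' | dd conv=ucase 2>/dev/null)
echo "$shouted"    # prints HELLO WORLD
```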
 
 
 
  - cat $* | 
      tr -sc A-Za-z '\012' | 
      sort | 
      uniq -c | 
      sort -n | 
      more  
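The pipeline above counts how often each word occurs: tr turns every run of non-letters into a single newline, so each word lands on its own line, and sort | uniq -c | sort -n then counts and ranks them. Here is a sketch of the same idea wrapped in a function; the name wordfreq and the sample sentence are made up, "$@" is used in place of $* so that file names containing spaces survive, and the final more is left off so the output can be captured.

```shell
# wordfreq [FILE...]: print each word with its count, least frequent first.
# (wordfreq is a made-up name for this example.)
wordfreq() {
    cat "$@" |
    tr -sc 'A-Za-z' '\012' |   # every run of non-letters becomes one newline
    sort |                     # bring identical words together
    uniq -c |                  # count each distinct word
    sort -n                    # order by count, ascending
}

printf 'the cat and the hat and the bat\n' | wordfreq
```

The most frequent word ends up on the last line, so piping the result into tail shows the top of the ranking.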
 
 
sed
 - sed is a version of ed that's designed to be used as a filter
 - While ed is rarely used anymore, sed is still quite useful
  - sed does not alter any named files; the modified version is printed on stdout
   - So, how do you edit a file with sed?
   - Usually with something like:
    - sed [commands] filename >filename.new
      mv filename filename.old
      mv filename.new filename
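The three-step recipe above can be tried safely in a scratch directory. This is a sketch under made-up names: notes.txt and its contents exist only for the example.

```shell
# Work in a temporary directory so nothing real is touched.
dir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$dir/notes.txt"

# 1. Write the edited text to a new file; the original is left alone.
sed 's/foo/bar/' "$dir/notes.txt" > "$dir/notes.txt.new"
# 2. Keep the original as a backup.
mv "$dir/notes.txt" "$dir/notes.txt.old"
# 3. Move the edited copy into place.
mv "$dir/notes.txt.new" "$dir/notes.txt"

edited=$(cat "$dir/notes.txt")
backup=$(cat "$dir/notes.txt.old")
printf '%s\n' "$edited" "--" "$backup"
rm -r "$dir"
```

Redirecting straight back onto the input file (sed ... filename >filename) does not work, because the shell truncates the file before sed gets to read it.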
   
  
 
 
  - By far, the most common usage of sed is to replace one thing with another
   - sed 's/foo/bar/g' replaces all occurrences of "foo" with "bar"
   - The pattern "foo" is a regular expression
   - You can delete text that matches a regular expression by using an empty string as the replacement
  - See the text for other examples, and note that grep turns out to be a special case of sed
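A few one-liners illustrate these points. This is a sketch with made-up input strings; the last command shows one way sed can behave like grep, by suppressing the normal output with -n and printing only the lines that match.

```shell
# With the g flag, every occurrence on the line is replaced.
all=$(echo 'foo food foo' | sed 's/foo/bar/g')
echo "$all"           # prints bar bard bar

# Without g, only the first occurrence on each line is replaced.
first=$(echo 'foo food foo' | sed 's/foo/bar/')
echo "$first"         # prints bar food foo

# An empty replacement deletes whatever the pattern matches.
no_digits=$(echo 'a1b22c' | sed 's/[0-9]//g')
echo "$no_digits"     # prints abc

# -n turns off automatic printing; /an/p prints only lines matching "an".
like_grep=$(printf 'apple\nbanana\ncherry\n' | sed -n '/an/p')
echo "$like_grep"     # prints banana
```

Note that s/foo/bar/g also changes "food" to "bard", since the pattern matches anywhere in the line.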
 
 
 - The book builds a "newer" command with sed, which is of interest for how it handles the quoting, but the find command provides a much easier version of "newer"
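For comparison, here is a sketch of the find approach using its -newer test, which selects files modified more recently than a reference file. The file names and timestamps are made up for the example, and touch -t is used only to give the files known modification times.

```shell
# Create three files with known modification times in a scratch directory.
dir=$(mktemp -d)
touch -t 202001010000 "$dir/old.txt"
touch -t 202101010000 "$dir/reference.txt"
touch -t 202201010000 "$dir/new.txt"

# List regular files modified more recently than reference.txt.
newer=$(cd "$dir" && find . -type f -newer reference.txt)
echo "$newer"    # prints ./new.txt
rm -r "$dir"
```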