





        Data types
        Scalar Data
        Conversion between strings and numbers
        Numerical operators
        String operators
        Chop and chomp
        Print, printf, and sprintf
        Control Structures
        Associative Arrays (Hashes)
        Basic I/O
        Regular Expressions
        Misc. Control Structures

Data types

  • Perl has essentially three data types
    • Scalars
    • Arrays
    • Associative Arrays
  • (There are others, but we won't worry about them here.)

Scalar Data

  • A scalar is essentially anything that is a single item
  • Like with awk, whether it's a number or a string is determined by context
  • Numbers
    • Can be an integer or a floating-point number
    • You don't need to worry about how Perl stores the numbers, unless you base a decision on whether 10/3 is the same as 10*(1/3), because it isn't.
    • Numbers can be specified as literals (the number itself appears in the program) in any of the following formats:
        -40        # the temperature where Fahrenheit and Celsius match
        3.8e22     # 3.8 times 10 to the 22nd power
        5.8735e-43 # a very tiny number!
    • One warning: Don't start a number with a zero, like "041", because, like C, Perl interprets that to mean an octal number, which in this case would be 33.
    • "0x" indicates hexadecimal numbers. (0x21 is 33)
  • Strings
    • Any number of ASCII characters (up to the limits of your computer's memory).
    • "NUL" is not special, like it is in C.
    • You don't need to allocate memory for your string, like you do in C.
    • Literal strings can be "single-quoted strings", "double-quoted strings," or "here documents" (which we'll cover later).
    • Single-quoted and double-quoted strings work just like we've seen before
    • "Here documents" are a form of double-quoted string
    • Special characters in double-quoted strings
        \n      # Newline
        \t      # Tab
        \\      # Backslash
        \"      # Double quote
        \r      # Carriage Return
        \f      # Formfeed
        \b      # Backspace
        \a      # Bell
        \e      # Escape
        \0nn    # An octal value
        \xnn    # A hexadecimal value
        \cC     # A control character (control-C here)
        \l      # Make next letter lower-case
        \L      # Make everything lower-case until \E
        \u      # Make next letter upper-case
        \U      # Make everything upper-case until \E
        \Q      # Backslash-quote all nonalphanumeric characters until \E
        \E      # Terminate \L, \U, or \Q
  • Scalar variables
    • A scalar variable is specified by a dollar sign ($), followed by a letter, followed (possibly) by more letters, digits, or underscores.
    • Limit is 256 characters.
    • Case is significant. ($linelength is a different variable than $lineLength)
    • Before a variable has anything assigned to it (and in a few other instances) it has the value undef.
      • If used as a number undef is a 0; if used as a string, it is "". But it really is a distinct value.
  • Assignment
    • Like in C, assignment is indicated with an equal sign (=).
    • Ex: $total = $score1 + $score2 + $score3
    • The value of an assignment is the value assigned, so you can use an assignment in an expression
    • Examples: $b = 4 + ($a = 3) or (possibly more useful) $a = $b = $c = 3
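The points above about literals, assignment, and double-quoted escapes can be tried out with a short script. This is only a sketch: the variable names are made up for illustration, and it uses "my" and "use strict" (covered later in these notes) so that it runs cleanly under warnings.

```perl
use strict;
use warnings;

# Octal and hexadecimal literals: both of these are the decimal value 33.
my $octal = 041;
my $hex   = 0x21;

# Floating-point results are inexact, so compare them with a tolerance
# instead of relying on == for values like 10/3 and 10*(1/3).
my $x = 10 / 3;
my $y = 10 * (1 / 3);
my $close_enough = abs($x - $y) < 1e-9;

# An assignment has a value, so it can appear inside an expression.
my ($first, $second);
$second = 4 + ($first = 3);      # $first is 3, $second is 7

# Case-changing escapes inside a double-quoted string.
my $name  = "perl";
my $shout = "\U$name\E rocks";   # "PERL rocks"
my $cap   = "\u$name";           # "Perl"

print "$octal $hex $second $shout $cap\n";
```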

Conversion between strings and numbers

  • Perl automatically converts between strings and numbers, as needed.
  • In string-to-number conversion, the leading numeric part of the string is used (leading whitespace is skipped), so "3 apples" converts to 3.
  • If the string doesn't start with a number, its numeric value is 0.
  • Numbers are converted to strings just as they would be in a "print" statement.
  • If you use the "-w" flag with Perl, (i.e. put "#!/usr/local/bin/perl -w" at the start of your script), Perl will warn you about "weird" conversions
  • By the way, it's a really good idea to use the "-w" flag; it can catch a lot of problems.
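Here is a small sketch of the automatic conversions, using made-up values. It assumes the "use warnings" pragma (the modern equivalent of "-w") and switches the numeric warning off in one block so the "weird" conversions can be shown quietly.

```perl
use strict;
use warnings;

# Strings that look like numbers are simply used as numbers.
my $sum = "12" + "3.5";              # 15.5

my ($partial, $none);
{
    no warnings 'numeric';           # silence "isn't numeric" for this demonstration
    $partial = "3 apples" + 4;       # 7: only the leading "3" is used
    $none    = "apples" + 4;         # 4: a string with no leading number counts as 0
}

# A number used as a string is formatted the way print would show it.
my $label = "Total: " . (2 * 21);    # "Total: 42"

print "$sum $partial $none $label\n";
```

Removing the "no warnings" line is a quick way to see what Perl complains about.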

Numerical operators

  • Basic
    • + (Addition)
    • - (Subtraction)
    • * (Multiplication)
    • / (Division)
    • ** (Exponentiation. e.g. 2**3 == 8)
    • % (Modulus. e.g. 10%3 == 1)
  • Numerical comparison operators
    • == (equal)
    • != (Not equal)
    • <= (Less than or equal)
    • < (Less than)
    • >= (Greater than or equal)
    • > (Greater than)
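A quick sketch of the numerical operators in use. The int function is not covered above; it is a Perl builtin that truncates toward zero, included here only to contrast with floating-point division.

```perl
use strict;
use warnings;

my $power     = 2 ** 3;        # 8
my $remainder = 10 % 3;        # 1
my $quotient  = 10 / 4;        # 2.5: division is not integer division
my $whole     = int(10 / 4);   # 2

# A comparison yields a true or false value that can drive a decision.
my $verdict = ($power >= $remainder) ? "bigger" : "smaller";

print "$power $remainder $quotient $whole $verdict\n";
```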

String operators

  • Basic
    • . (Concatenation, e.g. "hello" . " " . "world" eq "hello world")
    • x (Repetition, e.g. "-" x 70 gives 70 dashes)
  • String comparison operators
    • eq (equal)
    • ne (Not equal)
    • le (Less than or equal)
    • lt (Less than)
    • ge (Greater than or equal)
    • gt (Greater than)
  • Note that the string vs. numerical comparison operators are the opposite of what the shell's test command uses (there, = compares strings and -eq compares numbers)
    • The way Perl does it makes more sense: String comparisons use "string" operators
  • Binary assignment operators
    • All of the ones like in C ( +=, -=, *=, /=, etc.) are there
    • .= adds to a string
  • Autoincrement and autodecrement
    • Work like in C

Chop and chomp

  • chop removes the last character of a string, and returns the chopped character
  • Example:
 $string = "testing 1 2 3";
 $character = chop $string;
  • Result: $string is "testing 1 2 " and $character is "3"
  • chomp removes the last character, if and only if it's a newline
  • Removing a trailing newline is a common need, hence the existence of the chomp function.
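A short sketch contrasting the two functions. Note that chomp returns the number of characters it removed, so calling it a second time on the same string is harmless.

```perl
use strict;
use warnings;

my $line = "some input\n";
my $first_removed  = chomp $line;   # 1: the newline is removed
my $second_removed = chomp $line;   # 0: nothing left to remove

my $word = "testing";
my $last = chop $word;              # "g"; $word is now "testin"

print "[$line] $first_removed $second_removed [$word] $last\n";
```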

Print, printf, and sprintf

  • In short, the print function prints its argument(s)
  • Examples:
        print("The answer is $answer\n");
        print "The question was $question\n";
  • Note: in general, the parentheses are optional when using Perl's builtin functions.
  • printf works like in C
  • sprintf is like printf
    • Returns the formatted string, rather than printing it
    • Useful for assigning formatted strings to variables
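For instance, sprintf can round a number or line up columns before the string is stored or printed. The values and formats below are just illustrations.

```perl
use strict;
use warnings;

my $price = sprintf("%.2f", 3.14159);          # "3.14"
my $row   = sprintf("%-6s|%5d|", "Joe", 42);   # "Joe   |   42|"
my $hex   = sprintf("%x", 255);                # "ff"

printf("%s %s %s\n", $price, $row, $hex);
```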

Lists

  • A list is an ordered collection of scalar data
  • Represented in a program by several values, separated by commas and enclosed by parentheses
  • Strings and numbers can be mixed in a list
  • Example: (1, 2, "three", "four", 5)
  • The empty list is represented by a pair of empty parentheses ()
  • The list constructor (range) operator can represent a sequence of numbers: (1 .. 5) is the same as (1, 2, 3, 4, 5)
    • If the right value is less than the left value, the resulting list is the empty list
    • If the values are not whole numbers, Perl 5 truncates each end to an integer before counting
    • The list (1.2 .. 5.1) is therefore (1, 2, 3, 4, 5)
  • If the list consists solely of simple words, the "qw" (quote words) operator can be used to simplify the representation
  • Example: ("eenie", "meenie", "minie", "moe") can be written qw(eenie meenie minie moe); the words are separated by whitespace, not commas
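A brief sketch of list literals, assigned to arrays only so that they can be counted and printed. The names are illustrative.

```perl
use strict;
use warnings;

my @digits = (1 .. 5);                       # (1, 2, 3, 4, 5)
my @empty  = (5 .. 1);                       # right value is smaller: empty list
my @mixed  = (1, 2, "three", "four", 5);     # strings and numbers together
my @words  = qw(eenie meenie minie moe);     # four words, no quotes or commas needed

print scalar(@digits), " ", scalar(@empty), " ", scalar(@mixed), " @words\n";
```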

Arrays

  • An array is a variable that holds a list
  • Named like a scalar variable, but using @ instead of $
  • @something and $something are completely different variables
  • An array can be assigned a list, or another array
  • Example: @array = (1, 2, 3); @array2 = @array;
  • A list can contain an array. The array members are simply inserted into the list
    • Example: @array = (3, 4, 5); @array2 = (1, 2, @array, 6); results in @array2 being the list (1, 2, 3, 4, 5, 6)
    • If an array is used in a scalar context, the scalar value is the number of elements in the list
    • Example: @array = (3, 4, 5); $array = @array; results in $array being 3
  • A list of variables can appear on the left-hand side of an assignment
    • Example: ($one,$two,$three) = (1, 2, 3);
  • Array elements
    • Array elements are accessed by a subscript in square brackets []
    • The index of the first element is 0
      • Example: If an array is @array = qw(one two three) then $array[0] is "one", $array[1] is "two" and $array[2] is "three"
      • This is the same as C, different from awk
    • Note that the array element starts with a $, since it's a scalar value
    • An array slice is more than one value from the same array, and since it's a list, the @ is used (Example: @array[0,1] is the first two elements of @array)
    • The index in an array can be a variable (useful in loops, for example)
    • If you access an element outside the bounds of the array, you get the undef value
      • Much nicer than C, which may give you a core dump (or silently wrong data) if you do that
    • Assigning to an element outside the bounds of the array automatically extends the array (and assigns any intervening values to undef)
      • Again, much nicer than the core dump C would give you
    • The last index in an array is represented by $#arrayname
      • Example: @array = (1,2,3), $#array is 2
      • You can assign to $#arrayname to grow or shrink an array, but usually don't need to, since the array grows and shrinks automatically, as needed
    • A negative subscript counts from the end, so $array[-2] is the second to last element of @array
  • Push and pop
    • Arrays can be treated like stacks with the push and pop functions
    • push appends a scalar (or a list) to the end of an array
    • pop takes an element off the end of an array
    • Example
        @array = (1);
        push (@array, 2);
        push (@array, 3, 4, 5);
        $var = pop (@array);
    • push also happens to be a handy way to add values to an array, even if you aren't using the array as a stack
  • Shift and unshift
    • Like pop and push, but at the beginning rather than the end of the list
  • Reverse
    • Returns a list that contains the elements of its argument, in reverse order
    • Example
        @array = (1, 2, 3, 4);
        @revarray = reverse (@array);
  • Sort
    • Returns a list containing the elements of its argument, in sorted order
    • Default order is ASCII order, but you can specify your own order (we won't worry about this right now)
  • Chomp on an array
    • When used on an array, chomp chomps each element of the array
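The array operations above fit together as in the following sketch. The array names are illustrative; the comments show the contents after each step.

```perl
use strict;
use warnings;

my @array  = (3, 4, 5);
my @bigger = (1, 2, @array, 6);       # (1, 2, 3, 4, 5, 6)
my $count  = @bigger;                 # scalar context: 6 elements
my $last_index = $#bigger;            # 5
my @slice  = @bigger[0, 1];           # (1, 2)
my $second_to_last = $bigger[-2];     # 5

# push/pop work on the end of the array, unshift/shift on the front.
my @stack = (1);
push(@stack, 2);
push(@stack, 3, 4, 5);                # (1, 2, 3, 4, 5)
my $top = pop(@stack);                # 5; @stack is (1, 2, 3, 4)
unshift(@stack, 0);                   # (0, 1, 2, 3, 4)
my $front = shift(@stack);            # 0; @stack is (1, 2, 3, 4)

# Assigning past the end extends the array with undef values.
my @sparse = (1, 2, 3);
$sparse[5] = 6;                       # (1, 2, 3, undef, undef, 6)

# The default sort compares elements as strings, not numbers.
my @sorted = sort (10, 9, 100, 1);    # (1, 10, 100, 9)

print "$count $last_index @slice $second_to_last $top $front @sorted\n";
```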

Control Structures

  • Statement blocks
    • A collection of statements, grouped by curly braces ({})
    • Can be used anywhere a single statement would be used
    • Semicolon on last statement is optional
  • If
    • Syntax: if ( expression ) block
    • If-else form: if ( expression ) block else block
      • Note that curly braces are always required around block, unlike C and Java, which make them optional if the block is a single statement
    • If the expression is true, evaluate block
    • If the expression is false, and if there is an else statement, evaluate the else block
    • What is truth?
      • In essence, a value is true if, when evaluated as a string, it is neither the empty string nor "0"
      • The number 0 is false
      • The string "0" is false
      • The empty string ("") is false
      • The value undef is false (because it becomes the empty string)
      • Everything else is true
  • Unless
    • Sometimes, you really only want to do something if the test is false
    • You can negate the test
    • You can use unless in place of if
    • Unless can also have an else clause
  • Elsif
    • If you have multiple choices, you can use elsif
    • Syntax: if (expression) block elsif (expression) block elsif (expression) block else block
    • Note that it has an "s" in Perl, unlike in shell
  • while/until
    • Process a loop as long as a condition is true (or until a condition is true)
    • Syntax: while (expression) block, until (expression) block
    • Block is not evaluated if the condition is false/true the first time through
  • do {} while/until
    • Like while/until except the test is at the end of the loop, rather than the beginning
    • The block will always be executed at least once
  • for
    • Like the for statement in C and Java
    • Example:
        for ($i = 1; $i <= 61; $i++ ) {
                print "McGwire has $i home runs, no record yet\n";
        }
  • foreach
    • Like the shell's "for" statement
    • Iterates over the values of a list
    • Assigns to the named variable on each iteration
    • Example:
        @players = qw(McGwire Sosa Bonds);    # qw separates words with whitespace, not commas
        foreach $player (@players) {
                print "$player is a good home run hitter.\n";
        }
    • One note: Unless the list came from a function that returns a list, changing the value of the variable changes the value in the list/array
    • Could be both useful and dangerous.
    • Example:
        @a = (2, 3, 4, 5);
        foreach $num (@a) {
                $num **= 2;
        }
        # @a is now (4, 9, 16, 25)
  • The $_ variable
    • A special variable that you'll see in many places in Perl.
    • Some things assign to $_ if you don't specify anything else
    • Many functions operate on the $_ variable if you don't specify anything else
    • Foreach uses $_ if you don't specify the variable
    • Example:
        foreach (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) { print "$_\n"; }    # illustrative body: $_ holds each value in turn
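The truth rules and the aliasing behavior of foreach are worth checking by experiment. The sketch below sorts a few hand-picked values into true and false piles and then modifies an array through the loop variable; all names are illustrative.

```perl
use strict;
use warnings;

# Only 0, "0", and "" are false here; "0.0", "00", and " " are all true,
# because as strings they are neither empty nor exactly "0".
my (@true, @false);
foreach my $value (0, "0", "", "0.0", "00", " ", 1, "a") {
    if ($value) {
        push @true, "'$value'";
    } else {
        push @false, "'$value'";
    }
}

# until runs the block as long as the condition is false.
my $n = 0;
until ($n >= 3) {
    $n++;
}

# The loop variable is an alias for each element, so assigning to it
# changes the array itself.
my @a = (2, 3, 4, 5);
foreach my $num (@a) {
    $num **= 2;
}

print "true: @true\nfalse: @false\nn=$n a=@a\n";
```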

Associative Arrays (Hashes)

  • Hashes
    • Also called "associative arrays," though hash has become the more popular term (no doubt because it's shorter).
    • Unlike arrays, hashes have no particular order.
    • Represented as variables with a %, as in %some_hash.
    • The values of a hash are key/value pairs, referenced with curly braces
    • The key is automatically quoted
    • Example:
        $salary{Joe} = 40000;
        $salary{Sherry} = 60000;
        $salary{Sam} = 20000;
    • A hash has no literal representation of its own. It is written as a list, with the first value as a key, then a value, then a key, then a value, etc.
    • Example:
        %salary = ("Joe", 40000, "Sherry", 60000, "Sam", 20000);
        if ( $salary{Sam} <= $salary{Joe} ) {print "Sam is underpaid!\n";}
    • The token "=>" is just a synonym for ",", but makes hash declarations look much better
    • Example:
        %salary = (
                Joe    => 40000,
                Sherry => 60000,
                Sam    => 20000,
        );
    • Note that "=>" causes the item to its left to be quoted, so you don't need the quotes
  • keys function
    • Returns the keys of a hash, as a list
    • Returns an empty list if the hash is empty
    • Useful for iterating through the hash
    • Example:
        foreach $employee (keys (%salary)) {
                print "$employee makes $salary{$employee}.\n";
        }
  • values function
    • Like keys, but returns the values of the hash
  • each function
    • Returns a two-element list, containing a key/value pair from a hash
    • On each successive call, returns another key/value pair
    • Returns an empty list (hence false) when there are no more key/value pairs
    • Example:
        while ( ($employee, $salary) = each (%salary) ) {
                print "$employee makes $salary.\n";
        }
  • delete function
    • Removes a value from a hash
    • Example: delete $salary{Joe}; #Joe quit
  • Hash slices
    • A shorthand way to specify part of a hash
    • Example:
        @salary{"Joe", "Sherry", "Sam"} = (40000, 60000, 20000);    # automatic quoting only applies to a single key
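Putting the hash functions together, using the salary data from the examples above. Because a hash has no particular order, the sketch sorts the keys before relying on their order.

```perl
use strict;
use warnings;

my %salary = (
    Joe    => 40000,
    Sherry => 60000,
    Sam    => 20000,
);

my @names = sort keys %salary;          # ("Joe", "Sam", "Sherry")

my $total = 0;
foreach my $amount (values %salary) {
    $total += $amount;                  # 120000 once the loop finishes
}

my @pair = @salary{"Sherry", "Sam"};    # hash slice: (60000, 20000)

delete $salary{Joe};                    # Joe quit
my $remaining = keys %salary;           # scalar context: 2 keys left

print "@names $total @pair $remaining\n";
```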

Basic I/O

  • A very simple example of processing I/O:
  while (<>) { print; }    # illustrative body: echo each input line
    • The empty angle brackets are called the diamond operator.
      • It will process STDIN if there were no command line arguments.
      • If there were command line arguments, it will work on each specified file, in succession.
      • The command line arguments are in array @ARGV, and can be processed and/or added to before using the diamond operator.
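To see the diamond operator pick up a file name from @ARGV, the sketch below writes a scratch file and then puts its name into @ARGV by hand, standing in for a real command line argument. File::Temp is a standard module used here only to get a throwaway file; the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to read.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "first line\nsecond line\n";
close($fh);

# Pretend the file was named on the command line.
@ARGV = ($filename);

my @lines;
while (<>) {
    chomp;               # works on $_ by default
    push @lines, $_;
}

print scalar(@lines), " lines read\n";
```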

Regular Expressions

  • Perl has support for regular expressions, with some extensions to what we've seen before
  • In Perl, regular expressions are usually enclosed in slashes
  • Default match is against $_
  • Example:
  while (<>){
    if (/foo/) {
      print "\$_ contains the string 'foo'.\n";
    }
  }
  • Another common usage is to replace one thing with another
    • Looks like sed
  • Example:
  while (<>) { s/foo/bar/; print; }    # illustrative body: replace the first "foo" on each line with "bar"
  • Greediness
    • Like other examples we've seen, Perl's regular expressions are greedy.
    • Examples:
  $_ = "fooooooooooooooood";
  $_ = "f xx o xxx o xxxx d";
  $_ = "f xx or xxx ol xxxx d";
    • The patterns can be made non-greedy by following them with a question mark
  $_ = "f xx o xxx o xxxx d";
  • Parentheses
    • Parentheses can be put around part of a regular expression, and what matched within the parentheses is remembered for later use
    • The parentheses don't change how the regular expression matches
    • The matches can be used later in the regular expression as \1 for the first, \2 for the second, etc.
    • Example: /f(.*)o(.*)o\2d\1/ matches "fxoyoydx" or "f--o__o__d--"
  • Alternation
    • Provide several alternatives to match, separated by the vertical bar (|)
    • Example: /ford|chevy/ matches either "ford" or "chevy"
  • Anchoring
    • \b matches at a word boundary (the position between a word character [a-zA-Z0-9_] and a non-word character)
    • Example: /\bford/ matches "ford" but not "afford"
    • \B matches where there isn't a word boundary
    • Example: /\Bford\b/ matches "afford" but not "ford"
    • ^ (caret) and $ (dollar sign) work like we've seen before
      • $ could be confused with indicating a variable, but if it's at the end of a regular expression string, it will be interpreted as "end of line"
  • Precedence
    • Regular expressions have rules of precedence as well
    • Does /a|b*/ mean a|(b*) or (a|b)*?
    • From highest to lowest, the precedence is:
  ( ) (?: )                        # parentheses
  ? + * {m,n} ?? +? *? {m,n}?      # multipliers
  abc ^ $ \A \Z (?= ) (?! )        # sequence and anchoring
  |                                # alternation
    • So, by these rules, /a|b*/ means /a|(b*)/
    • You can use parentheses to enforce your desired interpretation
    • But, parentheses "count" in the memory for \1, \2, etc.
    • You can use (?: ) to mean "group this stuff, but don't count it as a pattern"
  • The =~ operator
    • Match on a string other than $_
    • Example: $truck =~ /ford/; checks the variable $truck for the string "ford"
  • Ignoring case
    • Suppose you want to match "ford", "Ford", "FORD", or any other combination of case
    • You could, based on what's been covered, use /[Ff][Oo][Rr][Dd]/
      • What if you want to match a longer string, like "These are the times that try men's souls"?
    • There is a convenient shorthand /Ford/i
  • Different delimiters
    • If you're trying to match strings with "/" in them, the necessary backslash escapes can make things ugly really fast
    • To check for the "#!" line in a perl script, you might use /^#!\/usr\/local\/bin\/perl/
    • Or, you can use a different delimiter, by starting it with an "m" and using a pair of punctuation characters, like m@^#!/usr/local/bin/perl@
    • You can also use matching delimiters, like m[^#!/usr/local/bin/perl]
  • Variable interpolation
    • Variables are expanded inside of a regular expression match
    • Example: $match = "[Rr]ead"; $action =~ /$match/;
  • Special variables
    • After a match, the parenthesized matches are set to $1, $2, $3, etc.
    • $` contains what was before the match
    • $& contains what was matched
    • $' contains what was after the match
    • Example:
  $trucks = "ford chevy dodge toyota";
  $trucks =~ /(\w+)\W+(\w+)/;
  $first_truck = $1;  # "ford"
  $second_truck = $2; # "chevy"
  $trucks =~ /ch.*?y/;
  $before = $`; # "ford "
  $match = $&;  # "chevy"
  $after = $';  # " dodge toyota"
  • Substitutions
    • Form is s/regex/string/ (replaces regex with string)
    • Delimiter can be any punctuation character, like with m//
    • Add "g" to the end to match all occurrences (rather than just the first)
    • Use $1, $2, etc. to use matches in the string
    • Example:
  $_ = "I have to go now";
  • Split and join
    • Split breaks up a string, based on a regular expression. Returns a list.
    • Join joins the items in a list, with a string separating each item. Returns a scalar.
    • Example:
  $passwd = "geoff:*:101:5:Geoff Allen:/users/geoff:/usr/local/bin/bash";
  @fields = split(/:/,$passwd);
  $new_line = join(":", @fields);
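Several of the regular expression features above can be checked with one short script: greedy versus non-greedy matching, non-capturing groups, case-insensitive matching, substitution with captured text, and split/join. This is a sketch with made-up strings (apart from the sample sentence), and it captures matches into variables so the results are easy to inspect.

```perl
use strict;
use warnings;

my $text = "f xx o xxx o xxxx d";

# Greedy: .* takes as much as it can while still allowing a match.
my ($greedy) = $text =~ /(f.*o)/;      # "f xx o xxx o"
# Non-greedy: .*? takes as little as it can.
my ($lazy)   = $text =~ /(f.*?o)/;     # "f xx o"

# (?: ) groups without counting, so the truck make is still $1.
# The i modifier ignores case.
my $truck = "My FORD pickup";
my $make  = "";
if ($truck =~ /\b(?:my|your)\s+(ford|chevy)\b/i) {
    $make = $1;                        # "FORD"
}

# In a substitution, $1 and $2 refer to the parenthesized matches.
my $name = "Allen, Geoff";
(my $swapped = $name) =~ s/(\w+), (\w+)/$2 $1/;    # "Geoff Allen"

# The g modifier replaces every occurrence, not just the first.
my $sentence = "I have to go now";
(my $dashed = $sentence) =~ s/ /-/g;               # "I-have-to-go-now"

# split and join undo each other when the separator is a fixed string.
my @words   = split(/-/, $dashed);
my $rebuilt = join(" ", @words);

print "$greedy\n$lazy\n$make\n$swapped\n$dashed\n$rebuilt\n";
```

The (my $copy = $original) =~ s/.../.../ form copies the string first and then edits the copy, which leaves the original untouched.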

Functions

  • Defining a function
    • Functions (and subroutines -- there's no difference in Perl) are defined with the "sub" command.
    • Format is:
  sub funcname { statement_1; statement_2; }
  • Calling a function
    • You call your functions by using the function name, followed by parentheses
    • For example, if you had defined a function called "hit_homerun", you'd call it with the statement hit_homerun();
  • Return values
    • A function returns the value of the last expression, or the value of the return function
    • A function's return value is used as its value in the expression in which it is called
    • Example:
  sub three {
    return 3;
  }
  print 4 + three();    # prints 7
  • Arguments
    • Arguments are passed to functions in the @_ array
    • They can be accessed one-by-one ($_[0], $_[1], etc.)
    • They can also be assigned all at once ( ($arg1, $arg2) = @_; )
    • Basically, you can do anything you want with the values
    • Warning! If you use a variable as an argument, and you modify things in @_ directly, you will modify the variable
  • Private variables
    • Perl provides the capability to make variables local to a function
    • This is done with the "my" operator (which takes a list of variables)
    • Example: my ($some_var, @some_array, %some_hash);
    • Another, slightly less private, variable operator is "local"
    • The difference is that "my" variables are seen only by the function/block; "local" variables are seen by the function/block and all functions called within that function/block
  • use strict;
    • There is a commonly-used "pragma" (compiler directive) for Perl that is quite useful
    • If you place the statement use strict; in your script (usually first or very early in the script), Perl will be a lot pickier about things
    • All variables in the script must be given a scope with my
    • use strict; will help prevent a lot of problems. Use it.
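The following sketch pulls the function topics together: copying arguments out of @_, the aliasing warning, and the difference between "my" and "local". The function names are made up. One assumption beyond the notes: under use strict, a global that "local" will act on has to be declared, which is done here with "our".

```perl
use strict;
use warnings;

# The usual style: copy the arguments into private variables.
sub add {
    my ($left, $right) = @_;
    return $left + $right;
}

# Modifying @_ directly changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
}

my $sum = add(2, 3);            # 5
my $n   = 21;
double_in_place($n);            # $n is now 42

# "local" gives a global variable a temporary value that is also seen
# by any function called from within the block.
our $level = "outer";

sub show_level {
    return $level;
}

sub with_local {
    local $level = "inner";
    return show_level();        # sees "inner"
}

my $during = with_local();      # "inner"
my $after  = show_level();      # "outer": the old value is restored

print "$sum $n $during $after\n";
```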

Misc. Control Structures

  • Last
    • Sometimes you want to be done with a loop before the loop is scheduled to be done
    • In C, you can use the "break" statement
    • In Perl, it's called last
    • Breaks out of a for, foreach, while, or until loop, not other blocks
    • Program continues after the end of the loop block
  • Next
    • Sometimes, you don't want to quit the loop, you just want to quit this iteration
    • next is how you do this
    • Example:
  while (<>) {
    if (/^$/) { next; }    # skip blank lines
    if (/foo/) {
      print "I found 'foo' on the line $_!\n";
    }
  }
  • Redo
    • If redo appears in the loop block, it will cause that iteration of the loop to start over
  • Labeled Blocks
    • You can put a label at the start of a block, and use that to explicitly identify which loop you mean with the last, next, or redo statement
    • Example:
  OUTER: for ($i = 1; $i <= 10; $i++) {
    INNER: for ($j = 1; $j <= 10; $j++) {
      if ($i * $j == 63) {
        print "$i times $j is 63!\n";
        last OUTER;
      }
      if ($j >= $i) {
        next OUTER;
      }
    }
  }
  • Expression Modifiers
    • A nice, short way to write simple conditionals
    • Examples:
  next if (/^$/);
  last if ( ($i * $j) == 63);
  $i = 0; $i++ while ($i <= 10);
  • && and ||
    • && (and) and || (or) can function as control structures as well, because Perl stops evaluating as soon as it knows the "answer" to an and or or expression
    • Example: 0 && $i++ -- $i will never get incremented, because Perl knows the and is false as soon as 0 is evaluated
    • The following are all equivalent:
  if (condition) { statement; }
  statement if (condition);
  condition && statement;
    • Likewise, the following are equivalent:
  unless (condition) { statement; }
  condition || statement;
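Short-circuit evaluation and labeled loops can both be observed directly. In this sketch a hypothetical helper, bump, counts how often it is actually called, and the labeled loop is written with foreach and expression modifiers rather than the C-style for used above.

```perl
use strict;
use warnings;

# bump() records each call, so we can tell whether the right-hand side
# of && or || was evaluated.
my $calls = 0;
sub bump {
    $calls++;
    return 1;
}

my ($yes, $no) = (1, 0);
$no  && bump();     # not called: the "and" is already false
$yes || bump();     # not called: the "or" is already true
$yes && bump();     # called
$no  || bump();     # called

# Labels say which loop next and last apply to.
my @found;
OUTER: foreach my $i (1 .. 10) {
    foreach my $j (1 .. 10) {
        next OUTER if $j > $i;          # only consider $j <= $i
        if ($i * $j == 63) {
            push @found, "$i x $j";
            last OUTER;                 # leave both loops
        }
    }
}

print "$calls calls, found @found\n";
```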


  • Filehandles
    • A filehandle is Perl's way of specifying a file to read from or write to
    • STDIN, STDOUT, and STDERR are filehandles that are provided for you
    • Filehandles have their own namespace
    • Tradition says to use all UPPERCASE letters for the name of your filehandle
  • Open
    • The open call has several forms
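  open(FILEHANDLE, "/tmp/somefile");      # open for reading
  open(FILEHANDLE, ">/tmp/somefile");     # open for writing (replaces the file's contents)
  open(FILEHANDLE, ">>/tmp/somefile");    # open for appending
  open(FILEHANDLE, "| somecommand");      # write to the command's standard input
  open(FILEHANDLE, "somecommand |");      # read from the command's standard output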
  • Close
    • When you close a file, you flush any output pending for a write
    • Files are automatically closed when the program exits, but it doesn't hurt to close them yourself
    • Syntax is simply close (FILEHANDLE);
  • Die
    • die will quit your program, with an error message
    • Useful (and often seen) with open statement
    • Example:
  open (PASSWD, "/etc/passwd") ||
    die "Couldn't open the passwd file!\n";
    • If your message ends in "\n", die prints the message
    • If your message doesn't end in "\n", die prints the line number, filename, and your message
    • The variable $! contains the error from the operating system
  • Warn
    • die's little brother
    • prints the message, but doesn't abort the program
  • Using Filehandles
    • So, you've got your file open, what do you do with it?
    • If reading, you can do something like:
 open (PASSWD, "/etc/passwd" ) ||
   die "Couldn't open passwd: $!";
 while (<PASSWD>) {
   # each line of the file is in $_ here
 }
    • If writing or appending, you can just add the filehandle to the print statement:
  print SOMEFILE "This goes in the file!\n";
  • File tests
    • There are a whole bunch of file tests available in Perl, many of which are copied from the test command used in shell programming
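Here is a sketch that writes a file, appends to it, reads it back, and then applies a few file tests. The tests shown (-e exists, -d is a directory, -s size in bytes) are standard Perl operators, but which ones the original list covered is an assumption. The file name is made up, and File::Temp is used only to get a scratch directory that cleans itself up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.txt";

# Write, then append.
open(OUT, ">$path") || die "Couldn't write $path: $!";
print OUT "This goes in the file!\n";
close(OUT);

open(OUT, ">>$path") || die "Couldn't append to $path: $!";
print OUT "So does this.\n";
close(OUT);

# Read the lines back.
open(IN, $path) || die "Couldn't read $path: $!";
my @lines;
while (<IN>) {
    chomp;
    push @lines, $_;
}
close(IN);

# File tests, in the style of the shell's test command.
my $exists = -e $path ? 1 : 0;     # the file exists
my $is_dir = -d $dir  ? 1 : 0;     # $dir is a directory
my $size   = -s $path;             # size in bytes (false if empty)

print scalar(@lines), " lines, $size bytes\n";
```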
  • Advanced sorting
    • We've already looked at the sort function, and learned that it sorts in ASCII order
    • Now you get "the rest of the story."
    • You can provide a subroutine defining how to compare two of the things being sorted
    • The two things being compared are given as $a and $b
    • The routine should return a negative value if $a comes first, 0 if they're equal, or a positive number if $b comes first
    • Example:
  sub by_record {
    return -1 if ( $record{$a} <  $record{$b} );
    return  0 if ( $record{$a} == $record{$b} );
    return  1 if ( $record{$a} >  $record{$b} );
  }
  @al_west = sort by_record ("Seattle", "California", "Texas", "Oakland");
    • This type of comparison is common enough that it has a special operator, <=>:
  sub by_record {
    $record{$a} <=> $record{$b};
  }
    • The equivalent operator for strings is cmp
    • Finally, the comparison routine can be put right inline:
  sort { $a <=> $b } (3, 1, 7, 2.813, 17.5, 4, 4.22);
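A runnable version of the sorting ideas, for experimenting. The %record hash is never defined in the examples above, so the win totals below are hypothetical values invented purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical win totals, made up for this sketch.
my %record = (
    Seattle    => 76,
    California => 84,
    Texas      => 77,
    Oakland    => 65,
);

# A named comparison routine: fewest wins first.
sub by_record {
    $record{$a} <=> $record{$b};
}
my @fewest_first = sort by_record keys %record;

# Swapping $a and $b reverses the order: most wins first.
my @most_first = sort { $record{$b} <=> $record{$a} } keys %record;

# cmp is the string equivalent of <=>.
my @alphabetical = sort { $a cmp $b } keys %record;

# Numeric sort of a plain list.
my @numbers = sort { $a <=> $b } (3, 1, 7, 2.813, 17.5, 4, 4.22);

print "@fewest_first\n@most_first\n@alphabetical\n@numbers\n";
```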
  • Transliteration
    • The tr operator replaces characters from the first string with characters from the second string
      • i.e., just like the tr command
    • By default, works on $_, can work on something else with =~
    • Example: