October 1

* Context-Free Grammars

  Conventions
     a,b,c,d,... are terminal symbols
     A,B,C,D,... are nonterminal symbols
     u,v,w, ... are strings consisting of terminal symbols
     alpha, beta, gamma, ... are strings in the alphabet
     X,Y,Z, ... is a symbol in the alphabet

   Example

      G = ({S,0,1}, {0,1}, R, S)
	 R = {S->aSb, S->e}

      For any strings u,v in V*, we write u =>G v iff there are strings
	 x,y in V* and A in NT s.t. u = xAy, v=xv'y, and A ->G v'.

      The relation =>_G* is the reflexive, transitive closure of =>G.

      L(G) is the "language generated by G", or
	 {w in Sigma*:  S =>_G* w}
      G generates each string in L(G).
      A language L is a "context-free language" if L = L(G) for some
	 context-free grammar G.

      If the choice of grammar is obvious, we will use -> instead of ->G and =>
	 instead of =>G.

      S -> aSb -> aaSbb -> aabb

      A sequence of this form is a "derivation" in G of aabb from S.
      w0 =>G w =>G ... =>G wn is a derivation of wn in G from w0.

      The natural number n here is the "length" of the derivation, and the
      derivation has n "steps".

      What is L(G)?

      L(G) = {0^n 1^n : n >= 0}.

      In fact, all regular languages are context free and some languages
      that are not regular are context free.


   Prove that S -> 0S0 | 1S1 | e | 0 | 1 generates palindromes.

   1.  PAL <= L(G)

      If w is a palindrom, then show by induction in |w| that S =>* w.

      If |w| = 0, w = e and we are done because S -> e.

      If |w| = n+1, then w = 1u1 or 0u0 where u is a palindrome of length n-1.

      By (unstated) inductive hypothesis, S =>* u, so
	 S -> 1S1 => 1u1 = w.  Use similar argument for 0u0.

   2.  L(G) <= PAL

      If S=>* w then w is a palindrome.

      Prove by induction on |w|.  Clear for |w| = 0,1.

      If |w|>1, then in derivation of w, 1st production must be
	 S -> 0S0 or S -> 1S1

	 where the inner S generates middle subword of w of length
	 |w| - 2.  Because of the inductive hypothesis, this word is a
	 palindrome, and hence so is w.


   Tricks for Grammars.

      Union.  Use S->A, S->B for A v B
      Repeated Characters.  Use A-> aA | a.
      Repeated sets of characters in equal numbers.  Use A -> bAc | e.


   Problem 3.1.3 part a.

      Construct a context-free grammar for L(G) = {wcw^R:  w in {a,b}*}

      G = ({S,a,b}, {a,b,c}, R, S)
	 R = S-> c | aSa | bSb

* Parse trees

  Consider grammar S->E, E->E+E | E*E | a

  Derivation 1
     S -> E -> E+E -> E*E+E -> a*E+E -> a*a+E -> a*a+a
     Show the parse tree
     This corresponds to (a*a)+a

  Derivation 2
     S -> E -> E+a -> E*E+a -> a*E+a -> a*a+a
     Show the parse tree
     This corresponds to (a*a)+a

  Derivation 3
     S -> E -> E*E -> a*E -> a*E+E -> a*a+E -> a*a+a
     Show the parse tree
     This corresponds to a*(a+a)

  Root of a parse tree is the start symbol
  Each branch corresponds to the application of a rule
  Internal nodes are nonterminals
  Leaves are terminals

  Two specific types of derivations:
    1.  Leftmost derivation
        Replace the leftmost nonterminal with each rule application
    2.  Rightmost derivation
        Replace the rightmost nonterminal with each rule application

  Theorem 3.2.1
     Let G = (V, Sigma, R, S) be a CFG, and let A be a NT and w in Sigma*.
     Then the following statements are equivalent:

     (a) A =>* w.
     (b) There is a parse tree with root A and yield w.
     (c) There is a leftmost derivation A =>^{L*} w.
     (d) There is a rightmost derivation A =>^{R*} w.


  Grammars with two or more distinct parse trees are "ambiguous".
  Ambiguous grammars are difficult because more than one parsing of a
  sentence is possible.
  Programming languages are not "inherently ambiguous"
  (not all CFG that generate the language must be ambiguous).

* Exam Review