October 1

* Context-Free Grammars

     a,b,c,d,... are terminal symbols
     A,B,C,D,... are nonterminal symbols
     u,v,w, ... are strings consisting of terminal symbols
     alpha, beta, gamma, ... are strings in the alphabet
     X,Y,Z, ... is a symbol in the alphabet


      G = ({S,0,1}, {0,1}, R, S)
	 R = {S->aSb, S->e}

      For any strings u,v in V*, we write u =>G v iff there are strings
	 x,y in V* and A in NT s.t. u = xAy, v=xv'y, and A ->G v'.

      The relation =>_G* is the reflexive, transitive closure of =>G.

      L(G) is the "language generated by G", or
	 {w in Sigma*:  S =>_G* w}
      G generates each string in L(G).
      A language L is a "context-free language" if L = L(G) for some
	 context-free grammar G.

      If the choice of grammar is obvious, we will use -> instead of ->G and =>
	 instead of =>G.

      S -> aSb -> aaSbb -> aabb

      A sequence of this form is a "derivation" in G of aabb from S.
      w0 =>G w =>G ... =>G wn is a derivation of wn in G from w0.

      The natural number n here is the "length" of the derivation, and the
      derivation has n "steps".

      What is L(G)?

      L(G) = {0^n 1^n : n >= 0}.

      In fact, all regular languages are context free and some languages
      that are not regular are context free.

   Prove that S -> 0S0 | 1S1 | e | 0 | 1 generates palindromes.

   1.  PAL <= L(G)

      If w is a palindrom, then show by induction in |w| that S =>* w.

      If |w| = 0, w = e and we are done because S -> e.

      If |w| = n+1, then w = 1u1 or 0u0 where u is a palindrome of length n-1.

      By (unstated) inductive hypothesis, S =>* u, so
	 S -> 1S1 => 1u1 = w.  Use similar argument for 0u0.

   2.  L(G) <= PAL

      If S=>* w then w is a palindrome.

      Prove by induction on |w|.  Clear for |w| = 0,1.

      If |w|>1, then in derivation of w, 1st production must be
	 S -> 0S0 or S -> 1S1

	 where the inner S generates middle subword of w of length
	 |w| - 2.  Because of the inductive hypothesis, this word is a
	 palindrome, and hence so is w.

   Tricks for Grammars.

      Union.  Use S->A, S->B for A v B
      Repeated Characters.  Use A-> aA | a.
      Repeated sets of characters in equal numbers.  Use A -> bAc | e.

   Problem 3.1.3 part a.

      Construct a context-free grammar for L(G) = {wcw^R:  w in {a,b}*}

      G = ({S,a,b}, {a,b,c}, R, S)
	 R = S-> c | aSa | bSb

* Parse trees

  Consider grammar S->E, E->E+E | E*E | a

  Derivation 1
     S -> E -> E+E -> E*E+E -> a*E+E -> a*a+E -> a*a+a
     Show the parse tree
     This corresponds to (a*a)+a

  Derivation 2
     S -> E -> E+a -> E*E+a -> a*E+a -> a*a+a
     Show the parse tree
     This corresponds to (a*a)+a

  Derivation 3
     S -> E -> E*E -> a*E -> a*E+E -> a*a+E -> a*a+a
     Show the parse tree
     This corresponds to a*(a+a)

  Root of a parse tree is the start symbol
  Each branch corresponds to the application of a rule
  Internal nodes are nonterminals
  Leaves are terminals

  Two specific types of derivations:
    1.  Leftmost derivation
        Replace the leftmost nonterminal with each rule application
    2.  Rightmost derivation
        Replace the rightmost nonterminal with each rule application

  Theorem 3.2.1
     Let G = (V, Sigma, R, S) be a CFG, and let A be a NT and w in Sigma*.
     Then the following statements are equivalent:

     (a) A =>* w.
     (b) There is a parse tree with root A and yield w.
     (c) There is a leftmost derivation A =>^{L*} w.
     (d) There is a rightmost derivation A =>^{R*} w.

  Grammars with two or more distinct parse trees are "ambiguous".
  Ambiguous grammars are difficult because more than one parsing of a
  sentence is possible.
  Programming languages are not "inherently ambiguous"
  (not all CFG that generate the language must be ambiguous).

* Exam Review