```October 20

* Properties of Context-Free Languages

We want to start proving whether a given language is a CFL.
We will start with the pumping lemmga for CFL.  In order to prove
the pumping lemma, we need to be familiar with converting grammars to
"Chomsky Normal Form".

* Pumping lemma

For all sufficiently long z in CFL L, there exist two substrings, not too far
apart, that can be simultaneously pumped to obtain more words in L.

Let k = #vars in grammar G in CNF (consider L - e).
If z in L(G) and parse tree for z has no path of length >=k+1
(all paths have length <= k)
Then how long can z be?

Proof.  The number of leaves in a complete binary tree with height (greatest
path length) i is 2^i.  However, because the last edge on the path here
corresponds to the production A->a, the parent node is not a branch point.
Thus these last edges do not incrase the length of the sentential form.
So |z| <= #leaves in complete binary tree with path length (height) k-1.
Thus |z|<=2^{k-1}.

If |z|>=2^k (=n),
Then the parse tree for z must have path of length >=k+1.

Let G have k vars.
If |z|>=n = 2^k, there exists path in tree of length >=k+1 with
>=k+2 vertices, the last of with is a terminal.
As a result, on the path there exists k+1 vars, at least two of which must
be the same since k = #distinct vars.
This is the important part!!!

The parse tree for z is

S
/   \
/    /\
/ v1=A  \
/ /    \  \
/ /     /\  \
/ /  v2=A  \  \
/ /  /    \  \  \
|u |v |  w | x| y |

where distance of vertex v1 to v2 is at most k
and distance from v1 to a leaf is at most k+1.

Let T1=subtree rooted at v1
T2=subtree rooted at v2

T1 yields vwx
T2 yields w
z = uvwxy

Furthermore, |vwx|<=2^k and not both v anc x can be e, since first rule of
T1 is A->BC and w is beneath only one of B,C.

Now observe
A =>* vAx
A =>* w

Thus forall i>=0, A=>* v^i w x^i

Inspecting the tree, S=>*z implies
S =>* uAy
=>* u v^i w x*i y

P.L.  Let L be a CFL.  There exists n s.t. forall z in L with |z|>=n,
there exist u, v, w, x, y s.t.

* z = uvwxy
* |vwx| <= n
* |vx| >= 1
* forall i>=0, u v^i w x^i y in L

* The P.L. is used to show languages are NOT context free

Example

L = {a^j b^j c^j:  j>=0} is not context free.

Proof.  Assume L is context free.  Then L satisfies P.L.
Then there exists n by the P.L. s.t. ...
Let z = a^n b^n c^n
Because |z|>=n and z in L, by PL there exist uvwxy s.t. ...

But if there exist u,v,w,x,y satisfying first three constraints, we can show
there exists i s.t. the fourth constraint fails.

Case uvw consists only of "a".  Then when i=0 string is not in L (fewer "a"s
than "b"s or "c"s).

Case vwx contains only "b"s - similar.

Case vwx contains only "c"s - similar.

Case vwx contains only two types of characters from {a,b,c}.
Then uwy has more of the remaining type of characters.

The string vw cannot contain "a"s, "b"s, and "c"s, since vwx is a substring
of z and has length <=n (cannot contain the last "a" and the first "c" both).

* Closure Properties of Context-Free Languages

- CFLs are closed under union

If L1, L2 are CFL then given G1, G2 CFG for L1 and L2 we can construct
a CFG for L1 union L2

S -> S1 | S2

+ productions of G1
+ productions of G2

Here S1 and S2 are start symbols of G1 and G2 (rename if necessary).

- CFLs are NOT CLOSED under intersection

L1 = {a^i b^i c^j: i,j >= 0} is a CFL
L2 = {a^i b^j c^j: i,j >= 0} is a CFL

L1:  S -> AB
A -> aAb | e
B -> cB | e

L2:  S -> AB
A -> aA | e
B -> bBc | e

L1 intersect L2 = {a^i b^i c^i: i >= 0} is not a CFL (we showed this
in class earlier).

- CFLs are closed under concatenation

Let L1, L2 be CFL, G1, G2 are the CFG for L1 and L2.

For CFG of concatenation

S -> S1 S2

+ productions of G1
+ productions of G2

Here S1 and S2 are start symbols of G1 and G2 (rename if necessary).

- CFLs are closed under Kleene Closure

Let L1 be a CFL, G1 is the CFG for L1.

For CFG of Kleene Closure

S' -> S S' | e

Here S is the start symbol of G1.

For example,
S -> aAb
A -> c | d

Kleene Closure includes these and
S -> S S' | e

Now S' -> S S' -> acbS' -> acbSS' -> acbacbS' -> ...

- CFLs are closed under intersection with a regular language

Let L be a CFL and R be a regular language.

L = L(M) for PDA M and R is L(A) for DFA A.
We can construct PDA M' that keeps track of both sets of moves
(runs the two machines in parallel).

When M changes state on e, do not change state of A.
Change ((q1, e, beta), (p1, gamma)) to
(((q1, q2), e, beta), (p1, q2), gamma)) for each q2 in A.

When M changes state on a, M' simulates that move and simulates
A's change of state on a.
Change ((q1, a, beta), (p1, gamma)) to
(((q, q2), a, beta), ((p1, delta(q2, a)), gamma)) for each q2 in A.

M' accepts if and only if both A and M accept.

- Example

L = {ww: w in {a,b}*}

If L is CFL, then so is  L' = L intersect a+b+a+b+.
(L is intersected with a regular language).

However, L' = {a^i b^j a^i b^j: i,j >= 0} which is not a CFL!!!
Thus, L is not a CFL.

- Example

CFLs are not closed under complement.

L = {x: x not of form ww} is a CFL
(homework problem)
_
The complement = L = {ww: w in Sigma*} is not a CFL!!!

Thus CFLs are not closed under complement.
```