October 20 * Properties of Context-Free Languages We want to start proving whether a given language is a CFL. We will start with the pumping lemmga for CFL. In order to prove the pumping lemma, we need to be familiar with converting grammars to "Chomsky Normal Form". * Pumping lemma For all sufficiently long z in CFL L, there exist two substrings, not too far apart, that can be simultaneously pumped to obtain more words in L. Let k = #vars in grammar G in CNF (consider L - e). If z in L(G) and parse tree for z has no path of length >=k+1 (all paths have length <= k) Then how long can z be? Answer: |z|<=2^{k-1} Proof. The number of leaves in a complete binary tree with height (greatest path length) i is 2^i. However, because the last edge on the path here corresponds to the production A->a, the parent node is not a branch point. Thus these last edges do not incrase the length of the sentential form. So |z| <= #leaves in complete binary tree with path length (height) k-1. Thus |z|<=2^{k-1}. If |z|>=2^k (=n), Then the parse tree for z must have path of length >=k+1. Let G have k vars. If |z|>=n = 2^k, there exists path in tree of length >=k+1 with >=k+2 vertices, the last of with is a terminal. As a result, on the path there exists k+1 vars, at least two of which must be the same since k = #distinct vars. This is the important part!!! The parse tree for z is S / \ / /\ / v1=A \ / / \ \ / / /\ \ / / v2=A \ \ / / / \ \ \ |u |v | w | x| y | where distance of vertex v1 to v2 is at most k and distance from v1 to a leaf is at most k+1. Let T1=subtree rooted at v1 T2=subtree rooted at v2 T1 yields vwx T2 yields w z = uvwxy Furthermore, |vwx|<=2^k and not both v anc x can be e, since first rule of T1 is A->BC and w is beneath only one of B,C. Now observe A =>* vAx A =>* w Thus forall i>=0, A=>* v^i w x^i Inspecting the tree, S=>*z implies S =>* uAy =>* u v^i w x*i y P.L. Let L be a CFL. There exists n s.t. forall z in L with |z|>=n, there exist u, v, w, x, y s.t. * z = uvwxy * |vwx| <= n * |vx| >= 1 * forall i>=0, u v^i w x^i y in L * The P.L. is used to show languages are NOT context free Example L = {a^j b^j c^j: j>=0} is not context free. Proof. Assume L is context free. Then L satisfies P.L. Then there exists n by the P.L. s.t. ... Let z = a^n b^n c^n Because |z|>=n and z in L, by PL there exist uvwxy s.t. ... But if there exist u,v,w,x,y satisfying first three constraints, we can show there exists i s.t. the fourth constraint fails. Case uvw consists only of "a". Then when i=0 string is not in L (fewer "a"s than "b"s or "c"s). Case vwx contains only "b"s - similar. Case vwx contains only "c"s - similar. Case vwx contains only two types of characters from {a,b,c}. Then uwy has more of the remaining type of characters. The string vw cannot contain "a"s, "b"s, and "c"s, since vwx is a substring of z and has length <=n (cannot contain the last "a" and the first "c" both). * Closure Properties of Context-Free Languages - CFLs are closed under union If L1, L2 are CFL then given G1, G2 CFG for L1 and L2 we can construct a CFG for L1 union L2 S -> S1 | S2 + productions of G1 + productions of G2 Here S1 and S2 are start symbols of G1 and G2 (rename if necessary). - CFLs are NOT CLOSED under intersection L1 = {a^i b^i c^j: i,j >= 0} is a CFL L2 = {a^i b^j c^j: i,j >= 0} is a CFL L1: S -> AB A -> aAb | e B -> cB | e L2: S -> AB A -> aA | e B -> bBc | e L1 intersect L2 = {a^i b^i c^i: i >= 0} is not a CFL (we showed this in class earlier). - CFLs are closed under concatenation Let L1, L2 be CFL, G1, G2 are the CFG for L1 and L2. For CFG of concatenation S -> S1 S2 + productions of G1 + productions of G2 Here S1 and S2 are start symbols of G1 and G2 (rename if necessary). - CFLs are closed under Kleene Closure Let L1 be a CFL, G1 is the CFG for L1. For CFG of Kleene Closure S' -> S S' | e Here S is the start symbol of G1. For example, S -> aAb A -> c | d Kleene Closure includes these and S -> S S' | e Now S' -> S S' -> acbS' -> acbSS' -> acbacbS' -> ... - CFLs are closed under intersection with a regular language Let L be a CFL and R be a regular language. L = L(M) for PDA M and R is L(A) for DFA A. We can construct PDA M' that keeps track of both sets of moves (runs the two machines in parallel). When M changes state on e, do not change state of A. Change ((q1, e, beta), (p1, gamma)) to (((q1, q2), e, beta), (p1, q2), gamma)) for each q2 in A. When M changes state on a, M' simulates that move and simulates A's change of state on a. Change ((q1, a, beta), (p1, gamma)) to (((q, q2), a, beta), ((p1, delta(q2, a)), gamma)) for each q2 in A. M' accepts if and only if both A and M accept. - Example L = {ww: w in {a,b}*} If L is CFL, then so is L' = L intersect a+b+a+b+. (L is intersected with a regular language). However, L' = {a^i b^j a^i b^j: i,j >= 0} which is not a CFL!!! Thus, L is not a CFL. - Example CFLs are not closed under complement. L = {x: x not of form ww} is a CFL (homework problem) _ The complement = L = {ww: w in Sigma*} is not a CFL!!! Thus CFLs are not closed under complement.