September 17

*  Applications

   http://www.ibc.wustl.edu/~zuker/Bio-5495/sequences-html/node5.html

   Use FSM to look for a pattern in a DNA sequence


*  L(M) = ((ab v aab)*a*)*
   M = ?

         a   b
        +-++--------+
        v |v        |
   --->+===+ a  +-+ |
    +->|| ||--->| |-+
    |  +===+    +-+
    |   a|
    |    v
    |   +-+
  b |   | |
    |   +-+
    |   a|
    |    v
    |   +-+
    |   | |
    |   +-+
    |    ^
    +----+

*  NFAs with epsilon-transitions

   Allow change of state without reading input

   delta:  K x (Sigma union {e}) -> P(S)

   Example:

         0       1
        +--+    +--+
        v  |    v  |
	+--+ e  +--+
	|q0|--->|q1|
	+--+    +--+
		 | e \ 1
		 |    \
		 v     v
		+--+ 0  +====+<+
		|q3|--->||q2|| | 0
		+--+    +====+-+


   Provides flexibility in constructing FSM

   (0 v 11)101*

       +-+ 0   +-+
   --->| |---->| |
       +-+     +-+
       1|       |
	v       |
       +-+     e|
       | |      |
       +-+      |        1
       1|       | +-------------+
	v       | v             |
       +-+ e   +===+ 1  +-+ 0  +-+
       | |---->|| ||--->| |--->| |
       +-+     +===+    +-+    +-+

* Equivalence of Techniques

  We have shown four methods of representing regular languages
  DFAs, NFAs, NFAs with epsilon-transitions, regular expressions

  We will now show that these are equivalent
  We will show methods of converting from one representation to another

		 +-------->DFAs----------+
                 |                       |
                 |                       |
                 v                       v
		NFAs                    REs
                 ^                       |
                 |                       |
                 |                       |
		 +------>NFAs with<------+
		     epsilon-transitions

* DFAs <-> NFAs

  DFAs -> NFAs

  Theorem:  if L is accepted by some DFA, then L is accepted by some NFA.
  Proof:  Obvious.  NFAs subsume DFAs.


  NFAs -> DFAs

  Theorem:  if L is accepted by some NFA, then L is accepted by some DFA.
  Proof:  Build a DFA that keeps track of all the states the NFA could be in at
  any time while reading the input.


  Note:  Finite automata M1 and M2 are said to be "equivalent"
  iff L(M1) = L(M2).

  Example

     M1 = ({q0,q1,q2}, {0,1}, delta, q0, {q2})
     d(q0,0) = q0, d(q0,0) = q1, d(q0,1) = q2,
     d(q1,0) = q2,
     d(q2,1) = q2, d(q2,1) = q1

     M2 = ({{}, q0, q1, q2, q0q1, q0q2, q1q2, q0q1q2}, {0,1}, delta, q0,
	   {q2, q0q2, q1q2, q0q1q2})

     Note that
	* DFA states correspond to all possible sets of states of NFA
	* Final states correspond to all states containing NFA final states
	* Transitions correspond to NFA transitions

     d(q0,0) = q0q1		d(q0,1) = q2
     d(q1,0) = q2		d(q1,1) = {}
     d(q2,0) = {}		d(q2,1) = q1q2
     d({},0) = {}		d({},1) = {}
     d(q0q1,0) = q0q1q2 	d(q0q1,1) = q2
     d(q0q2,0) = q0q1		d(q0q2,1) = q1q2
     d(q1q2,0) = q2		d(q1q2,1) = q1q2
     d(q0q1q2,0) = q0q1q2	d(q0q1q2,1) = q1q2


   Proof
      Let M = (K, Sigma, delta, s, F) be an NFA.
      Construct a DFA M' = (K', Sigma, delta', s', F') s.t. L(M) = L(M').

      K' = P(K) = power set of K
		  Note that we will remove states that cannot be reached
      s' = s
      F' = {k' in K': k' intersect F neq empty}
      delta'(q0q1...qi, a) = p0p1...pj  iff
	 delta({q0, q1, ..., qi}, a) = {p0, p1, ..., pj}
	 Note that delta' is computed by applying delta to each state of K
	  represented by q0q1...qi.  When we apply delta to each of these
	  states and take the union we get a new set of states, p0 ... pj.
	  The new set of states has a representative state p0...pj, which is the
	  value of delta'(q0q1...qi, a).

      Prove by induction on |w| that
	 delta'(s', w) = q0q1...qi iff delta(s, w) = {q0, q1, ..., qi}.

      Base Case:  Let |w| = 0.  In this case s' = s and w must be e,
		  and we know delta'(s',e) = s' and delta(s,e) = {s}.

      Inductive Hypothesis:  Assume property holds for |w| <= n.

      Inductive Step:  Prove property then holds for |w| = n+1.
	 Let a represent the last symbol in the string (string is xa), |x| = n.
	 delta'(s', xa) = delta'(delta'(s', x), a).

	 From the inductive hypothesis we know
	    delta'(s', x) = q0q1...qj iff delta(s, x) = {q0, q1, ..., qj}.
	 
	 From the definition of delta' we know
	    delta'(q0q1...qj, a) = [r0r1...rk] iff
	    delta({q0, q1, ..., qj}, a) = {r0, r1, ..., rk}.

	 Thus we know that
	    delta'(s', xa) = r0r1...rk iff delta(s, xa) = {r0, r1, ..., rk}.

	 Thus the property holds.

      We also know that delta'(s', w) is in F' exactly when delta(s, w) contains
      a state of K that is in F.

      Thus L(M) = L(M').
      Q.E.D.

* NFAs <-> NFAs with epsilon-transitions

  NFAs -> NFAs with epsilon-transitions

  Theorem:  if L is accepted by some NFA,
     then L is accepted by some NFA with epsilon-transitions.
  Proof:  Obvious.  NFAs with epsilon-transitions subsume NFAs.



  NFAs with epsilon-transitions -> NFAs

  Theorem: if L is accepted by some NFA with epsilon-transitions,
     then L is accepted by some NFA without epsilon-transitions.
  Proof:  Given M = (K, Sigma, delta, s, F) which is an NFA with
     epsilon-transitions,

     construct M' = (K', Sigma, delta', s', F') which is an
     NFA without epsilon-transitions, s.t. L(M) = L(M').

  K' = K
  s' = s
  delta' is an extension of delta.  We add extra edges to compensate for lack
     of epsilon-edges.

  delta' has all non-epsilon-transitions that delta has.
  In addition, forall p,q in K, forall a in Sigma, if
         e   a   e
      p ~~~>--->~~~>q in M, where ~~~> indicates a finite-length path of e edges
                        a
  then add transition p--->q in M'

                  e
  F' = F unless s~~~>S_{accept} in M, in which case F' = F union {s}.


  For the previous example we note:

     q0 on 0 can lead to q0, q1, q2, q3
	   1             q1, q2, q3
     q1 on 0 can lead to q2
	   1             q1, q2, q3
     q2 on 0 can lead to q2
	   1             empty
     q3 on 0 can lead to q2
	   1             empty

     This defines our NFA without epsilon-transitions, F = {q2} because there
     was not a path from s to a final state consisting of epsilon-edges only
     in the original NFA.

   Leave the proof as an exercise.


* Example of the conceptual power of nondeterminism

  Let L be a language.  Define 1/2(L) to be
     {x: for some y such that |x| = |y|, xy is in L}.

  1/2(L) is the set of first halves of strings in L.  Prove that for each
     regular L, 1/2(L) is regular.

  Proof:  Let M be a DFA for L.  M' will be an NFA constructed to accept 1/2(L).
  M' uses the "finger method" as follows.
  M' places the left finger on s in M, and the right finger on s_i in M
     where s_i is a nondeterministic "guess" as to where input x will lead.

  As aharacters of x are read, M' moves the left finger along transitions
     as dictated by characters read, and simultaneously moves the right finger
     along a nondeterministically chosen edge (which may be labeled with any
     character, not necessarily the one of x being read).

  M' accepts x iff at the end its left finger is on s_i and
     its right finger is on the accept state.
  Thus M' accepts x iff
     * x leads to some state s_i in M, and
     * there exists y |y| = |x| s.t. y leads from s_i to an accept state in M.

  Formally:

     Use tuple for fingers (we need a middle finger to remember
        the guessed s_i state).

     M' = (K', Sigma, delta', s', F'), where
        K' = K_{left finger} x K{middle finger} x K{right finger} and
        s in K = new start state

        * delta'(s, epsilon) = {: si in S}
          "Guess" the state si to which input x will lead

        * delta'(, a) =
          {: delta(si, a) = sl and
          there exists b in Sigma s.t. delta(sk, b) = sm}
          Note that the middle finger does not change after the initial guess

     F' = {:  si in S, sj in F}

     As you can see, the left finger moved on input x to where the
     middle finger guessed (si, si).  The right finger moved from si
     to some accepting state sj of M (si, sj).