Sample input files for Ron Shamir's CS workshop 0368-3500-07 (Fall 2007-8)

Input #1: input1.fasta
The first input file is small - 100 sequences of length 200 (the real input files will be much larger, of course).
It contains a pair of simple (string) motifs of length 6 with a significant order bias:
motif A: ACCTTT , motif B: GGGAAG
There should be 15 occurrences of the pair A->B (i.e., A upstream of B) with a gap of length 10-20 between them;
there are 5 additional occurrences on the reverse-complement strand.
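A quick way to sanity-check these counts is sketched below in Python. This is only an illustration, not the workshop's reference solution. It assumes that the "gap" is the number of bases between the end of motif A and the start of motif B; the function names (read_fasta, count_ordered_pairs, count_both_strands) are made up for this example. Chance occurrences in the background sequence may push the counts slightly above the planted numbers.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT letters are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, motif):
    """Return the start index of every (possibly overlapping) exact occurrence."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def count_ordered_pairs(seq, motif_a, motif_b, min_gap, max_gap):
    """Count A->B pairs on the given strand.

    Assumption: the gap is measured from the end of A to the start of B.
    """
    b_starts = find_all(seq, motif_b)
    count = 0
    for a_start in find_all(seq, motif_a):
        a_end = a_start + len(motif_a)
        count += sum(1 for b in b_starts if min_gap <= b - a_end <= max_gap)
    return count


def count_both_strands(seq, motif_a, motif_b, min_gap, max_gap):
    """Return (forward count, reverse-complement count) for one sequence."""
    forward = count_ordered_pairs(seq, motif_a, motif_b, min_gap, max_gap)
    reverse = count_ordered_pairs(
        reverse_complement(seq), motif_a, motif_b, min_gap, max_gap
    )
    return forward, reverse
```

To use it on input1.fasta, open the file, pass the file object to read_fasta, and sum the results of count_both_strands(seq, "ACCTTT", "GGGAAG", 10, 20) over all sequences.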

Input #2: input2.fasta
The second sample is larger - 2,000 sequences of length 1,000.
It contains several pairs of motifs of length 8 with an order bias:

A pair of string motifs: motif A: TAAAAAAT , motif B: CCCCGGGG
This pair appears on both strands with a gap of length 20-50.
There are 54 occurrences of the pair A->B with the above gap lengths, and 13 occurrences of the reverse order B->A.

A pair of consensus motifs: motif A: TA[ACT]AA[AG]AT , motif B: GGAA[AT]TTT
This pair appears only on the "+" (=original) strand with a gap of length 10-30.
There are 160 occurrences of the pair A->B.
This pair is also localized (i.e., its hits aren't distributed uniformly along the sequences).
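The bracket notation used for the consensus motifs happens to be valid regular-expression character-class syntax, so the patterns can be handed to Python's re module unchanged. The sketch below is illustrative only. It assumes the gap is measured from the end of A to the start of B, scans only the "+" strand, and uses made-up helper names. The histogram at the end is one simple way to look at localization: it bins the start positions of the pair hits along the sequence.

```python
import re
from collections import Counter


def motif_starts(seq, consensus):
    """Start index of every match; the lookahead also finds overlapping hits."""
    pattern = re.compile("(?=" + consensus + ")")
    return [m.start() for m in pattern.finditer(seq)]


def motif_length(consensus):
    """Length in bases: each [..] group stands for a single position."""
    return len(re.sub(r"\[[^\]]*\]", "N", consensus))


def pair_positions(seq, cons_a, cons_b, min_gap, max_gap):
    """Return the start of A for every A->B pair on the given strand.

    Assumption: the gap is measured from the end of A to the start of B.
    """
    len_a = motif_length(cons_a)
    b_starts = motif_starts(seq, cons_b)
    return [
        a
        for a in motif_starts(seq, cons_a)
        for b in b_starts
        if min_gap <= b - (a + len_a) <= max_gap
    ]


def position_histogram(positions, bin_size=100):
    """Count hits per bin of bin_size bases (bin 0 covers positions 0..bin_size-1)."""
    return Counter(p // bin_size for p in positions)
```

Collecting pair_positions(seq, "TA[ACT]AA[AG]AT", "GGAA[AT]TTT", 10, 30) over all sequences in input2.fasta and passing the combined list to position_histogram should show whether the hits cluster in a few bins rather than spreading evenly.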

Another pair of consensus motifs: motif A: GAGA[CG][AT]CC , motif B: CTATACC[CG]
This pair appears on both strands with a gap of length 40-45 (the sequence of the gap is quite conserved).
There are roughly 370 occurrences of the pair A->B.
A third motif (AACGTTCC) appears 60-80 bases upstream of this pair (only on the "+" strand).
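The three-motif arrangement can be checked with a small extension of the same idea. The sketch below is illustrative only and rests on two assumptions that the description above does not spell out: the pair gap is measured from the end of A to the start of B, and "60-80 bases upstream" is measured from the end of the third motif to the start of A. The function names are made up for this example, and only the "+" strand is scanned.

```python
import re

MOTIF_LEN = 8  # all motifs in input2.fasta have length 8


def motif_starts(seq, consensus):
    """Start index of every match; the lookahead also finds overlapping hits."""
    return [m.start() for m in re.finditer("(?=" + consensus + ")", seq)]


def triple_hits(seq, third, cons_a, cons_b,
                pair_gap=(40, 45), upstream_gap=(60, 80)):
    """Return (third_start, a_start, b_start) for every third->A->B arrangement.

    Assumptions: pair_gap runs from the end of A to the start of B, and
    upstream_gap runs from the end of the third motif to the start of A.
    """
    third_starts = motif_starts(seq, third)
    b_starts = motif_starts(seq, cons_b)
    hits = []
    for a in motif_starts(seq, cons_a):
        a_end = a + MOTIF_LEN
        for b in b_starts:
            if not pair_gap[0] <= b - a_end <= pair_gap[1]:
                continue
            for t in third_starts:
                if upstream_gap[0] <= a - (t + MOTIF_LEN) <= upstream_gap[1]:
                    hits.append((t, a, b))
    return hits
```

For input2.fasta the call would be triple_hits(seq, "AACGTTCC", "GAGA[CG][AT]CC", "CTATACC[CG]") for each sequence.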