Although this problem may seem simple, it is known to be NP-complete. However, greedy algorithms perform fairly well in practice. The problem is further complicated by repetitive sequences in the genome, which can cause distinct copies of a repeat to be merged into one.
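As an illustration, here is a minimal Python sketch. It assumes the problem is assembling a set of sequencing reads into a shortest common superstring. The function names `overlap` and `greedy_scs` are hypothetical. The sketch uses one common greedy heuristic: repeatedly merge the ordered pair of strings with the largest suffix-prefix overlap. This heuristic is not guaranteed to find the shortest superstring. The second usage line shows how a repeat makes the greedy merge collapse several copies into one.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_scs(reads: list[str]) -> str:
    """Greedy heuristic for a short common superstring (not guaranteed shortest)."""
    # Drop exact duplicates, then reads that occur inside another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    if not reads:
        return ""
    while len(reads) > 1:
        # Find the ordered pair (i, j) with the largest overlap of reads[i] onto reads[j].
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge that pair, keeping the overlapping part only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0]


if __name__ == "__main__":
    print(greedy_scs(["ATTAG", "TAGACC", "ACCTG"]))  # ATTAGACCTG
    genome = "ATATATAT"  # a tandem repeat
    reads = [genome[i:i + 3] for i in range(len(genome) - 2)]
    print(greedy_scs(reads))  # ATAT: the 8-base genome collapses to 4 bases
```

In the second case, every read is consistent with the shorter string `ATAT`. A superstring-based objective therefore has no reason to reconstruct all copies of the repeat.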