CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences.