[CD-HIT] clustering nt database

Dan Bolser dan.bolser at gmail.com
Thu Sep 10 22:52:07 EDT 2009


Hi Ryan,

I think you should be using cd-hit, not cd-hit-est (if I am guessing
correctly from the name that cd-hit-est is designed for clustering ESTs).

The error would make sense in this case, as protein sequences can be
very long (e.g. titin), but ESTs are typically very short.

Sorry that I am not up to speed with the latest releases of cd-hit,
but why are you not running the 'cd-hit' binary?
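
On your actual question of what to enlarge MAX_SEQ to: if you do end up
rebuilding, one way to pick a value is to measure the longest sequence in
your input first, since the limit presumably has to exceed that. Here is
a rough Python sketch, not part of cd-hit. The function name is my own
invention, and it assumes your nt file is plain FASTA text rather than a
formatted BLAST database.

```python
def longest_fasta_record(lines):
    """Return (header, length) of the longest record in FASTA-formatted lines.

    `lines` is any iterable of text lines, e.g. an open file handle, so the
    input is streamed and never held in memory all at once.
    Hypothetical helper for illustration; not part of cd-hit.
    """
    best_header, best_len = None, 0
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: check the one we just finished.
            if header is not None and length > best_len:
                best_header, best_len = header, length
            header, length = line[1:], 0
        elif header is not None:
            # Sequence data may span many lines; sum the residues.
            length += len(line)
    # Check the final record, which has no following '>' line.
    if header is not None and length > best_len:
        best_header, best_len = header, length
    return best_header, best_len
```

You would call it with something like
longest_fasta_record(open("/path/to/nt.fasta")), where the path is a
placeholder for wherever your FASTA copy of nt lives.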


Dan.

2009/9/10 Ryan Golhar <golharam at umdnj.edu>:
> Hi,
>
> How do I go about clustering the nt database?
>
> When I run
>
> cd-hit-est -i /usr/local/ncbi/db/nt -o /tmp/nt90 -c 0.90 -n 8
>
> I get the error:
>
> Fatal Error
> Too long sequence found, enlarge Macro MAX_SEQ
>
> Program halted !!
>
> What do I enlarge MAX_SEQ to?


