[CD-HIT] clustering nt database

Ryan Golhar golharam at umdnj.edu
Fri Sep 11 14:10:07 EDT 2009


I'm using cd-hit-est because the documentation says that's the only 
program that works on DNA sequences.  For the rest of the programs, the 
documentation only talks about protein sequences.  Is this not the case?
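
Before picking a new value for MAX_SEQ, it would help to know how long 
the longest sequence in the input actually is. Below is a minimal sketch 
for finding that, assuming the input is a plain FASTA text file (which 
may not be true of a formatted BLAST database). The function name 
longest_fasta_record is made up for illustration and is not part of 
cd-hit.

```python
import io


def longest_fasta_record(handle):
    """Return (header, length) of the longest record in a FASTA stream.

    Hypothetical helper, not part of cd-hit. Assumes plain FASTA text:
    header lines start with '>' and sequence lines follow them.
    """
    best_header, best_len = None, 0
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if header is not None and length > best_len:
                best_header, best_len = header, length
            header, length = line[1:], 0
        elif header is not None:
            # Sequence line: count residues, ignoring surrounding whitespace.
            length += len(line)
    # Close out the final record.
    if header is not None and length > best_len:
        best_header, best_len = header, length
    return best_header, best_len


# Small in-memory example; for a real file, pass open(path) instead.
demo = io.StringIO(">a\nACGT\nAC\n>b\nACGTACGTAC\n>c\nA\n")
print(longest_fasta_record(demo))  # ('b', 10)
```

The function reads line by line, so it should cope with a large file 
without loading it all into memory. The length it reports is only a 
lower bound on what MAX_SEQ would need to be.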

Dan Bolser wrote:
> Hi Ryan,
> 
> I think you should be using cd-hit, not cd-hit-est (if I guess from
> the name correctly that cd-hit-est is designed for clustering ESTs).
> 
> The error would make sense in this case, as protein sequences can be
> very long (e.g. titin), but ESTs are typically very short.
> 
> Sorry that I am not up to speed with the latest releases of cd-hit,
> but why are you not running the 'cd-hit' binary?
> 
> 
> Dan.
> 
> 2009/9/10 Ryan Golhar <golharam at umdnj.edu>:
>> Hi,
>>
>> How do I go about clustering the nt database?
>>
>> When I run
>>
>> cd-hit-est -i /usr/local/ncbi/db/nt -o /tmp/nt90 -c 0.90 -n 8
>>
>> I get the error:
>>
>> Fatal Error
>> Too long sequence found, enlarge Macro MAX_SEQ
>>
>> Program halted !!
>>
>> What do I enlarge MAX_SEQ to?
>>
>> _______________________________________________
>> CD-HIT-l mailing list
>> CD-HIT-l at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/cd-hit-l
>>
> 


