[CD-HIT] CD-HIT - Unusual Clustering

Limin Fu phoolimin at gmail.com
Fri Nov 12 19:30:33 EST 2010


Hi Rob,

We have just made a development release available at http://cdhit.google.com.
This release includes a few improvements to the filtering process, the
alignment band search, and the local band alignment computation. It should be
more sensitive than previous releases. Perhaps you could try this release.

Best regards,

Limin



On Thu, Nov 11, 2010 at 2:17 AM, Dan Bolser <dmb at bioinformatics.org> wrote:

> Hi Rob,
>
> No, I would not say that this result is strictly 'expected', but if you
> look at the name of CD-HIT, the T signifies Tolerance, i.e. the
> heuristic that CD-HIT uses can, on rare occasions, split sequences with
> identity greater than the threshold into separate clusters (that is the
> trade-off you get for the incredible speed at which the heuristic can
> generate clusters without doing pairwise alignments of all sequences
> in the database).
>
> http://www.ncbi.nlm.nih.gov/pubmed/11836214
>
>
> I've included the cd-hit mailing list address in the reply to this
> email. There may be people on that list who can give a much better
> explanation than I can, and who can look into the example you
> provided in more detail (I just help run the project page on
> Bioinformatics.Org).
>
>
> Thanks for providing feedback on CD-HIT!
>
> All the best,
> Dan.
>
>
> On 10 November 2010 07:38, Rob Syme <rob.syme at gmail.com> wrote:
> > Hi,
> >
> > I'm looking to cluster proteins from three fungal genomes. I've come across
> > a curious result where two very similar sequences (attached) are not
> > clustered:
> >
> > cd-hit -i Curious.fasta -o Curious.clusters
> > cd-hit -i Curious.fasta -o Curious.clusters -c 0.7
> >
> > Both commands split the two sequences into two clusters, even though the
> > alignment covers 93% of the longer protein with perfect identity.
> >
> > Is this the expected behaviour for CD-HIT?
> > -r
> >
> > Rob Syme
> > PhD Student
> > ACNFP, Curtin University
> > Western Australia
> >
> >
> >
>
> _______________________________________________
> CD-HIT-l mailing list
> CD-HIT-l at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/cd-hit-l
>
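
The thread above mentions a trade-off between speed and sensitivity, and
improvements to "alignment band searching". As a rough illustration only, the
Python sketch below shows how a comparison restricted to a narrow band of
diagonals can report a low identity for a pair that an unrestricted comparison
scores as identical. This is a toy model, not CD-HIT's code: the function
names, the k-mer voting used to pick the band centre, the match-counting
score, and the two sequences are all hypothetical, and it is not a diagnosis
of the sequences attached to the original message.

```python
from collections import Counter


def best_diagonal(a, b, k=3):
    """Return the diagonal (j - i) on which a and b share the most k-mers."""
    positions = {}
    for j in range(len(b) - k + 1):
        positions.setdefault(b[j:j + k], []).append(j)
    votes = Counter()
    for i in range(len(a) - k + 1):
        for j in positions.get(a[i:i + k], ()):
            votes[j - i] += 1
    return votes.most_common(1)[0][0] if votes else 0


def banded_matches(a, b, center, band):
    """Count the most matches on a monotone path that stays inside the band.

    A cell (i, j), 1-indexed, is inside the band when
    abs((j - i) - center) <= band. Cells outside the band score 0, so a
    path may start anywhere inside the band (simplifying assumption).
    """
    prev = {}
    best = 0
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i + center - band)
        hi = min(len(b), i + center + band)
        for j in range(lo, hi + 1):
            diag = prev.get(j - 1, 0) + (a[i - 1] == b[j - 1])
            cur[j] = max(diag, prev.get(j, 0), cur.get(j - 1, 0))
            best = max(best, cur[j])
        prev = cur
    return best


def banded_identity(a, b, band, k=3):
    """Matches found inside the band, divided by the shorter length."""
    center = best_diagonal(a, b, k)
    return banded_matches(a, b, center, band) / min(len(a), len(b))


# Hypothetical example: the long sequence is the short one plus a
# 30-residue insertion, so the two halves lie on diagonals 30 apart.
PREFIX = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
SUFFIX = "APILSRVGDGTQDNLSGAEKAVQ"
short_seq = PREFIX + SUFFIX
long_seq = PREFIX + "W" * 30 + SUFFIX

narrow = banded_identity(short_seq, long_seq, band=5)
wide = banded_identity(short_seq, long_seq, band=len(long_seq))
print(f"narrow band: {narrow:.2f}, wide band: {wide:.2f}")
```

With the narrow band only the shared prefix is visible, so the pair falls
below a 0.7 threshold, while the wide band recovers every residue of the
shorter sequence. A wider band costs more time per comparison, which is the
kind of speed versus sensitivity trade-off described in the thread.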