[Bio-Linux] Blasting Multiple Fasta Files

Tue May 5 10:54:59 EDT 2015

Hey,

blast+ does not parallelize all that well on its own. You might
therefore want to try GNU parallel to run several BLAST jobs at once,
which can speed up your calculations somewhat, depending on your
machine. Here are some links:

https://www.biostars.org/p/63816/
https://www.biostars.org/p/76009/
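
If you would rather stay in Python than set up GNU parallel, the same
idea (split the query, run one blastx per chunk side by side, then
concatenate the results) can be sketched roughly as below. This is only
a sketch using the standard library, not what the linked threads do.
The chunk size, the number of jobs, the e-value cutoff, the tabular
output format and all file and function names are my own assumptions;
the blastx options themselves (-query, -db, -outfmt, -evalue,
-num_threads, -out) are the usual BLAST+ ones.

```python
#!/usr/bin/env python3
"""Split a FASTA file into chunks and run blastx on the chunks in parallel."""
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(fasta_path, out_dir, seqs_per_chunk=1000):
    """Write chunk_0000.fasta, chunk_0001.fasta, ... and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    out = None
    n_seqs = 0
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Start a new chunk file every seqs_per_chunk records.
                if n_seqs % seqs_per_chunk == 0:
                    if out is not None:
                        out.close()
                    chunk = out_dir / f"chunk_{len(chunks):04d}.fasta"
                    chunks.append(chunk)
                    out = open(chunk, "w")
                n_seqs += 1
            if out is not None:  # skip anything before the first header
                out.write(line)
    if out is not None:
        out.close()
    return chunks


def blastx_command(query, db, result):
    """Build the blastx call for one chunk (tabular output, one thread)."""
    # The e-value cutoff and output format are assumed, adjust as needed.
    return ["blastx", "-query", str(query), "-db", str(db),
            "-outfmt", "6", "-evalue", "1e-5", "-num_threads", "1",
            "-out", str(result)]


def run_chunk(query, db):
    """Run blastx on one chunk and return the path of its result file."""
    result = Path(query).with_suffix(".blastx.tsv")
    subprocess.run(blastx_command(query, db, result), check=True)
    return result


def blast_in_parallel(chunks, db, merged_path, jobs=4):
    """Run up to `jobs` blastx processes at once, then merge the results."""
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(lambda c: run_chunk(c, db), chunks))
    # pool.map keeps input order, so the merged file follows the query order.
    with open(merged_path, "w") as merged:
        for result in results:
            merged.write(Path(result).read_text())
    return merged_path


if __name__ == "__main__" and len(sys.argv) == 4:
    fasta, db, merged = sys.argv[1:4]
    blast_in_parallel(split_fasta(fasta, "chunks"), db, merged)
```

Call it as e.g. "python3 split_blastx.py contigs.fasta viral_refseq
merged.tsv" (script, file and database names are hypothetical). Keep
the number of jobs at or below your number of cores, since each blastx
process here uses one thread.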

Cheers,
Andreas

Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: aleimba at gwdg.de

On 05.05.2015 16:31, Zain A Alvi wrote:
> Hi Marty,
> 
> I apologize for the confusion. I am splitting a FASTA file that contains approximately 100,000 sequences into 100 FASTA files that contain 1,000 sequences each. I am hoping this will expedite the BLASTX process.
> 
> 
> Kind regards,
> 
> 
> Zain
> 
> ________________________________
> From: Martin Gollery <mgollery at unr.edu>
> Sent: Tuesday, May 5, 2015 10:23 AM
> To: Bio-Linux help and discussion
> Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files
> 
> Running a million BLASTX jobs on one sequence each is not going to save you time. It is better to run one BLASTX job on a million sequences.
> 
> -Marty
> 
> 
> 
> On Tue, May 5, 2015 at 7:00 AM, Zain A Alvi <zain.alvi at student.shu.edu> wrote:
> 
> Dear Sir or Madam,
> 
> 
> I hope everything is well. I have downloaded all the viral protein sequences from the NCBI RefSeq database using the script from their E-book. I have de novo assembled some viral genomes, and I know BLASTX takes a long time if the FASTA file is large. I have been able to split the large FASTA file so that each new FASTA file holds a user-specified number of contigs.
> 
> 
> I was wondering whether there is a method to run BLASTX automatically on each of the FASTA files, one at a time, so that it completes in a "shorter" amount of time than BLASTing the whole large de novo assembled FASTA file. I was then hoping to concatenate all the results into one file.
> 
> 
> Sincerely,
> 
> 
> Zain
> 
> _______________________________________________
> Bio-Linux mailing list
> Bio-Linux at nebclists.nerc.ac.uk
> http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux
> 
> 
> 
> 
> --
> Martin Gollery
> Senior Bioinformatics Scientist
> Tahoe Informatics
> www.bioinformaticist.biz
> www.hiddenmarkovmodels.com
> 
>