Biopython - Overview of BLAST
BLAST (Basic Local Alignment Search Tool) is one of the most powerful and widely used tools in bioinformatics. It is used to compare a query sequence (DNA, RNA, or protein) against a large database of biological sequences to find regions of similarity.
Biopython provides easy access to BLAST services through the Bio.Blast module, allowing researchers to run sequence similarity searches directly from Python.
In this tutorial, you will learn the fundamentals of BLAST, its types, and how Biopython interacts with BLAST systems.
What is BLAST?
BLAST is a sequence comparison tool developed by NCBI. It helps identify:
- Similar DNA sequences
- Related proteins
- Gene functions
- Evolutionary relationships
- Potential homologs
It works by finding local alignments between sequences and ranking results based on similarity scores.
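To make "local alignment" concrete, here is a minimal sketch of Smith-Waterman scoring, the exact dynamic-programming method that BLAST approximates with faster heuristics. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not BLAST's actual defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score between a and b."""
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so a cell never drops below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The two sequences share the region "ACGT"; the flanking letters differ.
print(local_alignment_score("TTTACGTGGG", "CCACGTAA"))
```

Only the shared region contributes to the score, which is why a local method can find a conserved domain inside two otherwise unrelated sequences.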
Why is BLAST Important?
BLAST is essential in bioinformatics because it helps answer questions like:
- What gene does this DNA sequence belong to?
- Is this protein similar to known proteins?
- What organism does this sequence come from?
- Does this sequence have mutations compared to known genes?
Types of BLAST
1. BLASTN (Nucleotide BLAST)
- Compares DNA sequences
- DNA vs DNA database
Example use:
- Gene identification
- Genome mapping
2. BLASTP (Protein BLAST)
- Compares protein sequences
- Protein vs protein database
Example use:
- Protein function prediction
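3. BLASTX
- Translates a DNA query in all six reading frames
- Searches a protein database
4. TBLASTN
- Protein query vs a DNA database translated in all six reading frames
5. TBLASTX
- Translated DNA query vs a translated DNA database (both sides translated in all six reading frames)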
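The choice of program depends only on the type of the query and the type of the database. A small lookup sketch can capture that rule; `choose_program` and its arguments are hypothetical names used for illustration and are not part of Biopython.

```python
# (query type, database type) -> BLAST program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_program(query_type, db_type, translate_both=False):
    """Pick a BLAST program name for the given query and database types."""
    if translate_both:
        # tblastx compares translated nucleotides on both sides.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


print(choose_program("nucleotide", "protein"))
print(choose_program("nucleotide", "nucleotide", translate_both=True))
```

The returned string is the value you would pass as the first argument to `NCBIWWW.qblast()`.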
How BLAST Works
BLAST follows a step-by-step process:
Query Sequence
↓
Search Database
↓
Find Similar Regions
↓
Score Alignments
↓
Rank Results
It uses heuristics to quickly find local similarities instead of comparing entire sequences.
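One of those heuristics is seeding: BLAST first looks for short words shared by the query and a database sequence, and only then tries to extend them into alignments. The sketch below illustrates the seeding idea with exact k-letter matches. It is a simplified teaching example, not BLAST's real implementation, and the function name and word size are assumptions.

```python
def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    # Index every k-letter word of the subject by its start position.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)

    # Look up each k-letter word of the query in that index.
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("ATGCGATACGTT", "GGGCGATACCC"))
```

Neighbouring seeds on the same diagonal (equal difference between the two positions) point at a region worth extending, which is far cheaper than aligning every pair of sequences in full.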
Installing Biopython for BLAST
Before using BLAST in Biopython:
pip install biopython
Importing BLAST Module
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
These modules allow online BLAST searches and result parsing.
Running a Basic BLAST Search
You can submit a sequence to NCBI BLAST.
from Bio.Blast import NCBIWWW
result_handle = NCBIWWW.qblast(
    "blastn",
    "nt",
    "ATGCGATACGTT"
)
Understanding qblast()
Parameters:
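| Parameter | Example value | Description |
|---|---|---|
| program | "blastn" | Type of BLAST search (here, nucleotide vs nucleotide) |
| database | "nt" | Database to search (NCBI nucleotide collection) |
| sequence | "ATGCGATACGTT" | Query sequence |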
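Beyond these three required arguments, `qblast()` also accepts optional keyword arguments such as `expect` (E-value cutoff) and `hitlist_size` (maximum number of hits). The sketch below shows one way to bundle them. The helper names `build_qblast_kwargs` and `run_search` and the chosen values are illustrative assumptions, and `run_search` needs Biopython plus an internet connection when it is actually called.

```python
def build_qblast_kwargs(sequence, expect=1e-5, max_hits=10):
    """Collect qblast() arguments in one place so they are easy to review."""
    return {
        "program": "blastn",
        "database": "nt",
        "sequence": sequence,
        "expect": expect,          # discard hits with a larger E-value
        "hitlist_size": max_hits,  # limit the number of reported hits
    }


def run_search(sequence):
    """Submit the search to NCBI (network access required)."""
    # Imported here so the helper above can be used without Biopython installed.
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast(**build_qblast_kwargs(sequence))


print(build_qblast_kwargs("ATGCGATACGTT"))
```

A stricter `expect` and a smaller `hitlist_size` reduce the size of the returned XML, which helps when you only care about the best matches.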
Saving BLAST Results
with open("blast_results.xml", "w") as out:
    out.write(result_handle.read())
BLAST results are stored in XML format.
Reading BLAST Results
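from Bio.Blast import NCBIXML
with open("blast_results.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)
    for record in blast_records:
        # Each alignment is one database sequence that matched the query
        for alignment in record.alignments:
            print(alignment.title)
Understanding BLAST Output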
BLAST results include:
- Sequence alignment
- Score
- E-value
- Query coverage
- Identity percentage
What is E-value?
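The E-value (expect value) is the number of hits with an equal or better score that you would expect to find by chance in a database of the same size.
Lower E-value = better match
Example:
- 0.0 → too small to be reported as anything but zero; a very strong match (not necessarily 100% identical)
- 1e-50 → highly significant
- 0.1 → weak similarity, possibly a chance match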
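The link between alignment score and E-value can be sketched with the standard bit-score relation E = m × n × 2^(−S'), where S' is the bit score, m the query length, and n the total database length. The function below is a rough illustration under the assumption that raw lengths are used; BLAST itself applies corrected "effective" lengths, so real reported values will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value: chance hits expected at this bit score or better."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Same query and database, two different bit scores.
for score in (30, 40, 50):
    print(score, expected_hits(score, query_length=100, database_length=10**6))
```

Each extra bit of score halves the E-value, and doubling the database size doubles it, which is why the same alignment is less significant against a larger database.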
BLAST Alignment Components
Each BLAST result contains:
- Query sequence
- Subject sequence
- Alignment score
- Gaps
- Identity percentage
Example BLAST Workflow
from Bio.Blast import NCBIWWW
sequence = "ATGCGATACGTT"
result = NCBIWWW.qblast("blastn", "nt", sequence)
with open("output.xml", "w") as f:
    f.write(result.read())
Parsing BLAST Results
from Bio.Blast import NCBIXML
with open("output.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)
    for record in blast_records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                print("Score:", hsp.score)
                print("E-value:", hsp.expect)
                # identities is a count of matching positions, not a percentage
                print("Identity:", hsp.identities)
What is HSP?
An HSP (High-scoring Segment Pair) is a region of local alignment between the query and a database sequence. A single hit can contain several HSPs.
It contains:
- Score
- Identity
- Alignment length
- Gaps
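Because `identities` and `gaps` are counts, a percentage has to be derived from the alignment length. The sketch below does this with a stand-in object that carries the same attribute names as a parsed Biopython HSP (`identities`, `align_length`, `gaps`). The stand-in values are made up for illustration, and `summarize_hsp` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize_hsp(hsp):
    """Return percent identity and percent gaps for one HSP."""
    return {
        "percent_identity": 100.0 * hsp.identities / hsp.align_length,
        "percent_gaps": 100.0 * hsp.gaps / hsp.align_length,
    }


# Stand-in for an HSP from NCBIXML.parse(); the numbers are invented.
example_hsp = SimpleNamespace(score=40.0, expect=1e-8,
                              identities=18, align_length=20, gaps=1)
print(summarize_hsp(example_hsp))
```

With real results you would pass each `hsp` from `alignment.hsps` to the same function.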
Biological Applications of BLAST
BLAST is widely used in:
Genomics
- Genome annotation
- Gene identification
Medical Research
- Disease gene detection
- Mutation analysis
Evolutionary Biology
- Homology detection
- Phylogenetic analysis
Drug Discovery
- Protein similarity analysis
- Target identification
Advantages of BLAST in Biopython
- Easy integration with Python
- Access to NCBI databases
- Automated sequence analysis
- Supports multiple BLAST types
- Useful for research and education
Limitations
- Requires internet connection (NCBI BLAST)
- Can be slow for large queries
- Rate limits apply
- Not suitable for large-scale or offline analysis; a local BLAST+ installation is the usual choice for that
Best Practices
Use Short Sequences for Testing
Start with small sequences before large datasets.
Save Results Locally
Avoid re-running BLAST unnecessarily.
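One way to avoid repeated searches is a small file cache: run the search only if no saved result exists. This is a sketch with hypothetical names (`cached_search`, `run_search`); the search function is passed in so the caching logic can be tried without contacting NCBI.

```python
from pathlib import Path


def cached_search(sequence, cache_path, run_search):
    """Return saved results if present; otherwise run the search and save them."""
    path = Path(cache_path)
    if path.exists():
        # A previous run already stored the XML, so reuse it.
        return path.read_text()
    xml_text = run_search(sequence)
    path.write_text(xml_text)
    return xml_text


# Example wiring with Biopython (needs network access, so left commented out):
# from Bio.Blast import NCBIWWW
# xml = cached_search("ATGCGATACGTT", "blast_results.xml",
#                     lambda seq: NCBIWWW.qblast("blastn", "nt", seq).read())
```

Note that this simple version keys the cache on the file name only, so use a different file for each query sequence.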
Filter Results
Focus on low E-value and high identity hits.
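A filter along these lines might look like the sketch below. It walks the same `record.alignments` and `alignment.hsps` structure that `NCBIXML.parse()` produces, but is demonstrated on invented stand-in objects so it runs without a search. The thresholds and the name `filter_hits` are assumptions to adjust for your own data.

```python
from types import SimpleNamespace


def filter_hits(records, max_evalue=1e-5, min_identity=90.0):
    """Keep (title, E-value, percent identity) for HSPs passing both thresholds."""
    kept = []
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                identity = 100.0 * hsp.identities / hsp.align_length
                if hsp.expect <= max_evalue and identity >= min_identity:
                    kept.append((alignment.title, hsp.expect, identity))
    return kept


# Invented stand-in data shaped like parsed BLAST records.
strong = SimpleNamespace(expect=1e-30, identities=98, align_length=100)
weak = SimpleNamespace(expect=0.1, identities=15, align_length=20)
demo_records = [SimpleNamespace(alignments=[
    SimpleNamespace(title="hit A", hsps=[strong]),
    SimpleNamespace(title="hit B", hsps=[weak]),
])]
print(filter_hits(demo_records))
```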
Respect NCBI Usage Policy
Avoid excessive automated requests.
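A simple way to space out submissions is to enforce a minimum interval between them. The sketch below is an illustration only: `submit_all` is a hypothetical helper, and the 10-second default is an assumption that you should check against NCBI's current usage guidelines. The clock and sleep functions are parameters so the logic can be tested without waiting.

```python
import time


def submit_all(sequences, submit, min_interval=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call submit(seq) for each sequence, at least min_interval seconds apart."""
    results = []
    last_start = None
    for seq in sequences:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(submit(seq))
    return results


# Demo with a fake submit function and a tiny interval so it finishes quickly.
print(submit_all(["ATGC", "GGCC"], submit=len, min_interval=0.01))
```

In real use, `submit` would wrap a call such as `NCBIWWW.qblast("blastn", "nt", seq)`.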
Real-World BLAST Example
from Bio.Blast import NCBIWWW
sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
result_handle = NCBIWWW.qblast(
    "blastn",
    "nt",
    sequence
)
# Prints the raw XML returned by NCBI
print(result_handle.read())
BLAST vs Alignment
| Feature | BLAST | Pairwise Alignment |
|---|---|---|
| Speed | Fast | Slower |
| Database | Yes | No |
| Purpose | Search similarity | Compare sequences |
| Scale | Large databases | Small comparisons |
Conclusion
BLAST is a fundamental tool in bioinformatics for identifying sequence similarity and biological relationships. Biopython makes it easy to access BLAST services, retrieve results, and integrate sequence analysis into Python programs.
Understanding BLAST is essential for genome analysis, protein research, and modern computational biology workflows. In the next tutorial, we will explore how to work with Entrez databases to retrieve biological data from NCBI.

