Biopython - Overview of BLAST

BLAST (Basic Local Alignment Search Tool) is one of the most powerful and widely used tools in bioinformatics. It is used to compare a query sequence (DNA, RNA, or protein) against a large database of biological sequences to find regions of similarity.

Biopython provides easy access to BLAST services through the Bio.Blast module, allowing researchers to run sequence similarity searches directly from Python.

In this tutorial, you will learn the fundamentals of BLAST, its types, and how Biopython interacts with BLAST systems.

What is BLAST?

BLAST is a sequence comparison tool developed by NCBI. It helps identify:

Similar DNA sequences
Related proteins
Gene functions
Evolutionary relationships
Potential homologs

It works by finding local alignments between sequences and ranking results based on similarity scores.
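To make "local alignment" concrete, here is a minimal Smith-Waterman scoring sketch in plain Python. This is the exact dynamic-programming method that BLAST approximates. The function name is hypothetical, and the scores (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST's defaults.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    # prev[j] = best score of a local alignment ending at a[i-1] and b[j-1]
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment start anywhere, which makes it "local"
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared "ACGT" region scores; the unrelated flanks are ignored
print(local_alignment_score("TTTACGTAAA", "GGACGTGG"))  # 8
```

Because this fills a full score table for every pair of sequences, running it against millions of database entries is slow. That cost is why BLAST uses heuristics instead.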

Why Is BLAST Important?

BLAST is essential in bioinformatics because it helps answer questions like:

What gene does this DNA sequence belong to?
Is this protein similar to known proteins?
What organism does this sequence come from?
Does this sequence have mutations compared to known genes?

Types of BLAST

1. BLASTN (Nucleotide BLAST)

Compares DNA sequences
DNA vs DNA database

Example use:

Gene identification
Genome mapping

2. BLASTP (Protein BLAST)

Compares protein sequences
Protein vs protein database

Example use:

Protein function prediction

3. BLASTX

Translates a nucleotide query in all six reading frames
Searches the translated query against a protein database

4. TBLASTN

Protein query vs nucleotide database (database translated in all six reading frames)

5. TBLASTX

Translated nucleotide query vs translated nucleotide database (both compared at the protein level)
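The five programs above differ only in the type of query and database they compare. As a rough sketch, that choice can be written as a lookup table. The dictionary and the `choose_program` helper are hypothetical, not part of Biopython; the returned names are the program strings you would pass to `qblast`.

```python
# (query type, database type) for each BLAST program described above
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),
}


def choose_program(query_type: str, db_type: str, translate: bool = False) -> str:
    """Pick a BLAST program name (hypothetical helper) from query and database types.

    For nucleotide vs nucleotide, translate=True selects tblastx, which
    compares both sides at the protein level; otherwise blastn is used.
    """
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translate else "blastn"
    for name, types in BLAST_PROGRAMS.items():
        if types == (query_type, db_type):
            return name
    raise ValueError(f"no BLAST program for {query_type} vs {db_type}")


print(choose_program("nucleotide", "protein"))  # blastx
```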

How BLAST Works

BLAST follows a step-by-step process:

Query Sequence
      ↓
Search Database
      ↓
Find Similar Regions
      ↓
Score Alignments
      ↓
Rank Results

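Rather than running a full dynamic-programming alignment against every database sequence, BLAST uses heuristics. It first finds short word matches (seeds) between the query and database sequences, then extends those seeds into longer local alignments, and finally keeps only the alignments that score above a threshold.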
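The seed-and-extend idea can be sketched in plain Python. This is a toy version of nucleotide-style seeding (exact matches of a fixed word length `w`) followed by an ungapped extension that stops once the score drops too far below the best value seen (an "X-drop" cutoff). The function names, the word size, and the scores are illustrative assumptions. Real BLAST also uses neighborhood words for proteins, gapped extension, and E-value statistics.

```python
def find_seeds(query: str, subject: str, w: int = 4) -> list:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    # Index every length-w word of the query (a toy lookup table)
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    # Scan the subject and record every exact word hit
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query: str, subject: str, i: int, j: int, w: int = 4,
                match: int = 1, mismatch: int = -2, x_drop: int = 3) -> tuple:
    """Ungapped extension of the seed at (i, j); returns (score, q_start, q_end_exclusive)."""
    best = w * match  # the seed itself is w exact matches
    q_start, q_end = i, i + w

    # Extend to the right; stop once the running score falls x_drop below the best
    s, qi, sj = best, i + w, j + w
    while qi < len(query) and sj < len(subject):
        s += match if query[qi] == subject[sj] else mismatch
        qi, sj = qi + 1, sj + 1
        if s > best:
            best, q_end = s, qi
        elif best - s > x_drop:
            break

    # Extend to the left the same way, starting from the best right-hand score
    s, qi, sj = best, i - 1, j - 1
    while qi >= 0 and sj >= 0:
        s += match if query[qi] == subject[sj] else mismatch
        if s > best:
            best, q_start = s, qi
        elif best - s > x_drop:
            break
        qi, sj = qi - 1, sj - 1
    return best, q_start, q_end


query = "ACGTACGT"
subject = "TTACGTACGTTT"
seeds = find_seeds(query, subject)
print(len(seeds), "seeds")
# Keep the highest-scoring extension (ties broken by tuple order)
print(max(extend_seed(query, subject, i, j) for i, j in seeds))  # (8, 0, 8)
```

Only word hits are examined, so most of the database is never aligned at all. This is where BLAST gets its speed, at the cost of possibly missing weak similarities that contain no seed.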

Installing Biopython for BLAST

Before using BLAST in Biopython:

pip install biopython

Importing BLAST Module

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

NCBIWWW submits searches to the NCBI BLAST web service, and NCBIXML parses the XML results it returns.

Running a Basic BLAST Search

You can submit a sequence to NCBI BLAST.

from Bio.Blast import NCBIWWW

result_handle = NCBIWWW.qblast(
    "blastn",
    "nt",
    "ATGCGATACGTT"
)

Understanding qblast()

Parameters:

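Parameter	Value in the example	Description
program	"blastn"	BLAST program to run (nucleotide query vs nucleotide database)
database	"nt"	Database to search (NCBI nucleotide collection)
sequence	"ATGCGATACGTT"	Query sequence to search with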

Saving BLAST Results

with open("blast_results.xml", "w") as out:
    out.write(result_handle.read())
result_handle.close()

By default, qblast returns results as XML, so they are saved to a file with an .xml extension.

Reading BLAST Results

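from Bio.Blast import NCBIXML

# parse() yields records lazily, so iterate while the file is still open
with open("blast_results.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)

    for record in blast_records:
        for alignment in record.alignments:
            print(alignment.title)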

Understanding BLAST Output

BLAST results include:

Sequence alignment
Score
E-value
Query coverage
Identity percentage
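Identity percentage and query coverage are simple ratios. In NCBIXML output, an HSP reports `hsp.identities` as a count of identical positions (not a percentage), along with `hsp.align_length`, `hsp.query_start` and `hsp.query_end`. The sketch below computes both ratios from plain numbers. The helper names and input values are illustrative. NCBI's own "query cover" merges all HSPs of a hit, while this computes it for one HSP only.

```python
def percent_identity(identities: int, align_length: int) -> float:
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identities / align_length


def query_coverage(query_start: int, query_end: int, query_length: int) -> float:
    """Percentage of the query covered by one HSP (1-based, inclusive coordinates).

    abs() guards against start > end, which can occur for reverse-frame hits.
    """
    return 100.0 * (abs(query_end - query_start) + 1) / query_length


# Illustrative numbers, not real BLAST output
print(percent_identity(45, 50))   # 90.0
print(query_coverage(1, 40, 80))  # 50.0
```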

What is E-value?

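The E-value (expect value) is the number of hits with a score at least this good that you would expect to find purely by chance in a database of this size.

Lower E-value = more significant match

Example:

0.0 → too small to display, so BLAST reports it as zero (extremely significant, though not necessarily a perfect match)
1e-50 → highly significant
0.1 → weak similarity that may be due to chance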
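The E-value can be approximated from the bit score S' as E ≈ m · n · 2^(-S'), where m is the query length and n is the total length of the database. The sketch below uses that relationship. BLAST itself uses effective (edge-corrected) lengths, so this is an approximation, and the lengths in the example are hypothetical. In NCBIXML the bit score is available as `hsp.bits`.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# A 300-letter query against a hypothetical database of 1e9 letters
for bits in (30, 50, 100):
    print(bits, expected_hits(bits, 300, 10**9))
```

Each extra bit halves the E-value, and doubling the database size doubles it. This is why the same alignment can look less significant when you search a larger database.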

BLAST Alignment Components

Each BLAST result contains:

Query sequence
Subject sequence
Alignment score
Gaps
Identity percentage

Example BLAST Workflow

from Bio.Blast import NCBIWWW

sequence = "ATGCGATACGTT"

result = NCBIWWW.qblast("blastn", "nt", sequence)

# Save the XML so it can be parsed later without repeating the search
with open("output.xml", "w") as f:
    f.write(result.read())
result.close()

Parsing BLAST Results

from Bio.Blast import NCBIXML

with open("output.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)

    for record in blast_records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                print("Score:", hsp.score)
                print("E-value:", hsp.expect)
                # identities is a count of identical positions, not a percentage
                print("Identities:", hsp.identities, "/", hsp.align_length)

What is HSP?

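An HSP (High-scoring Segment Pair) is a pair of aligned segments, one from the query and one from the subject sequence, that scores highly. A single alignment (database hit) can contain several HSPs.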

It contains:

Score
Identity
Alignment length
Gaps
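The identity, length and gap values of an HSP can be recovered from its aligned strings. NCBIXML exposes these strings as `hsp.query` and `hsp.sbjct`, with `-` marking gaps. The sketch below takes two plain strings. The helper name is hypothetical, and percent identity is taken over the full alignment length including gap columns.

```python
def summarize_alignment(query_aln: str, sbjct_aln: str) -> dict:
    """Count identities and gaps in two equal-length aligned strings ('-' marks a gap)."""
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have the same length")
    pairs = list(zip(query_aln, sbjct_aln))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return {
        "align_length": len(pairs),
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / len(pairs),
    }


# One gap column and one mismatch (G vs C) in a 9-column alignment
print(summarize_alignment("ACGT-ACGT", "ACGTTACCT"))
```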

Biological Applications of BLAST

BLAST is widely used in:

Genomics

Genome annotation
Gene identification

Medical Research

Disease gene detection
Mutation analysis

Evolutionary Biology

Homology detection
Phylogenetic analysis

Drug Discovery

Protein similarity analysis
Target identification

Advantages of BLAST in Biopython

Easy integration with Python
Access to NCBI databases
Automated sequence analysis
Supports multiple BLAST types
Useful for research and education

Limitations

Requires internet connection (NCBI BLAST)
Can be slow for large queries
Rate limits apply
Not suitable for offline or large-scale analysis (a local BLAST+ installation is the usual choice there)

Best Practices

Use Short Sequences for Testing

Start with small sequences before large datasets.

Save Results Locally

Avoid re-running BLAST unnecessarily.
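One simple way to avoid repeating a search is to check for a saved result file before contacting NCBI. In the sketch below, `cached_search` is a hypothetical helper. It takes any function that returns the XML text, so it can be tested offline. The commented line shows how it might wrap `NCBIWWW.qblast`.

```python
import os
from typing import Callable


def cached_search(path: str, run_search: Callable[[], str]) -> str:
    """Return saved results from path, calling run_search() only if the file is missing."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    xml = run_search()
    with open(path, "w") as f:
        f.write(xml)
    return xml


# Possible use with Biopython (needs network access):
# xml = cached_search("output.xml", lambda: NCBIWWW.qblast("blastn", "nt", sequence).read())
```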

Filter Results

Focus on low E-value and high identity hits.
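Filtering by E-value and identity takes only a few lines once each hit is reduced to a few numbers. The `Hit` class, the `filter_hits` helper and the cutoffs (E-value ≤ 1e-10, identity ≥ 90%) are hypothetical choices for illustration. In practice you would fill `Hit` from `hsp.expect`, `hsp.identities` and `hsp.align_length`, and pick thresholds that suit your data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Simplified hit record (hypothetical), e.g. built from one HSP."""
    title: str
    evalue: float
    identities: int
    align_length: int

    @property
    def percent_identity(self) -> float:
        return 100.0 * self.identities / self.align_length


def filter_hits(hits, max_evalue=1e-10, min_identity=90.0):
    """Keep hits under the E-value cutoff and over the identity cutoff, best E-value first."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.percent_identity >= min_identity]
    return sorted(kept, key=lambda h: h.evalue)


hits = [
    Hit("hit A", 1e-40, 98, 100),  # significant and highly identical
    Hit("hit B", 0.05, 30, 32),    # E-value too high
    Hit("hit C", 1e-15, 80, 100),  # significant but only 80% identical
]
print([h.title for h in filter_hits(hits)])  # ['hit A']
```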

Respect NCBI Usage Policy

Avoid excessive automated requests.
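If you submit many queries in a loop, one way to space them out is a small throttle. The `Throttle` class below is a hypothetical helper. Its 10-second default is an assumption, so check NCBI's current BLAST usage guidelines for the interval they actually ask for.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval: float = 10.0):
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Possible use: call throttle.wait() before each NCBIWWW.qblast(...) in a loop
throttle = Throttle()
```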

Real-World BLAST Example

from Bio.Blast import NCBIWWW

sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

result_handle = NCBIWWW.qblast(
    "blastn",
    "nt",
    sequence
)

# Prints the raw XML returned by NCBI
print(result_handle.read())
result_handle.close()

BLAST vs Alignment

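Feature	BLAST	Pairwise Alignment
Method	Heuristic (seed and extend)	Full dynamic programming
Speed	Fast	Slower
Searches a database	Yes	No
Purpose	Find similar sequences in a database	Compare two given sequences
Scale	Large databases	Small comparisons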

Conclusion

BLAST is a fundamental tool in bioinformatics for identifying sequence similarity and biological relationships. Biopython makes it easy to access BLAST services, retrieve results, and integrate sequence analysis into Python programs.

Understanding BLAST is essential for genome analysis, protein research, and modern computational biology workflows. In the next tutorial, we will explore how to work with Entrez databases to retrieve biological data from NCBI.

