Heavy Watal

BioPython — Tools for biological computation

https://biopython.org https://biopython.org/DIST/docs/api/Bio-module.html

SeqIO

read(file, format, alphabet=None)
Reads a file containing exactly one sequence and returns a SeqRecord.
parse(file, format, alphabet=None)
Reads a file containing multiple sequences and returns an iterator of SeqRecord objects.
index(filename, format, alphabet=None, key_function=None)
Returns a dictionary-like object keyed by record id.

write(sequences, file, format)

to_dict(sequences, key_function=None)

convert(in_file, in_format, out_file, out_format, alphabet=None)
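
As a rough sketch of the output-side functions listed above (write, convert, to_dict): the two records, their ids, and the file names below are made up for illustration, and the files are written to a temporary directory.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two hypothetical records, used only for illustration.
records = [
    SeqRecord(Seq("ATGGCCATTGTA"), id="seq1", description="hypothetical record 1"),
    SeqRecord(Seq("ATGAAAGGGTGA"), id="seq2", description="hypothetical record 2"),
]

workdir = tempfile.mkdtemp()
fasta_path = os.path.join(workdir, "example.fasta")
tab_path = os.path.join(workdir, "example.tsv")

# write() returns the number of records written.
n_written = SeqIO.write(records, fasta_path, "fasta")

# convert() reads one format and writes another in a single call.
n_converted = SeqIO.convert(fasta_path, "fasta", tab_path, "tab")

# to_dict() loads every record into memory, keyed by record.id by default.
by_id = SeqIO.to_dict(SeqIO.parse(fasta_path, "fasta"))
print(n_written, n_converted, sorted(by_id))
```

to_dict holds all records in memory, so it probably suits small files best; index is the lazier alternative for large ones.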

from Bio import SeqIO

with open('beer.fasta', 'r') as fin:
    for record in SeqIO.parse(fin, 'fasta'):
        print(record.id)
        print(record.seq)
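
For comparison with parse, here is a minimal sketch of index and read. The FASTA content and the ids alpha and beta are invented so that the snippet does not depend on an existing file.

```python
import os
import tempfile

from Bio import SeqIO

# Hypothetical FASTA content with two records.
fasta_text = (
    ">alpha first hypothetical sequence\nATGGCC\n"
    ">beta second hypothetical sequence\nTTGACA\n"
)
multi_path = os.path.join(tempfile.mkdtemp(), "multi.fasta")
with open(multi_path, "w") as fout:
    fout.write(fasta_text)

# index() scans the file once and parses a record only when it is looked up.
indexed = SeqIO.index(multi_path, "fasta")
ids = sorted(indexed.keys())
beta = indexed["beta"]
indexed.close()

# read() expects exactly one record and raises ValueError otherwise.
try:
    SeqIO.read(multi_path, "fasta")
    read_failed = False
except ValueError:
    read_failed = True

print(ids, beta.seq, read_failed)
```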

SeqRecord

A class that bundles a Seq together with its associated metadata.

id
seq

name
description
dbxrefs
features
annotations
letter_annotations

>>> for record in SeqIO.parse(fin, 'fasta'):
...     print(record)
...
ID: gi|186972394|gb|EU490707.1|
Name: gi|186972394|gb|EU490707.1|
Description: gi|186972394|gb|EU490707.1| Selenipedium aequinoctiale maturase K (matK) gene, partial cds; chloroplast
Number of features: 0
Seq('ATTTTTTACGAACCTGTGGAAATTTTTGGTTATGACAATAAATCTAGTTTAGTA...GAA', SingleLetterAlphabet())
ID: gi|186972391|gb|ACC99454.1|
Name: gi|186972391|gb|ACC99454.1|
Description: gi|186972391|gb|ACC99454.1| maturase K [Scaphosepalum rapax]
Number of features: 0
Seq('IFYEPVEILGYDNKSSLVLVKRLITRMYQQKSLISSLNDSNQNEFWGHKNSFSS...EEE', SingleLetterAlphabet())
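
The attributes listed above can also be set by hand. The following is a minimal sketch that builds a SeqRecord directly; the id, name, description, and annotation values are hypothetical.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(
    Seq("ATGGCCATTGTA"),
    id="example1",
    name="example",
    description="hypothetical record for illustration",
)

# annotations is a free-form dict describing the whole record.
record.annotations["organism"] = "unknown"

# letter_annotations holds per-letter values; each entry must match len(record.seq).
record.letter_annotations["phred_quality"] = [30] * len(record.seq)

# Slicing a SeqRecord returns a new SeqRecord with the sequence
# and the per-letter annotations sliced together.
sub = record[:6]
print(sub.id, sub.seq, sub.letter_annotations["phred_quality"])
```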

Seq

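The class for nucleotide and amino acid sequences. It behaves almost like the built-in str, and adds methods for common sequence operations such as taking the complement or reverse complement.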
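
A short sketch of both sides of Seq, the str-like behaviour and the sequence-specific methods, using a made-up six-base sequence:

```python
from Bio.Seq import Seq

dna = Seq("ATGGCC")  # hypothetical sequence: two codons, ATG and GCC

# str-like operations
length = len(dna)
first_codon = str(dna[:3])
c_count = dna.count("C")

# Sequence-specific methods
comp = dna.complement()             # base-by-base complement
revcomp = dna.reverse_complement()  # complement read in the opposite direction
rna = dna.transcribe()              # T replaced by U
protein = dna.translate()           # standard genetic code

print(comp, revcomp, rna, protein)
```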

GenomeDiagram

https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec328 https://biopython.org/DIST/docs/GenomeDiagram/userguide.pdf

Entrez

esearch

efetch

Returns a handle to the result (XML, or another format depending on rettype and retmode):

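from Bio import Entrez, SeqIO

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

# Request plain-text FASTA so that SeqIO can parse the handle directly
handle = Entrez.efetch(db="nucleotide", id="186972394,186972394", rettype="fasta", retmode="text")
for record in SeqIO.parse(handle, "fasta"):
    print(record)
handle.close()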
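
esearch and efetch are often chained: search for ids first, then fetch the matching records. The sketch below wraps each step in a function; search_ids and fetch_fasta are hypothetical helper names, not part of Biopython, and calling either one needs network access to NCBI.

```python
from Bio import Entrez, SeqIO

# NCBI asks for a contact address; set a real one before calling the helpers.
# Entrez.email = "you@example.com"


def search_ids(term, db="nucleotide", retmax=5):
    """Return a list of record ids matching the search term (hypothetical helper)."""
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    result = Entrez.read(handle)  # parse the XML reply into a dict-like object
    handle.close()
    return list(result["IdList"])


def fetch_fasta(ids, db="nucleotide"):
    """Fetch the given ids as FASTA and return SeqRecord objects (hypothetical helper)."""
    handle = Entrez.efetch(db=db, id=",".join(ids), rettype="fasta", retmode="text")
    records = list(SeqIO.parse(handle, "fasta"))
    handle.close()
    return records


# Example usage (requires network access), with an illustrative search term:
# ids = search_ids("matK[Gene] AND Selenipedium[Organism]")
# for record in fetch_fasta(ids):
#     print(record.id, len(record.seq))
```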

Installation

A single pip command does it:

pip install biopython
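
To check that the installation worked, something like the following should suffice. Note that the package is imported as Bio, not biopython.

```python
import Bio

# Prints the installed Biopython version string.
version = Bio.__version__
print(version)
```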