BioPython — Tools for biological computation

https://biopython.org https://biopython.org/DIST/docs/api/Bio-module.html

`SeqIO`

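read(file, format, alphabet=None): reads a file that contains exactly one sequence and returns a SeqRecord
parse(file, format, alphabet=None): reads a file that contains multiple sequences and returns an iterator of SeqRecord objects
index(filename, format, alphabet=None, key_function=None): returns a dict-like object keyed by record id; records are parsed from the file on access

write(sequences, file, format): writes SeqRecords to a file and returns the number of records written

to_dict(sequences, key_function=None): builds an in-memory dict of SeqRecords keyed by id

convert(in_file, in_format, out_file, out_format, alphabet=None): converts a file from one format to another and returns the number of records converted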
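A minimal round-trip sketch of write, parse, to_dict and read, assuming a recent Biopython (the `alphabet` argument is omitted). The record ids and sequences are made up, and an in-memory StringIO stands in for a real file so the snippet runs without touching disk.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two toy records; ids and sequences are illustrative only.
records = [
    SeqRecord(Seq("ATGGCCATTGTAATG"), id="seq1", description="toy record 1"),
    SeqRecord(Seq("ATGAAACGCATTAGC"), id="seq2", description="toy record 2"),
]

# write() takes a list (or iterator) of SeqRecords and returns how many it wrote.
buffer = StringIO()
n_written = SeqIO.write(records, buffer, "fasta")

# parse() returns an iterator; list() consumes it fully.
buffer.seek(0)
parsed = list(SeqIO.parse(buffer, "fasta"))

# to_dict() loads every record into memory, keyed by record.id by default.
by_id = SeqIO.to_dict(parsed)

# read() expects exactly one record in the input.
single = SeqIO.read(StringIO(">only\nACGT\n"), "fasta")

print(n_written, [r.id for r in parsed], sorted(by_id), single.seq)
```

read() raises an error when the input holds zero or several records, which is why parse() is the usual choice for multi-record files.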

from Bio import SeqIO

with open('beer.fasta', 'r') as fin:
    for record in SeqIO.parse(fin, 'fasta'):
        print(record.id)
        print(record.seq)
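index() and convert() work on files rather than in-memory lists. The sketch below writes a small made-up FASTA file to a temporary directory, looks up one record through index(), and converts the file to the simple two-column "tab" format; the file name and ids are hypothetical.

```python
import os
import tempfile
from io import StringIO

from Bio import SeqIO

# Small FASTA text; ids and sequences are made up for illustration.
fasta_text = ">a1 first\nACGTACGT\n>b2 second\nGGGCCCAA\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.fasta")  # hypothetical file name
    with open(path, "w") as fout:
        fout.write(fasta_text)

    # index() scans the file once and remembers record offsets;
    # a record is parsed only when it is looked up by id.
    lookup = SeqIO.index(path, "fasta")
    b2_seq = str(lookup["b2"].seq)
    ids = sorted(lookup)  # iterating yields the keys (record ids)
    lookup.close()  # release the file handle before the directory is removed

    # convert() parses one format and writes another in a single call.
    out = StringIO()
    n_converted = SeqIO.convert(path, "fasta", out, "tab")

print(b2_seq, ids, n_converted)
print(out.getvalue())
```

index() suits large files where only a few records are needed; to_dict() is simpler but keeps everything in memory.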

`SeqRecord`

A class that bundles a Seq together with its associated information.

id
seq

name
description
dbxrefs
features
annotations
letter_annotations
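A sketch of the attributes above on a hand-built SeqRecord. All values (id, name, the cross-reference string, the quality scores) are hypothetical; it also shows that slicing a record slices its per-letter annotations along with the sequence.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record: every value here is illustrative only.
record = SeqRecord(
    Seq("ACGTAC"),
    id="toy.1",
    name="toy",
    description="toy record built by hand",
    dbxrefs=["Hypothetical:12345"],  # made-up cross-reference
    annotations={"molecule_type": "DNA"},
)

# letter_annotations holds per-residue data; each list must match len(seq).
record.letter_annotations["phred_quality"] = [40, 40, 38, 30, 20, 10]

print(record.id, record.name, record.description)
print(record.annotations["molecule_type"], record.dbxrefs)
print(len(record.features))  # no features were added

# Slicing a SeqRecord also slices its per-letter annotations.
sub = record[1:4]
print(sub.seq, sub.letter_annotations["phred_quality"])
```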

>>> for record in SeqIO.parse('beer.fasta', 'fasta'):
...     print(record)
...
ID: gi|186972394|gb|EU490707.1|
Name: gi|186972394|gb|EU490707.1|
Description: gi|186972394|gb|EU490707.1| Selenipedium aequinoctiale maturase K (matK) gene, partial cds; chloroplast
Number of features: 0
Seq('ATTTTTTACGAACCTGTGGAAATTTTTGGTTATGACAATAAATCTAGTTTAGTA...GAA', SingleLetterAlphabet())
ID: gi|186972391|gb|ACC99454.1|
Name: gi|186972391|gb|ACC99454.1|
Description: gi|186972391|gb|ACC99454.1| maturase K [Scaphosepalum rapax]
Number of features: 0
Seq('IFYEPVEILGYDNKSSLVLVKRLITRMYQQKSLISSLNDSNQNEFWGHKNSFSS...EEE', SingleLetterAlphabet())

`Seq`

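Class for nucleotide and amino acid sequences. It behaves almost like a standard str, and also provides methods for complementary strands, transcription, translation and similar operations.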
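A quick sketch of the str-like behaviour, using an arbitrary example sequence: length, slicing, counting, searching and concatenation all work as they would on a plain string.

```python
from Bio.Seq import Seq

seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")

# Length, indexing and slicing work like str; a slice is again a Seq.
print(len(seq), seq[0], seq[:6])

# Counting and searching mirror the str methods.
print(seq.count("G"), seq.find("ATAT"))

# Concatenation returns a new Seq; str() gives back a plain string.
print(seq[:4] + Seq("TTTT"), str(seq).lower()[:4])
```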

complement(self)
reverse_complement(self)
transcribe(self)
back_transcribe(self)
translate(self, table="Standard", stop_symbol="*", to_stop=False, cds=False)
ungap(self, gap=None)
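A sketch exercising most of the methods listed above on short made-up sequences, assuming a recent Biopython. ungap() is left out because newer releases steer toward str-style replacement instead.

```python
from Bio.Seq import Seq

dna = Seq("ATGCGT")
print(dna.complement(), dna.reverse_complement())

# transcribe() swaps T for U; back_transcribe() undoes it.
mrna = dna.transcribe()
print(mrna, mrna.back_transcribe())

# Default translation uses the standard table and marks stops with "*".
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
print(coding_dna.translate())
print(coding_dna.translate(to_stop=True))  # stop at the first stop codon
print(coding_dna.translate(stop_symbol="@"))

# cds=True checks for a start codon, whole codons and a single final stop,
# and drops that final stop from the result.
cds = Seq("ATGGCCATTGTAATGGGCCGCTAG")
print(cds.translate(cds=True))
```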

GenomeDiagram

https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec328 https://biopython.org/DIST/docs/GenomeDiagram/userguide.pdf

Entrez

https://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc90

esearch

NCBI ESearch Utility

https://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93

>>> from Bio import Entrez
>>> handle = Entrez.esearch(db="nucleotide", retmax=10, term="Opuntia")
>>> record = Entrez.read(handle)
>>> for item in record.items():
...     print(item)
...
(u'Count', '390')
(u'RetMax', '10')
(u'IdList', ['257359511', '283467266', '246905625', '246905624', '246655205', '246655204', '240253899', '240253897', '240253576', '240253574'])
(u'TranslationStack', [{u'Count': '200', u'Field': 'Organism', u'Term': '"Opuntia"[Organism]', u'Explode': 'Y'}, {u'Count': '390', u'Field': 'All Fields', u'Term': 'Opuntia[All Fields]', u'Explode': 'Y'}, 'OR', 'GROUP'])
(u'TranslationSet', [{u'To': '"Opuntia"[Organism] OR Opuntia[All Fields]', u'From': 'Opuntia'}])
(u'RetStart', '0')
(u'QueryTranslation', '"Opuntia"[Organism] OR Opuntia[All Fields]')

efetch

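Returns a handle to the result. Its format is set by rettype/retmode: XML when retmode="xml", but with rettype="fasta" it is plain FASTA text that SeqIO can parse directly:

from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
handle = Entrez.efetch(db="nucleotide", id="186972394,186972394", rettype="fasta", retmode="text")
records = SeqIO.parse(handle, "fasta")  # iterator of SeqRecord objects
for record in records:
    print(record)
handle.close()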
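esearch and efetch are often chained: search for UIDs, then fetch those records. The sketch below wraps both steps in two hypothetical helper functions (search_ids, fetch_fasta_records are illustrative names, not Biopython API). Calling them needs network access and a real contact address in Entrez.email.

```python
from Bio import Entrez, SeqIO

# Placeholder: NCBI asks clients to identify themselves; use your own address.
Entrez.email = "your.name@example.com"


def search_ids(term, db="nucleotide", retmax=10):
    """Run ESearch and return the list of matching UIDs as strings."""
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])


def fetch_fasta_records(ids, db="nucleotide"):
    """Run EFetch for the given UIDs and return the parsed SeqRecords."""
    if not ids:
        return []  # nothing to fetch, so skip the network call
    handle = Entrez.efetch(db=db, id=",".join(ids), rettype="fasta", retmode="text")
    try:
        # list() consumes the iterator before the handle is closed.
        return list(SeqIO.parse(handle, "fasta"))
    finally:
        handle.close()
```

Usage, with network access: `records = fetch_fasta_records(search_ids("Opuntia", retmax=5))`.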

Installation

A single pip command is enough:

pip install biopython
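A minimal check that the installation is importable: it prints the installed version and runs one Seq method.

```python
import Bio
from Bio.Seq import Seq

print(Bio.__version__)
print(Seq("ATGC").reverse_complement())
```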