This course is an introduction to state-of-the-art concepts, methodologies and practices in computational biology. After taking this course, you should be able to:
Use appropriate commands or write your own scripts to manipulate your own genomic data
Use appropriate pipelines or software to analyze your own high-throughput gene expression data (microarray and/or RNA-Seq) and high-throughput sequencing data on transcriptional regulation
Apply statistical or mathematical models to your own population/human genetics data
Critically evaluate and/or extend the computational analyses or statistical models in the literature
Use R, Python, and/or Shell scripts and basic concepts in genomics and genetics to perform your own bioinformatics and computational biology research
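Entering a search query like this one estimates the divergence time of the dog (Canis lupus familiaris) and the cat (Felis catus) at 55.1 million years ago (Mya). This value, however, is an average: it is obtained by collecting the divergence-time studies that cover both the dog and the cat and taking the mean of their estimates. As the black dots on the timeline on the left side of the figure show, the individual studies actually range from 42 Mya to 67 Mya. Being able to check a divergence time against multiple sources in this way is one of TimeTree's key features.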
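To make the averaging step concrete, here is a minimal Python sketch that reduces a list of per-study divergence estimates to a mean and a range. The function name summarize_divergence is hypothetical, and the plain arithmetic mean only mirrors the description above; TimeTree's own summary procedure may differ.

```python
from statistics import mean


def summarize_divergence(estimates_mya: list[float]) -> dict[str, float]:
    """Summarize per-study divergence-time estimates given in Mya.

    Hypothetical helper: returns the arithmetic mean of the estimates
    together with the smallest and largest individual estimate.
    """
    if not estimates_mya:
        raise ValueError("need at least one estimate")
    return {
        "mean": mean(estimates_mya),  # single summary value
        "min": min(estimates_mya),    # lower end of the spread
        "max": max(estimates_mya),    # upper end of the spread
    }
```

Fed the individual study estimates for the dog and the cat, the "min" and "max" entries would correspond to the 42 Mya and 67 Mya ends of the spread described above.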
$ kmergenie ecoli_ref-5m-trim.fastq
running histogram estimation
Linear estimation: ~130 M distinct 41-mers are in the reads
K-mer sampling: 1/100
| processing |
[going to estimate histograms for values of k: 61 51 41 31 21
Total time Wallclock 77.7066 s
fitting model to histograms to estimate best k
fitting histogram for k = 21
fitting histogram for k = 31
fitting histogram for k = 41
fitting histogram for k = 51
fitting histogram for k = 61
estimation of the best k so far: 51
refining estimation around [45; 57], with a step of 2
running histogram estimation
Linear estimation: ~139 M distinct 39-mers are in the reads
K-mer sampling: 1/100
| processing |
[going to estimate histograms for values of k: 57 55 53 51 49 47 45
Total time Wallclock 56.4315 s
fitting model to histograms to estimate best k
fitting histogram for k = 21
fitting histogram for k = 31
fitting histogram for k = 41
fitting histogram for k = 45
fitting histogram for k = 47
fitting histogram for k = 49
fitting histogram for k = 51
fitting histogram for k = 53
fitting histogram for k = 55
fitting histogram for k = 57
fitting histogram for k = 61
table of predicted num. of genomic k-mers: histograms.dat
best k: 55
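The last two lines of this run show that KmerGenie writes its predicted number of genomic k-mers for every tested k to histograms.dat, and it reports as the best k the value with the largest prediction. A minimal Python sketch of that final selection step follows. It assumes histograms.dat is a whitespace-separated table whose first two columns are k and the predicted count, with any header line skipped; check the file your KmerGenie version writes. read_histograms and best_k are hypothetical helper names.

```python
from typing import Iterable


def read_histograms(lines: Iterable[str]) -> dict[int, float]:
    """Parse (k, predicted genomic k-mers) pairs from table lines.

    Assumption: first column is k, second is the predicted count.
    Lines whose first two fields are not numeric (e.g. a header) are skipped.
    """
    table: dict[int, float] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            k, count = int(fields[0]), float(fields[1])
        except ValueError:
            continue  # header line such as "k  number of genomic k-mers"
        table[k] = count
    return table


def best_k(table: dict[int, float]) -> int:
    """Return the k with the largest predicted number of genomic k-mers."""
    if not table:
        raise ValueError("empty table")
    return max(table, key=table.__getitem__)
```

To apply it to a real run, open histograms.dat and pass the file handle to read_histograms, then call best_k on the result; the value should agree with the "best k" line KmerGenie prints if the file layout matches the assumption above.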
$ kmergenie --diploid budgie.fastq
running histogram estimation
Linear estimation: ~3721 M distinct 46-mers are in the reads
K-mer sampling: 1/1000
| processing |
[going to estimate histograms for values of k: 71 61 51 41 31 21
Total time Wallclock 3035.49 s
fitting model to histograms to estimate best k
fitting histogram for k = 21
fitting histogram for k = 31
fitting histogram for k = 41
fitting histogram for k = 51
fitting histogram for k = 61
fitting histogram for k = 71
estimation of the best k so far: 21
refining estimation around [15; 27], with a step of 2
running histogram estimation
Linear estimation: ~6335 M distinct 24-mers are in the reads
K-mer sampling: 1/1000
| processing |
[going to estimate histograms for values of k: 27 25 23 21 19 17 15
Total time Wallclock 2305.61 s
fitting model to histograms to estimate best k
fitting histogram for k = 15
fitting histogram for k = 17
fitting histogram for k = 19
fitting histogram for k = 21
fitting histogram for k = 23
fitting histogram for k = 25
fitting histogram for k = 27
fitting histogram for k = 31
fitting histogram for k = 41
fitting histogram for k = 51
fitting histogram for k = 61
fitting histogram for k = 71
table of predicted num. of genomic k-mers: histograms.dat
best k: 21
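Both runs begin with a line such as "Linear estimation: ~3721 M distinct 46-mers are in the reads", an estimate of how many distinct k-mers the reads contain. We have not checked how KmerGenie computes this number; as an assumed illustration only, the Python sketch below shows linear counting, a standard probabilistic technique that estimates the number of distinct items from the fraction of empty cells in a hashed bitmap, via n ≈ -m ln(V) for m cells of which a fraction V are still empty. The names kmers, _bucket and linear_count are hypothetical, and the sketch does not merge a k-mer with its reverse complement.

```python
import hashlib
import math
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq (no reverse-complement merging)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def _bucket(kmer: str, m: int) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def linear_count(reads: Iterable[str], k: int, m: int = 1 << 20) -> float:
    """Estimate the number of distinct k-mers in reads by linear counting.

    Each k-mer marks one of m cells; with V the fraction of cells left empty,
    the estimate is -m * ln(V).
    """
    bitmap = bytearray(m)  # one byte per cell, kept simple rather than compact
    for read in reads:
        for kmer in kmers(read, k):
            bitmap[_bucket(kmer, m)] = 1
    empty = m - sum(bitmap)
    if empty == 0:
        raise ValueError("bitmap saturated; increase m")
    return -m * math.log(empty / m)
```

The bitmap size m trades memory for accuracy: the estimate stays reliable only while a reasonable fraction of cells remains empty, which is why m must grow with the expected number of distinct k-mers.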
$ kmergenie
KmerGenie
Usage:
kmergenie <read_file> [options]
Options:
--diploid use the diploid model
--one-pass skip the second pass
-k <value> largest k-mer size to consider (default: 121)
-l <value> smallest k-mer size to consider (default: 15)
-s <value> interval between consecutive kmer sizes (default: 10)
-e <value> k-mer sampling (default: auto-detected power of 10)
-o <prefix> prefix of the output files (default: histograms)
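The two runs above follow a coarse-to-fine pattern: a first pass over k values spaced 10 apart, then a second pass around the best k found so far with a step of 2 (51 was refined over [45; 57], and 21 over [15; 27]), while --one-pass skips that second pass. The Python sketch below reproduces only this search pattern. The score function is a hypothetical stand-in for KmerGenie's model-based prediction of genomic k-mers, and the ±6 refinement window is inferred from the two logs, not taken from KmerGenie's documentation or source.

```python
from typing import Callable, Iterable


def two_pass_best_k(
    score: Callable[[int], float],
    coarse_ks: Iterable[int],
    radius: int = 6,
    step: int = 2,
    one_pass: bool = False,
) -> tuple[int, dict[int, float]]:
    """Coarse-to-fine search for the k that maximizes score(k).

    score(k) stands in for the predicted number of genomic k-mers at k.
    The second pass scores k in [best - radius, best + radius] with the
    given step; one_pass=True skips it, analogous to --one-pass.
    """
    scores = {k: score(k) for k in coarse_ks}  # first (coarse) pass
    if not scores:
        raise ValueError("no k values to evaluate")
    best = max(scores, key=scores.__getitem__)
    if not one_pass:
        # Second pass: refine around the current best, reusing known scores.
        for k in range(best - radius, best + radius + 1, step):
            if k > 0 and k not in scores:
                scores[k] = score(k)
        best = max(scores, key=scores.__getitem__)
    return best, scores
```

For example, calling two_pass_best_k(my_score, range(21, 62, 10)) first scores k = 21, 31, 41, 51 and 61, then every second value from best - 6 to best + 6, and picks the overall maximum across both passes, just as the final "fitting histogram" lists in the logs include the k values from both passes.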