Python for DNA studies
http://www.fish-evol.org/python4DNAanalyses.html 18 Oct. 2019

This website is a tutorial for my informal Python course for beginners. We will use the following materials on DNA analysis:

- Text book: PYTHON FOR BIOLOGISTS (Jones 2013).
- Please download the exercise files from Dr. Jones' website.

I believe that writing and running Python programs is more important than memorizing Python syntax. We do not need to remember every detail of the syntax. On this website, I recommend the following:

1. The exercise files, Exercises, and Solutions are important. First, look for ready-made programs in your downloaded exercise files.

2. Check the content of each variable by using the print function, "print()".

3. Read the main text of PYTHON FOR BIOLOGISTS (Jones 2013) if you have questions.

4. Get used to using Google search to look up Python syntax and functions. I use Google search almost every time I write a new program.


Chapter 1: Introduction and environment
Installing Python
We will install python3.
[Mac, Win]
Open the following website and click the latest link in the "Release version list":
https://www.python.org/downloads/
Scroll down and choose the appropriate installer (see below) for your computer.

[Mac]
Double-click the downloaded python-3.7.4-macosx10.9.pkg file and install python3.

[Win]
For 32-bit Windows, choose:

Windows x86 web-based installer

For 64-bit Windows, choose:

Windows x86-64 web-based installer

If you do not know whether your Windows is the 32-bit or the 64-bit version, see https://support.microsoft.com/ja-jp/help/958406 (in Japanese)

Then run the downloaded installer to install the package.


P9. Running Python programs

[Mac]
Terminal can be found at:

Applications > Utilities > Terminal.app

[Win]
Cygwin is useful for Windows. We can install Cygwin as follows:

Alternatively, we can use Command prompt, which can be found at:

Program > Accessory > Command prompt

[Mac, Win]
Then verify your python3 installation:

[inouejun:examples]$ which python3
/Library/Frameworks/Python.framework/Versions/3.7/bin/python3
[inouejun:examples]$ which python
/usr/bin/python
[inouejun:examples]$ python3 -V
Python 3.7.4

Text editors

[Mac, Win]
BBEdit (https://www.barebones.com/products/bbedit/). Click "Free Download". I usually use the free version. See the tips on using BBEdit on my website (in Japanese).

[Win]
Notepad++.

[Mac, Win]
Text editors are very important for programming. Their "find and replace" functions make our tasks easier.

When saving text files, we need to pay attention to the line endings, because command line tools may not handle an unexpected line ending correctly.

BBEdit

Notepad++

Edit > EOL Conversion
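
If we are not sure which line ending a file uses, a few lines of Python can report it. This is only a sketch: the function name detect_line_ending is hypothetical and is not part of the exercise files, and the sample bytes stand in for the content of a real file.

```python
def detect_line_ending(data):
    """Report the line-ending style of a bytes object.

    If several styles are mixed, the first match in the order
    CRLF, CR, LF is reported.
    """
    if b"\r\n" in data:
        return "CRLF (Windows)"
    if b"\r" in data:
        return "CR (classic Mac OS)"
    if b"\n" in data:
        return "LF (Unix, macOS)"
    return "no line ending found"


# Sample bytes standing in for the content of a saved file.
unix_text = b"ATGC\nGGCC\n"
windows_text = b"ATGC\r\nGGCC\r\n"

print(detect_line_ending(unix_text))
print(detect_line_ending(windows_text))

# For a real file, read it in binary mode so that Python does not
# translate the line endings:
#     with open("test.py", "rb") as f:
#         print(detect_line_ending(f.read()))
```

If a file reports CRLF and a command line tool complains about it, convert the line endings with the editor menus mentioned above.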


P14. Hello world

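We will run hello_world.py from the terminal.

First, cd into the following directory by typing the command below. Because the directory name contains spaces, the path must be quoted when it is typed by hand.

$ cd "xxxx/exercises and examples/printing_text/examples"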

We will use the following program:

hello_world.py

The directory path in the cd command above can be obtained as follows:

[Mac, Cygwin]
Type cd and a space, then drag and drop the examples folder icon into the terminal window.
[Command prompt]
Copy the address of the examples folder (for example, from the address bar of its window).

[Mac, Win]
In order to make sure that you are now in your printing_text/examples directory, type pwd:

[inouejunmp:examples]$ pwd
/Users/inouejunmp/Downloads/exercises and examples/printing_text/examples

The ls command shows the contents of the directory:

[inouejun:examples]$ ls
comment.py print_length.py
....

[Mac, Win]
Open hello_world.py in your editor and look at its content.

Then type,

$ python3 hello_world.py

You will find the following output in your command line.

Hello world
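
For reference, a program that produces this output needs only a couple of lines. This is a minimal sketch inferred from the output above, not a copy of the file in the exercise folder; the variable name message is an assumption for illustration.

```python
# Store the text in a variable, then print it to the screen.
message = "Hello world"
print(message)
```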


Chapter 2: Printing and manipulating text

P39. Calculating AT content

First, we will make sure that the program works.

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/printing_text/exercises/"

We will use the following program:

at_content.py

Open your at_content.py file in your editor and check its content.

Then, run at_content.py:

$ python3 at_content.py
AT content is 0.6851851851851852


How to study
As a first step in programming, we will simply type each line of the program, referring to the at_content.py file.

Using your editor (BBEdit/Notepad++), type the following line:

my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

The sequence part, "ACTG...", can be copied from the line above or from the at_content.py file. Then save this file as test.py in your printing_text/exercises directory. Ignore the first line of at_content.py, "from __future__ import division". It is only needed for python2 (p11).

Now we will check the content of this variable, my_dna, by using the print function. Add a print() call and run your test.py program:

[inouejun:exercises]$ cat test.py
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
print("my_dna", my_dna)
[inouejun:exercises]$ python3 test.py
my_dna ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT


For variables, see P21 "Storing strings in variables."

Next, type the following line.

length = len(my_dna)

Again, we will check the content of the length variable by using the print function.

For the len function, see P25 "Finding the length of a string." In practice, however, we usually use Google search to look up functions.

For the .count function, see P31, "Counting and finding substrings."

The exit() function is useful for checking variables, because it stops the program at the point where it is called.
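
Putting these pieces together, the whole calculation can be sketched as follows. This is a minimal sketch rather than a copy of at_content.py; the variable names a_count, t_count and at_content are assumptions chosen for illustration. The printed AT content should agree with the value shown by at_content.py above.

```python
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

# len() returns the number of characters in the string.
length = len(my_dna)
print("length", length)

# .count() returns how many times a substring occurs in the string.
a_count = my_dna.count("A")
t_count = my_dna.count("T")
print("a_count", a_count)
print("t_count", t_count)

# While checking variables, an exit() call placed here would stop the
# program before the lines below are run.

# AT content is the fraction of bases that are A or T.
at_content = (a_count + t_count) / length
print("AT content is " + str(at_content))
```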



P42. Complementing DNA

cd into the directory:

$ cd "xxxx/exercises and examples/printing_text/exercises/"

We will use the following program:

complement_dna.py
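
One way to compute a complement is to replace each base in turn, writing the replacements in lower case so that a base that has already been changed is not changed back. The sketch below assumes that the sequence contains only upper-case A, T, G and C. It is not necessarily identical to complement_dna.py, and the variable names are illustrative.

```python
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

# Replace each base with its complement in lower case. Lower case
# prevents, for example, a new "t" from being turned back into "A".
replacement1 = my_dna.replace("A", "t")
replacement2 = replacement1.replace("T", "a")
replacement3 = replacement2.replace("C", "g")
replacement4 = replacement3.replace("G", "c")

# Convert the result back to upper case.
complement = replacement4.upper()
print("my_dna    ", my_dna)
print("complement", complement)
```

Printing replacement1, replacement2 and so on, one at a time, is a good way to see what each step does.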


Chapter 3: Reading and writing files
P67. Splitting genomic DNA

Change the directory as follows:

$ cd "xxxx/exercises and examples/reading_files/exercises"

We will use the following program:

genomic_dna.py
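
The core of this kind of task is string slicing. The sketch below uses a short made-up sequence and made-up exon boundaries; the names and positions are assumptions for illustration, not the values used in the exercise, where the sequence is read from a file.

```python
# A made-up "genomic" sequence: exon 1, then an intron, then exon 2.
genomic_dna = "ATCGATCGATCG" + "gtaagtttcag" + "TTGACCATGGAA"

# Made-up boundaries. Python slices count from 0 and exclude the
# end position.
exon1 = genomic_dna[0:12]
intron = genomic_dna[12:23]
exon2 = genomic_dna[23:]

# Join the two exons to obtain the coding sequence.
coding_dna = exon1 + exon2
print("coding", coding_dna)
print("intron", intron)
```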


P68. Writing a FASTA file

We will use the following program:

writing_a_fasta_file.py
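
A FASTA record is a header line starting with ">" followed by the sequence on the next line. The sketch below writes one record. The header, the sequence and the file name example_sequence.fasta are made up for illustration and are not the data from the exercise.

```python
import os
import tempfile

# Made-up header and sequence for illustration.
header = "example_sequence"
sequence = "atcgtacgatcgatcgatcgctagacgtatcg"

# Write into a temporary directory so that no existing file is
# overwritten. A plain file name such as "example_sequence.fasta"
# would write into the current directory instead.
output_path = os.path.join(tempfile.mkdtemp(), "example_sequence.fasta")

with open(output_path, "w") as output:
    output.write(">" + header + "\n")      # header line
    output.write(sequence.upper() + "\n")  # sequence line in upper case

print("wrote", output_path)
```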


P72. Writing multiple FASTA files

We will use the following program:

writing_multiple_fasta_files.py


Application: Reading downloaded fasta file

First, download the human mitochondrial genome sequence from NCBI.
The sequence will be saved as sequence.fasta.txt in your Downloads directory. Move this file to the reading_files/exercises directory.

Then modify genomic_dna.py so that it reads the sequence.fasta.txt file and prints the content on your screen.
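
Reading a FASTA file follows the same pattern as reading any text file: open it, read the content, and print it. The sketch below first creates a tiny stand-in file so that it can run anywhere; its content is made up and is not the real mitochondrial sequence. For the task above, we would set fasta_path to "sequence.fasta.txt" and remove the lines that create the stand-in file.

```python
import os
import tempfile

# Create a tiny stand-in FASTA file (made-up content).
fasta_path = os.path.join(tempfile.mkdtemp(), "sequence.fasta.txt")
with open(fasta_path, "w") as stand_in:
    stand_in.write(">example header\nGATCACAGGT\nCTATCACCCT\n")

# Read the whole file and print it.
with open(fasta_path) as fasta_file:
    content = fasta_file.read()
print(content)

# The first line is the header; the remaining lines hold the sequence.
lines = content.splitlines()
header = lines[0]
sequence = "".join(lines[1:])
print("header", header)
print("length", len(sequence))
```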

Chapter 4: Lists and loops
split.py

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/lists_and_loops/examples/"

We will use the following program:

split.py
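
The split method cuts a string into a list of strings at a given separator. The following is a minimal sketch with a made-up comma-separated string, not necessarily the content of split.py.

```python
# A made-up string of names separated by commas.
names = "melanogaster,simulans,yakuba,ananassae"

# split() returns a list with one element per name.
species = names.split(",")
print(species)

# Elements are accessed by position, counting from 0.
print("first", species[0])
print("number of species", len(species))
```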



loop.py

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/lists_and_loops/examples/"

We will use the following program:

loop.py
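
A for loop runs the same block of code once for each element of a list. The following is a minimal sketch with made-up sequences, not necessarily the content of loop.py.

```python
# Made-up DNA sequences for illustration.
sequences = ["ATGC", "ATGCATGC", "AT"]

lengths = []
for dna in sequences:
    # The indented lines run once per element of the list.
    print(dna, "has length", len(dna))
    lengths.append(len(dna))

# This line is not indented, so it runs once after the loop.
print("lengths", lengths)
```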



P90. Processing DNA in a file

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/lists_and_loops/exercises/"

We will use the following program:

remove_adapter.py
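
The general pattern is to loop over the lines of a file, cut off a fixed number of leading bases, and keep the rest. In the sketch below the sequences are made up, and the adapter length of 14 bases is an assumption for illustration; check the exercise text for the actual value. A list of strings stands in for the lines of the input file.

```python
# Made-up lines standing in for the input file. Each ends with a
# newline, as lines read from a file do.
lines = [
    "ATTCGATTATAAGCTCGATCGATCGATCGA\n",
    "ATTCGATTATAAGCACTGATCGATC\n",
]

adapter_length = 14  # assumed value for illustration

trimmed_sequences = []
for line in lines:
    dna = line.rstrip("\n")         # remove the newline first
    trimmed = dna[adapter_length:]  # drop the adapter at the start
    trimmed_sequences.append(trimmed)
    print("trimmed", trimmed, "length", len(trimmed))
```

Removing the newline before slicing matters; otherwise it is counted as part of the sequence length.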



P93. Multiple exons from genomic DNA

We will use the following program:

write_coding_sequence.py
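
When exon positions are stored as one "start,stop" pair per line, each line can be split at the comma, converted to integers, and used to slice the genomic sequence. The sequence and positions below are made up. The sketch also assumes that positions count from 0 and that the stop position is excluded, following Python's slicing convention; the real exons.txt may differ.

```python
genomic_dna = "AAAATTTTCCCCGGGGAAAATTTT"

# Made-up lines in "start,stop" form, standing in for a positions file.
position_lines = ["0,4\n", "8,12\n", "16,20\n"]

coding_sequence = ""
for line in position_lines:
    positions = line.rstrip("\n").split(",")
    start = int(positions[0])  # text must be converted to a number
    stop = int(positions[1])
    exon = genomic_dna[start:stop]
    print("exon", exon)
    coding_sequence = coding_sequence + exon

print("coding sequence", coding_sequence)
```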



Loops using Excel file outputs

We will use the following file:

lists_and_loops/exercises/exons.txt


Chapter 5: Writing our own functions
use_function.py

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/writing_functions/examples/"

We will use the following program:

use_function.py



two_arguments.py

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/writing_functions/examples/"

We will use the following program:

two_arguments.py
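
A function can take more than one argument. In this sketch the function name get_at_content and its second argument decimal_places are assumptions for illustration, not necessarily the names used in two_arguments.py.

```python
def get_at_content(dna, decimal_places):
    """Return the AT content of dna, rounded to decimal_places."""
    dna = dna.upper()  # accept lower-case input as well
    a_count = dna.count("A")
    t_count = dna.count("T")
    at_content = (a_count + t_count) / len(dna)
    return round(at_content, decimal_places)


# The same sequence, rounded to different numbers of decimal places.
print(get_at_content("ATGCATGCAACTGTAGC", 1))
print(get_at_content("ATGCATGCAACTGTAGC", 2))
print(get_at_content("atgcatgcaactgtagc", 3))
```

The arguments are matched by position: the first value is used as dna and the second as decimal_places.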


P116. Percentage of amino acid residues, part one

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/writing_functions/exercises/"

We will use the following program:

amino_acids2.py
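
One possible approach is to count the residue, multiply by 100, and divide by the sequence length. In the sketch below the function name get_aa_percentage and the test sequence are assumptions for illustration. Converting both strings to upper case makes the comparison independent of case.

```python
def get_aa_percentage(protein, residue):
    """Return the percentage of residue in the protein sequence."""
    protein = protein.upper()
    residue = residue.upper()
    return protein.count(residue) * 100 / len(protein)


# assert stops the program with an error if a result is not as expected.
assert get_aa_percentage("MSRSLLLRFLLFLLLLPPLP", "M") == 5
assert get_aa_percentage("MSRSLLLRFLLFLLLLPPLP", "r") == 10
assert get_aa_percentage("msrslllrfllfllllpplp", "L") == 50
assert get_aa_percentage("MSRSLLLRFLLFLLLLPPLP", "Y") == 0
print("all checks passed")
```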


Chapter 6: Conditional tests
if.py

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/conditional_tests/examples/"

We will use the following program:

if.py



accessions_and.py

We will use the following program:

accessions_and.py



write_accessions_elif.py

We will use the following program:

write_accessions_elif.py
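
The three constructs in this part (if, and, elif) can be combined in one loop. The accession names and the rules below are made up for illustration and are not taken from the example files.

```python
# Made-up accession names.
accessions = ["ab56", "bh84", "hv76", "ay93", "ap97", "bd72"]

starts_with_a = []
starts_with_b = []
others = []

for accession in accessions:
    if accession.startswith("a") and accession.endswith("3"):
        # Both conditions must be true for this message to be printed.
        print(accession, "starts with a and ends with 3")

    if accession.startswith("a"):
        starts_with_a.append(accession)
    elif accession.startswith("b"):
        # Checked only when the first condition was false.
        starts_with_b.append(accession)
    else:
        # Runs when none of the conditions above was true.
        others.append(accession)

print("a:", starts_with_a)
print("b:", starts_with_b)
print("other:", others)
```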


P135. Several species

cd into the directory by typing the following command:

$ cd "xxxx/exercises and examples/conditional_tests/exercises/"

We will use the following program:

several_species.py



P137. Length range

We will use the following program:

length_range.py


Small tips

List files

[Mac]
The ls command shows the contents of the folder.

$ ls
coding_dna.txt sequences.fasta
genomic_dna.py writing_a_fasta_file.py

[Win]

The dir command shows the contents of the folder in Command prompt.

$ dir

Changing to the parent directory

$ cd ..

Showing the full path of the current directory

$ pwd
xxxx/exercises and examples/reading_files/exercises

Opening a file with your default editor [Mac]

$ open genomic_dna.py

Taking a screenshot [Mac]

command + shift + 4

 



Jones, M. 2013. Python for Biologists. CreateSpace Independent Pub.