WBO 11 – sequence motifs – regulatory genomics

Polish version below

Today in the lecture we discussed sequence motifs. The slides are here: wyk11-motifs-eng

Today’s practicals:

1. Download the promoter sequences from the E. coli genome: ecoli_proms.fa
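A minimal, dependency-free sketch for loading the promoters. It assumes that ecoli_proms.fa is an ordinary multi-record FASTA file. The helper name read_fasta is hypothetical; Biopython's SeqIO.parse(path, "fasta") does the same job if you prefer a library call. Sequences are upper-cased so that later steps can assume the alphabet ACGT.

```python
from __future__ import annotations

from typing import Iterable


def read_fasta(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, keeping file order."""
    records: list[tuple[str, str]] = []
    header: str | None = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


# Usage (assumes ecoli_proms.fa is in the working directory):
# with open("ecoli_proms.fa") as fh:
#     promoters = [seq for _, seq in read_fasta(fh)]
```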

2. Read about the Bio.motifs module and its implementation of position-specific weight and scoring matrices in its documentation.
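To make the matrix vocabulary concrete, here is a small pure-Python sketch of the quantities involved. It assumes equal-length words over ACGT, a pseudocount of 0.5 per base and a uniform background of 0.25. It builds a count matrix, a position weight matrix (column probabilities), a log-odds scoring matrix, and the information content IC = Σ_i Σ_b p_ib · log2(p_ib / q_b), where p_ib is the probability of base b in column i and q_b its background frequency. The function names are illustrative and are not part of Bio.motifs. In Bio.motifs, the corresponding steps should be motifs.create(...), .counts.normalize(pseudocounts=0.5) and .log_odds() (check the documentation for your Biopython version).

```python
from __future__ import annotations

import math

ALPHABET = "ACGT"
UNIFORM = {b: 0.25 for b in ALPHABET}  # assumed background distribution


def count_matrix(words: list[str]) -> list[dict[str, int]]:
    """Column-wise base counts for equal-length words over ACGT."""
    width = len(words[0])
    if any(len(w) != width for w in words):
        raise ValueError("all words must have the same length")
    counts = [{b: 0 for b in ALPHABET} for _ in range(width)]
    for w in words:
        for i, base in enumerate(w):
            counts[i][base] += 1  # raises KeyError for symbols outside ACGT
    return counts


def weight_matrix(counts: list[dict[str, int]], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Convert counts to per-column probabilities (a PWM) with pseudocounts."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudocount * len(ALPHABET)
        pwm.append({b: (col[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def log_odds(pwm: list[dict[str, float]], background: dict[str, float] = UNIFORM) -> list[dict[str, float]]:
    """Log2 odds of each base versus the background (a PSSM).

    Needs a PWM built with pseudocount > 0, otherwise log2(0) raises ValueError.
    """
    return [{b: math.log2(col[b] / background[b]) for b in ALPHABET} for col in pwm]


def information_content(pwm: list[dict[str, float]], background: dict[str, float] = UNIFORM) -> float:
    """Sum over columns of the relative entropy to the background, in bits."""
    return sum(col[b] * math.log2(col[b] / background[b])
               for col in pwm for b in ALPHABET if col[b] > 0)


def score(pssm: list[dict[str, float]], word: str) -> float:
    """Score of one word: the sum of its log-odds values along the matrix."""
    return sum(col[base] for col, base in zip(pssm, word))
```

With a uniform background, each perfectly conserved column contributes 2 bits, so six identical 6-mers give the maximum of 12 bits.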

3. (Homework) Implement the simplified consensus method for motif finding (based on slide 20) and use it on the E. coli promoter sequences to find motifs of length 3, 4, 5 and 6. The algorithm works as follows:

Initialize a PWM from the word starting at any position in the first promoter sequence.
In each following step:
choose the position, in one of the remaining sequences, that best matches the current PWM;
update the PWM by adding the word at the chosen position, and remove that sequence from further consideration.
Perform this procedure for all possible starting positions in the first sequence and return the few resulting PWMs (the number is chosen by the user, 5 by default) with the highest information content.
Possible improvements: permuting the input sequence set and/or skipping sequences that match particularly badly.
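The greedy procedure above can be sketched as follows. This is one possible reading of the steps, not a reference solution, and it rests on several assumptions. Sequences are upper-case strings, and only the forward strand is scanned (no reverse complements). Windows containing N or other non-ACGT symbols are skipped. Scoring uses a log-odds matrix with a 0.5 pseudocount and a uniform 0.25 background. Final ranking uses the information content of the raw alignment. All names (greedy_consensus, profile, best_window) are hypothetical. The work grows roughly with the square of the number of sequences, so a pure-Python run on all promoters may take a while.

```python
from __future__ import annotations

import math

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumed uniform background frequency of each base


def profile(words: list[str], pseudocount: float) -> list[dict[str, float]]:
    """Per-column base probabilities of equal-length words (a PWM)."""
    total = len(words) + pseudocount * len(ALPHABET)
    return [{b: (col.count(b) + pseudocount) / total for b in ALPHABET}
            for col in zip(*words)]


def information_content(pwm: list[dict[str, float]]) -> float:
    """Relative entropy of the PWM to the uniform background, in bits."""
    return sum(p * math.log2(p / BACKGROUND)
               for col in pwm for p in col.values() if p > 0)


def best_window(seq: str, pssm: list[dict[str, float]]) -> tuple[float, str] | None:
    """Best-scoring window of seq under a log-odds matrix (forward strand only)."""
    width, best = len(pssm), None
    for start in range(len(seq) - width + 1):
        word = seq[start:start + width]
        if any(b not in ALPHABET for b in word):
            continue  # skip windows containing N or other non-ACGT symbols
        s = sum(col[b] for col, b in zip(pssm, word))
        if best is None or s > best[0]:
            best = (s, word)
    return best


def greedy_consensus(seqs: list[str], width: int, top_k: int = 5,
                     pseudocount: float = 0.5) -> list[tuple[float, int, list[str]]]:
    """Greedy consensus search seeded from every window of seqs[0].

    Returns up to top_k tuples (information content, seed position, aligned
    words), best first.
    """
    results = []
    first = seqs[0]
    for start in range(len(first) - width + 1):
        seed = first[start:start + width]
        if any(b not in ALPHABET for b in seed):
            continue
        words, remaining = [seed], list(seqs[1:])
        while remaining:
            # Log-odds matrix of the current alignment (pseudocounts avoid log 0).
            pssm = [{b: math.log2(p / BACKGROUND) for b, p in col.items()}
                    for col in profile(words, pseudocount)]
            # Best window in each remaining sequence; keep the overall winner.
            hits = []
            for i, seq in enumerate(remaining):
                hit = best_window(seq, pssm)
                if hit is not None:
                    hits.append((hit[0], i, hit[1]))
            if not hits:
                break  # no remaining sequence has a usable window
            _, i, word = max(hits)
            words.append(word)
            remaining.pop(i)  # each sequence contributes at most one word
        # Rank by information content of the raw alignment (no pseudocounts).
        results.append((information_content(profile(words, 0.0)), start, words))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]


# Usage sketch (promoters: list of upper-case sequences from ecoli_proms.fa):
# for w in (3, 4, 5, 6):
#     for ic, pos, words in greedy_consensus(promoters, w):
#         print(w, round(ic, 2), pos, words[:3])
```

Permuting the input, as suggested in the improvements, amounts to calling greedy_consensus on shuffled copies of the sequence list (for example, with random.sample(seqs, len(seqs))) and pooling the results, since the choice of the first sequence determines the seeds.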

4. Compare your results with the known E. coli promoter motifs
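A common reference point is the sigma-70 promoter consensus: a -35 box TTGACA and a -10 box TATAAT. The sketch below uses hypothetical helpers consensus and compare_to_known. It turns the words of a found motif into a majority consensus and reports the smallest Hamming distance to each box. Because your motifs may be only 3 to 5 bases long, it slides the shorter string along the longer one. It compares on the given strand only; a motif recovered as a reverse complement (ATTATA for TATAAT, for example) would need an extra comparison.

```python
from __future__ import annotations

# Textbook consensus of the two core sigma-70 promoter elements in E. coli.
KNOWN_BOXES = {"-35 box": "TTGACA", "-10 box": "TATAAT"}


def consensus(words: list[str]) -> str:
    """Majority base in each column; ties go to the alphabetically first base."""
    return "".join(max(sorted(set(col)), key=col.count) for col in zip(*words))


def best_match(motif: str, box: str) -> tuple[int, int]:
    """(fewest mismatches, offset) when sliding the shorter string along the longer."""
    short, long_ = (motif, box) if len(motif) <= len(box) else (box, motif)
    best = (len(short) + 1, -1)
    for off in range(len(long_) - len(short) + 1):
        mismatches = sum(a != b for a, b in zip(short, long_[off:off + len(short)]))
        if mismatches < best[0]:
            best = (mismatches, off)
    return best


def compare_to_known(motif: str) -> dict[str, tuple[int, int]]:
    """Best Hamming match of a motif consensus against each known box."""
    return {name: best_match(motif.upper(), box) for name, box in KNOWN_BOXES.items()}
```

For example, a found motif with the consensus TATAAT matches the -10 box with 0 mismatches and the -35 box with 4.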

5. Run the MEME motif-finding suite on your sequences and compare its output with your previous results and with the known motifs.
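If MEME is installed locally (it can also be used through its web server), a run can be scripted from Python. This sketch only assembles a command line from standard MEME options: -dna, -oc, -mod zoops (zero or one site per sequence), -nmotifs, -minw/-maxw (fixed width here, to match the widths used above) and -revcomp (search both strands). The chosen option values and the helper names meme_command and run_meme are assumptions to adapt. Check meme -version and the MEME documentation for your installation.

```python
from __future__ import annotations

import shutil
import subprocess


def meme_command(fasta: str, width: int, nmotifs: int = 3,
                 outdir: str | None = None) -> list[str]:
    """Argument list for a DNA MEME run with a fixed motif width."""
    return [
        "meme", fasta,
        "-dna",
        "-oc", outdir or f"meme_w{width}",  # output directory (overwritten)
        "-mod", "zoops",                     # zero or one site per sequence
        "-nmotifs", str(nmotifs),
        "-minw", str(width),
        "-maxw", str(width),
        "-revcomp",                          # allow sites on both strands
    ]


def run_meme(fasta: str, width: int, **kwargs) -> str:
    """Run MEME if it is on PATH; return the output directory name."""
    if shutil.which("meme") is None:
        raise FileNotFoundError("MEME is not on PATH")
    cmd = meme_command(fasta, width, **kwargs)
    subprocess.run(cmd, check=True)
    return cmd[cmd.index("-oc") + 1]


# Usage sketch:
# for w in (3, 4, 5, 6):
#     print(run_meme("ecoli_proms.fa", w))
```

Note that MEME may reject widths below its minimum allowed motif width, so the shortest lengths may need to be compared only against your own consensus results.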

Homework: This week you can again earn 2 additional points for homework, this time for assignment 3 (the consensus method). As usual, please send it before next week's lecture as an attached Python (.py) file by e-mail with “[WBO]” in the subject.

Polish version: Today we discussed the representation of sequence motifs and how to search for them. The slides are here: wyk11-motifs-eng

Today's tasks:

1. Download the list of E. coli promoters in FASTA format: ecoli_proms.fa

2. Familiarize yourself with the motif matrix model from the Bio.motifs module and its documentation.

3. Implement a simple, greedy consensus method for motif finding, apply it to the E. coli promoters and examine the results for motifs of length 3, 4, 5 and 6. The outline of the algorithm is as follows:

Starting from any position in the first sequence, build a PWM model from it.
For each subsequent sequence:
choose the sequence that best matches the current PWM model;
after choosing it, build a new PWM model extended with this sequence.
After performing this procedure for all starting points in sequence 1, return a few (e.g. 5) best matrices in terms of information content.

Repeat this procedure for several different permutations of the input file.

4. Compare the results with the known promoter motifs of E. coli.

5. Analyse the same promoters with MEME and compare the results with those obtained from the consensus method.

Homework: This week you can earn 2 points by sending me your implementation of the consensus method. As usual, you have one week; please send the .py file by e-mail with the code [WBO] in the subject.
