{"id":1333,"date":"2018-01-25T23:04:18","date_gmt":"2018-01-25T21:04:18","guid":{"rendered":"https:\/\/bioinfo2.ugr.es\/bioinfo\/?page_id=1333"},"modified":"2018-01-25T23:04:29","modified_gmt":"2018-01-25T21:04:29","slug":"bondad-prediccion-de-genes","status":"publish","type":"page","link":"https:\/\/bioinfo2.ugr.es\/bioinfo\/bondad-prediccion-de-genes\/","title":{"rendered":"gene prediction accuracy"},"content":{"rendered":"<h3>The goal of this practical is to quantify the accuracy of the gene predictions made by GeneMark.<\/h3>\n<p>As a guide, we will follow Figure 1 of the following publication: <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/8786136\" target=\"_blank\" rel=\"noopener\">Burset M, Guig\u00f3 R. (1996) Evaluation of gene structure prediction programs.<\/a><\/p>\n<p><a href=\"https:\/\/bioinfo2.ugr.es\/secuencias\/wp-content\/uploads\/2014\/11\/guigo_1996.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-1167\" src=\"https:\/\/bioinfo2.ugr.es\/secuencias\/wp-content\/uploads\/2014\/11\/guigo_1996.png\" alt=\"guigo_1996\" width=\"479\" height=\"453\" \/><\/a><\/p>\n<h2>Protocol:<\/h2>\n<h3><strong>1) Download the complete genome sequence of <em><a href=\"http:\/\/www.ncbi.nlm.nih.gov\/nuccore\/157843848?report=genbank\" target=\"_blank\" rel=\"noopener\">Rickettsia massiliae<\/a><\/em>, which will be used for gene prediction. Rename the sequence to \u2018CP000683\u2019 and remove its description, so that the FASTA header line reads just &gt;CP000683.<\/strong><\/h3>\n<h3><strong>2) Use <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/orffinder\">ORFfinder<\/a> to detect ORFs in a subsequence.<\/strong><\/h3>\n<h3><strong>3) Download the CDS sequences of the annotated genes of <em><a href=\"http:\/\/www.ncbi.nlm.nih.gov\/nuccore\/157843848?report=genbank\" target=\"_blank\" rel=\"noopener\">Rickettsia massiliae<\/a><\/em>. 
These data will be used to obtain the annotation of the sequence, against which the prediction will be compared.<\/strong><\/h3>\n<h3><strong>4) Predict the genes with <a href=\"http:\/\/exon.gatech.edu\/GeneMark\/heuristic_gmhmmp.cgi\" target=\"_blank\" rel=\"noopener\">GeneMark<\/a> and save the result in \u2018<a href=\"https:\/\/genome.ucsc.edu\/FAQ\/FAQformat.html#format3\" target=\"_blank\" rel=\"noopener\">GFF<\/a>\u2019 format. Remove the trailing blank line.<\/strong><\/h3>\n<h3><strong>5) Remove the header (the first 9 lines) from the GFF file (<a href=\"https:\/\/bioinfo2.ugr.es\/secuencias\/wp-content\/uploads\/2014\/11\/geneMark.txt\">result<\/a>).<\/strong><\/h3>\n<h3><strong>6) Obtain the annotation<\/strong><\/h3>\n<h3>In the CDS file, each sequence has a FASTA header line that carries its annotation, for example:<br \/>\n&gt;lcl|CP000683.1_cds_RMA_0001_1 [gene=RMA_0001] [protein=hypothetical protein] [protein_id=ABV84350.1] [location=1..822]<\/h3>\n<h3>The goal is to extract the coordinates given by the <em>location<\/em> tag (location=1..822 in this example). 
This can be done with a single command line:<\/h3>\n<h3 style=\"padding-left: 30px;\"><em>cat Rickettsia_cds.fa | awk '{ if($1 ~ \/&gt;\/){ if($0 ~ \/location=complement\/){ split($0,f,\"location=complement\\\\(\"); split(f[2],f2,\"\\\\.\\\\.\"); gsub(\"]\", \"\", f2[2]); gsub(\"\\\\)\", \"\", f2[2]); print \"CP000683\\t\" (f2[1]-1) \"\\t\" f2[2] \"\\t-\\t0\\t-\"; } else{ split($0,f,\"location=\"); split(f[2],f2,\"\\\\.\\\\.\"); gsub(\"]\", \"\", f2[2]); print \"CP000683\\t\" (f2[1]-1) \"\\t\" f2[2] \"\\t-\\t0\\t+\"; } }}' &gt; rickettsia_cds_coord.txt<\/em><\/h3>\n<h3>The output consists of the <a href=\"https:\/\/bioinfo2.ugr.es\/secuencias\/wp-content\/uploads\/2014\/11\/rickettsia_cds_coord.txt\" target=\"_blank\" rel=\"noopener\">coordinates in BED format<\/a>. Because BED coordinates are 0-based and half-open, 1 is subtracted from every start position on both strands (location=1..822 becomes 0 822).<\/h3>\n<p><span style=\"font-size: 14pt;\"><strong>Breaking down the command<\/strong><\/span><br \/>\n<strong>print the headers<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { print $0 } }'<\/p>\n<p><strong>print the headers of the CDSs of genes located on the (-) strand<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=complement\/ ){ print $0 } }}'<\/p>\n<p><strong>split the string at 'location=complement('; the coordinates will be in element 2 of the array f (f[2])<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=complement\/ ){ split($0,f,\"location=complement\\\\(\"); print f[2] } }}'<\/p>\n<p><strong>split the coordinates at '..'<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=complement\/ ){ split($0,f,\"location=complement\\\\(\"); split(f[2],f1,\"\\\\.\\\\.\") ; print f1[1] \" \" f1[2] ;} }}'<\/p>\n<p><strong>remove ')]' and print the coordinates in BED format, subtracting 1 from the start<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=complement\/ ){ split($0,f,\"location=complement\\\\(\"); split(f[2],f1,\"\\\\.\\\\.\") ; gsub(\"\\\\)]\",\"\",f1[2]); print \"CP000683\\t\" (f1[1]-1) \"\\t\" f1[2] \"\\t-\\t0\\t-\" ;} }}'<\/p>\n<p><strong>redirect the output to a file<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=complement\/ ){ split($0,f,\"location=complement\\\\(\"); split(f[2],f1,\"\\\\.\\\\.\") ; gsub(\"\\\\)]\",\"\",f1[2]); print \"CP000683\\t\" (f1[1]-1) \"\\t\" f1[2] \"\\t-\\t0\\t-\" ;} }}' &gt; annotation_-.txt<\/p>\n<p><strong>now for the (+) strand, whose locations start directly with a digit<\/strong><br \/>\ncat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=[0-9]\/ ){ split($0,f,\"location=\"); print f[2] } }}'<\/p>\n<p>cat Rickettsia_cds.fa | awk '{ if($0 ~ \/&gt;\/) { if($0 ~ \/location=[0-9]\/ ){ split($0,f,\"location=\"); split(f[2],f1,\"\\\\.\\\\.\") ; gsub(\"]\",\"\",f1[2]); print \"CP000683\\t\" (f1[1]-1) \"\\t\" f1[2] \"\\t-\\t0\\t+\" ; } }}' &gt; annotation_+.txt<\/p>\n<p>&nbsp;<\/p>\n<p><strong>concatenate the two files<\/strong><\/p>\n<p>cat annotation_+.txt annotation_-.txt &gt; annotation.txt<\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>Conditional expressions in awk<\/strong><\/li>\n<\/ul>\n<blockquote class=\"wp-embedded-content\" data-secret=\"pjEerwJyVf\"><p><a href=\"https:\/\/www.thegeekstuff.com\/2010\/02\/awk-conditional-statements\/\">4 Awk If Statement Examples ( if, if else, if else if, :? )<\/a><\/p><\/blockquote>\n<p><iframe class=\"wp-embedded-content\" sandbox=\"allow-scripts\" 
security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" src=\"https:\/\/www.thegeekstuff.com\/2010\/02\/awk-conditional-statements\/embed\/#?secret=pjEerwJyVf\" data-secret=\"pjEerwJyVf\" width=\"600\" height=\"338\" title=\"&#8220;4 Awk If Statement Examples ( if, if else, if else if, :? )&#8221; &#8212; The Geek Stuff\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe><\/p>\n<ul>\n<li><strong>String functions<\/strong><\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.gnu.org\/software\/gawk\/manual\/html_node\/String-Functions.html\" target=\"_blank\" rel=\"noopener\">https:\/\/www.gnu.org\/software\/gawk\/manual\/html_node\/String-Functions.html<\/a><\/p>\n<ul>\n<li><strong>Arrays in awk<\/strong><\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.delorie.com\/gnu\/docs\/gawk\/gawk_119.html\" target=\"_blank\" rel=\"noopener\">http:\/\/www.delorie.com\/gnu\/docs\/gawk\/gawk_119.html<\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><strong>7) Use <a href=\"https:\/\/usegalaxy.org\/\" target=\"_blank\" rel=\"noopener\">Galaxy<\/a> to compare the prediction with the annotation (<a href=\"https:\/\/wiki.galaxyproject.org\/PublicGalaxyServers\" target=\"_blank\" rel=\"noopener\">mirrors<\/a>)<\/strong><\/h3>\n<ul>\n<li>Upload the data to Galaxy: Get Data &#8211;&gt; Upload File, explicitly setting the format to BED for the annotation and GFF for the prediction.<\/li>\n<li>Convert the GFF to BED: pencil icon &#8211;&gt; Convert Format<\/li>\n<li>Filter by strand: <em>Filter and Sort &#8211;&gt; Filter<\/em>, run once with c6=='+' and once with c6=='-' to obtain one file per strand<\/li>\n<li>Compare the lists: Operate on Genomic Intervals &#8211;&gt; Coverage<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\">How can we calculate sensitivity and specificity?<\/h1>\n","protected":false},"excerpt":{"rendered":"<p>The goal of this practical is to quantify the accuracy of the gene predictions made by GeneMark. As a guide, we will follow Figure 1 of the following publication: Burset M, Guig\u00f3 R. (1996) Evaluation of gene structure prediction programs. Protocol: 1) Download the complete genome sequence of Rickettsia massiliae, which will be used for &hellip; <a href=\"https:\/\/bioinfo2.ugr.es\/bioinfo\/bondad-prediccion-de-genes\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/pages\/1333"}],"collection":[{"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/comments?post=1333"}],"version-history":[{"count":2,"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/pages\/1333\/revisions"}],"predecessor-version":[{"id":1336,"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/pages\/1333\/revisions\/1336"}],"wp:attachment":[{"href":"https:\/\/bioinfo2.ugr.es\/bioinfo\/wp-json\/wp\/v2\/media?parent=1333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
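The protocol extracts the annotated CDS coordinates with awk, compares them with the GeneMark prediction in Galaxy, and closes by asking how to compute sensitivity and specificity. The Python code below is a minimal sketch of both steps outside Galaxy, not the method used in the practical. It converts the location tag of each CDS header and each GFF line into a BED-style interval, then computes the nucleotide-level measures in the sense of Burset and Guigó (1996): sensitivity Sn = TP/(TP+FN) and specificity Sp = TP/(TP+FP). Assumptions, all labeled in the code as well: only simple single-span locations such as location=1..822 or location=complement(2000..2500) are parsed, so join(...) and partial (<, >) locations are skipped; GFF columns 4 and 5 are 1-based and inclusive; the genome is a single sequence, so the chromosome column is ignored; and a nucleotide counts as a true positive only when it is coding on the same strand in both sets. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# A BED-style interval: (chrom, start, end, strand), 0-based and half-open.
Interval = Tuple[str, int, int, str]

# Matches simple locations such as "[location=1..822]" or
# "[location=complement(2000..2500)]". Assumption: join(...) and partial
# locations ("<1..822", "1..>822") do not match and are skipped.
LOCATION_RE = re.compile(r"\[location=(complement\()?(\d+)\.\.(\d+)\)?\]")


def header_to_bed(header: str, chrom: str = "CP000683") -> Optional[Interval]:
    """Convert one CDS FASTA header into a BED interval, or return None."""
    m = LOCATION_RE.search(header)
    if m is None:
        return None  # sequence lines and unsupported locations
    strand = "-" if m.group(1) else "+"
    # 1-based inclusive start -> 0-based start; the end stays the same.
    return (chrom, int(m.group(2)) - 1, int(m.group(3)), strand)


def gff_line_to_bed(line: str) -> Optional[Interval]:
    """Convert one GFF line (1-based, inclusive) into a BED interval."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7 or fields[6] not in ("+", "-"):
        return None
    return (fields[0], int(fields[3]) - 1, int(fields[4]), fields[6])


def read_intervals(path: str, parse) -> List[Interval]:
    """Apply a line parser to every line of a file, keeping the hits."""
    with open(path) as handle:
        return [iv for iv in map(parse, handle) if iv is not None]


def strand_masks(intervals: List[Interval], length: int) -> Dict[str, bytearray]:
    """One byte per position and strand: 1 if any interval covers it."""
    masks = {"+": bytearray(length), "-": bytearray(length)}
    for _chrom, start, end, strand in intervals:
        masks[strand][start:end] = b"\x01" * (end - start)
    return masks


def sensitivity_specificity(annotation: List[Interval],
                            prediction: List[Interval]) -> Tuple[float, float]:
    """Nucleotide-level Sn = TP/(TP+FN) and Sp = TP/(TP+FP).

    Assumes a single sequence (the chrom column is ignored) and counts a
    position as TP only if it is coding on the same strand in both sets.
    """
    length = max((iv[2] for iv in annotation + prediction), default=0)
    real = strand_masks(annotation, length)
    pred = strand_masks(prediction, length)
    tp = fp = fn = 0
    for strand in ("+", "-"):
        for r, p in zip(real[strand], pred[strand]):
            if r and p:
                tp += 1  # annotated and predicted coding
            elif p:
                fp += 1  # predicted coding, not annotated
            elif r:
                fn += 1  # annotated coding, missed by the prediction
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp
```

A possible use, with the file names taken as assumptions: `annotation = read_intervals("Rickettsia_cds.fa", header_to_bed)`, `prediction = read_intervals("geneMark.gff", gff_line_to_bed)`, then `sensitivity_specificity(annotation, prediction)`. Using per-strand masks means overlapping or duplicated records (for example gene and CDS lines for the same locus) are counted once per position, which matches a nucleotide-level comparison.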