Erya "Bioinformatics: Introduction and Methods" China University Question Bank: Free Chapter Answers (Xuexitong 2023 Homework Answers)

1 Introduction and History of Bioinformatics

1. The size of the human genome is:
A. 3.1*10^7 bp
B. 3.1*10^8 bp
C. 3.1*10^9 bp
D. 3.1*10^10 bp

2. What percentage of the human genome is composed of bases that encode proteins?
A. ?0%
B. ?0%
C. ?0%
D. Less than 5%

3. GenBank is a database for:
A. Nucleotide sequences
B. Protein sequences
C. Protein structures
D. Nucleotide structures

4. SRA is a database for:
A. Next-generation sequencing data
B. Sanger sequencing data
C. Microarray data
D. NA

5. Compared with the error rate of Sanger sequencing, the error rate of next-generation sequencing is:
A. Higher
B. Lower
C. More or less the same
D. NA

6. How are data generally processed in bioinformatics?
A. Data management - data computation - data mining - modeling/simulation
B. Data mining - data management - modeling/simulation - data computation
C. Data mining - data management - data computation - modeling/simulation
D. Modeling/simulation - data mining - data management - data computation

7. Sanger sequencing was established in:
A. 2002
B. 1965
C. 1977
D. 1970

8. The Human Genome Project was initiated in _ and the draft sequence was published in _.
A. 1977, 2004
B. 1988, 2004
C. 1977, 2001
D. 1988, 2001

9. The PAM scoring matrix was designed for:
A. Amino acid substitution
B. Nucleotide substitution
C. NA
D. NA

10. The sequence alignment algorithm based on dynamic programming was developed in:
A. 1977
B. 1970
C. 1988
D. 1991

2 Sequence Alignment

1. Is the best result derived from dynamic programming unique when doing sequence alignment?
A. Yes, it is unique.
B. No, there may be more than one best result.
C. NA
D. NA

2. What is the meaning of ":" in the markup line of an amino acid sequence alignment?
A. The residues are identical
B. The residues are similar
C. The residues are not similar
D. A gap

3. The more likely a substitution is to happen in nature, the ____ its substitution score in the scoring matrix will be.
A. Larger
B. Smaller
C. NA
D. NA

4. Given the scoring matrix above and gap penalty d=-5, use dynamic programming to align these two sequences: AAGT and AGCT. If using global alignment, the value of the yellow block is ______
A. ?
B. ?4
C. ?8
D. ?6

5. If using global alignment, the value of the blue block in question 4 is ______
A. ?
B. ?4
C. ?
D. ?8

6. If using global alignment, the final alignment(s) in question 4 is ______
A. AAG-T / -AGCT
B. AAGT- / -AGCT
C. AAGT / AGCT
D. AAG-T / A-GCT
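
The table fill behind questions 4 to 6 can be sketched as follows. This is a minimal illustration, not the course's own code: the scoring function `toy_score` (+2 for a match, -7 for a mismatch) is a hypothetical stand-in because the quiz's scoring matrix is not reproduced here; only the gap penalty d=-5 comes from the question.

```python
def needleman_wunsch_table(a, b, score, gap=-5):
    """Fill the global-alignment (Needleman-Wunsch) table for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                table[i - 1][j] + gap,                            # gap in b
                table[i][j - 1] + gap,                            # gap in a
            )
    return table


def toy_score(x, y):
    # Hypothetical scores, NOT the matrix from the quiz figure.
    return 2 if x == y else -7


table = needleman_wunsch_table("AAGT", "AGCT", toy_score)
best_global_score = table[-1][-1]  # bottom-right cell holds the optimal global score
```

With these assumed scores the bottom-right cell is -4, reached by alignments that use two gaps and three matches; a different scoring matrix can of course change the cell values.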

7. If using local alignment, the value of the green block in question 4 is ______
A. ?
B. ?
C. ?
D. ?

8. If using local alignment, the value of the blue block in question 4 is ______
A. ?
B. ?
C. ?
D. ?

9. If using local alignment, the maximum score we can get from this alignment is:
A. ?
B. ?
C. ?
D. ?

3 Sequence Database Search

1. Which one of the following statements about the BLAST E-value is NOT correct?
A. When it is near 1, it is nearly identical to its corresponding p-value.
B. It denotes how much we can trust its corresponding "hit" sequence.
C. It can be larger than 1.
D. When it is fixed, the corresponding p-value is fixed as well.
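
The relation between the two quantities in this question can be checked numerically. The sketch below assumes the standard Poisson relation p = 1 - exp(-E) between a BLAST E-value and its p-value; the function name is illustrative only.

```python
import math


def p_value_from_e_value(e_value):
    """Probability of at least one chance hit, given the expected number of chance hits."""
    # -expm1(-E) equals 1 - exp(-E) but keeps precision for very small E.
    return -math.expm1(-e_value)


small = p_value_from_e_value(0.001)   # very close to the E-value itself
at_one = p_value_from_e_value(1.0)    # clearly smaller than the E-value
large = p_value_from_e_value(3.42)    # E may exceed 1, p never does
```

For small E the two values are nearly equal, while at E = 1 the p-value is only about 0.63, and each E-value maps to exactly one p-value.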

2. Which of the following options cannot reduce the false positives of BLAST?
A. Build an index for the database ahead of time
B. Discard isolated hits and keep only those hits that can form hit clusters
C. Mask the low-complexity regions
D. Use the E-value to evaluate the statistical significance of alignments

3. Which one of the following options cannot improve the speed of BLAST? Note that the improvement need not be significant compared with previous pairwise sequence alignment algorithms.
A. Mask the low-complexity regions of a database before using it in BLAST
B. Use shorter seed words
C. Choose only those neighborhood words that are highly similar to the current seed word
D. Build an index for the database ahead of time

4. Given the following protein sequence, run BLAST to find similar protein sequences:

>Protein Sequence
MVRAPCCEKMGLKKGPWTPEEDQILISYIQSNGHGNWRALPKLAGLLRCGKSCRLRWTNYLRPDIKRGNFTREEEDSIIQ
LHEMLGNRWSAIAARLPGRTDNEIKNVWHTHLKKRLKNYQPPQSSKRHSKNKDSKAPCTSQIALKSSNNFSNIKEDGPGL
GSGPNSPQLSSSEMSTVTADSLAVTMDISNSNDQIDSSENFIPEIDESFWTDGLSTSGGGEELQVQFPFHDMKQENVEKD
VGAKLEDDMDFWYSVFIKSGDLLELPEF

BLAST: http://blast.ncbi.nlm.nih.gov
Parameters:
Database: Non-redundant protein sequences (nr)
Algorithm: blastp
Word size: 3
Matrix: BLOSUM62
Gap Costs: Existence: 11 Extension: 1
Other parameters: leave as default.

Q: Which program listed on the BLAST homepage should you use for this analysis?
A. nucleotide blast
B. protein blast
C. blastx
D. tblastn

5. In the BLAST result of question 4, which species has the highest similarity score?
A. Capsicum annuum (chili pepper)
B. Datura metel (devil's trumpet)
C. Petunia x hybrida (petunia)
D. Solanum lycopersicum (tomato)

4 Markov Model

1. In first-order Markov models, the probability distribution of the current state depends only on ______.
A. All previous states
B. Its own state
C. The immediate previous state
D. The immediate next state

2. Use a Markov model to align two sequences, given the following state transition graph (M means two residues are aligned, X means there is an insertion in the first sequence, Y means there is an insertion in the second sequence). If the gap open probability is 0.1 and the gap extension probability is 0.7, what is the transition probability of the blue edge in the graph?
A. 0.1
B. 0.2
C. 0.3
D. 0.8

3. In question 2, what is the transition probability of the green edge in the graph?
A. 0.2
B. 0.3
C. 0.7
D. 0.8

4. Given the following state transition graph and emission probabilities for each state, what is the probability of observing "aabbc" through the state transition path 1-2-2-3-3?
A. 0.00072
B. 0.004068
C. 0.000144
D. 0.00336
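
The probability asked for here is a product of transition and emission terms along the given path. The sketch below shows that calculation on a small made-up model; every number in `start`, `trans`, and `emit` is hypothetical, because the quiz's transition graph and emission tables are not reproduced in this text.

```python
def path_probability(path, symbols, start, trans, emit):
    """Joint probability of following `path` and emitting `symbols` along it."""
    prob = start[path[0]] * emit[path[0]][symbols[0]]
    for prev, cur, sym in zip(path, path[1:], symbols[1:]):
        # Each step contributes one transition and one emission factor.
        prob *= trans[prev][cur] * emit[cur][sym]
    return prob


# Hypothetical two-state model, for illustration only.
start = {1: 1.0}
trans = {1: {1: 0.4, 2: 0.6}, 2: {2: 0.5}}
emit = {1: {"a": 0.5, "b": 0.5}, 2: {"a": 0.2, "b": 0.8}}

example = path_probability([1, 2, 2], "abb", start, trans, emit)
# 1.0 * 0.5 * (0.6 * 0.8) * (0.5 * 0.8) = 0.096
```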

5. In the following simple hidden Markov model for predicting coding and noncoding regions, the left matrix is the log10 of the transition probability matrix and the right matrix is the log10 of the emission probability matrix (in the transition matrix, the value in row i, column j is the log10 probability of moving from state i to state j). Using dynamic programming with this model, predict the coding (c) and noncoding (n) regions of the sequence "CGAAAA" (the log10 initial probability of noncoding is -0.1 and the log10 initial probability of coding is -0.7). What is the value in the green block?
A. -2.0
B. -1.5
C. -1.3
D. -2.1

6. In question 5, what is the value in the blue block?
A. -2.1
B. -1.5
C. -0.7
D. -2.0

7. In question 5, is the end of this sequence (A) identified as coding or noncoding?
A. Coding
B. Noncoding
C. NA
D. NA

5 Next Generation Sequencing (NGS) Mapping of Reads From Resequencing and Calling of Genetic Variants

1. Which of the following is not a high-throughput sequencing technology?
A. 454
B. HiSeq
C. SOLiD
D. Sanger sequencing

2. Which high-throughput sequencing technique was developed first?
A. 454
B. SOLiD
C. HiSeq
D. Ion Torrent PGM

3. Given a base with a sequencing quality of 30, what is the probability that this base was sequenced erroneously?
A. 1/30
B. 1/300
C. 1/100
D. 1/1000
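
The conversion behind this question is the Phred definition Q = -10 * log10(p). A minimal sketch, with function names chosen only for illustration:

```python
import math


def phred_to_error_prob(quality):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-quality / 10)


def error_prob_to_phred(error_prob):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(error_prob)


p_q30 = phred_to_error_prob(30)   # one error in a thousand bases
p_q20 = phred_to_error_prob(20)   # one error in a hundred bases
```

Every 10 points of quality lowers the error probability by a factor of ten.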

4. Compared with Sanger sequencing, what is/are the feature(s) of high-throughput sequencing?
A. Shorter read lengths
B. Faster sequencing
C. Lower cost
D. Higher error rate

5. Which of the following can be applied to speed up read mapping?
A. Build a hash index for the reference sequence
B. Store the reference sequence in a prefix tree
C. Store the reference sequence in a suffix tree
D. Use dynamic programming to find the optimal location(s) of the short sequence in the reference sequence

6 Functional Prediction of Genetic Variants

1. Which of the following mutations is NOT a single nucleotide variation?
A. Frame-shifting mutation
B. Stop-gain mutation
C. Nonsynonymous mutation
D. Synonymous mutation

2. How many single nucleotide variants are there in the genome of a human individual?
A. ?0,000
B. ?00,000
C. ?,000,000
D. ?0,000,000

3. Which of the following statements is true with respect to pathogenic and neutral mutations?
A. All pathogenic mutations are in conserved regions, and all neutral mutations are in non-conserved regions.
B. Mutations that exist in patients AND never exist in healthy individuals must be pathogenic.
C. All pathogenic mutations are in the functional sites of proteins, while all neutral mutations are in the non-functional sites of proteins.
D. Both pathogenic and neutral mutations are curated by the dbSNP database.

4. A benchmark test of a prediction method gave the following statistics. What are the sensitivity, the specificity, and the accuracy of this prediction method, respectively?
A. 0.80, 0.50, 0.90
B. 0.67, 0.25, 0.30
C. 0.80, 0.60, 0.70
D. 0.67, 0.50, 0.60

5. What is the FDR we can get from the data of Question 5?
A. 0.33
B. 0.66
C. 0.25
D. 0.75
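
The four measures named in questions 4 and 5 all derive from a confusion matrix. The sketch below uses the usual definitions; the counts passed in are hypothetical, since the benchmark table from the quiz is not reproduced here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard benchmark measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fdr": fp / (fp + tp),                    # false discovery rate
    }


# Hypothetical counts, NOT the data from the quiz.
metrics = classification_metrics(tp=40, fp=10, tn=30, fn=20)
```

With these made-up counts the sensitivity is 40/60, the specificity 30/40, the accuracy 70/100, and the FDR 10/50.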

6. Which of the following algorithms is not designed to predict the functional effects of nonsynonymous mutations?
A. SIFT
B. PolyPhen
C. SAPRED
D. Bowtie

7. In which of the following databases should we search for a known pathogenic mutation in a gene?
A. LSDBs
B. HGMD
C. 1000 Genomes dataset
D. OMIM

7 Next Generation Sequencing Transcriptome Analysis, and RNA-Seq

1. Which of the following is not an application of RNA-Seq?
A. Identify transcripts
B. Estimate gene expression levels
C. Find out exactly the copy number of each transcript
D. Find differentially expressed genes

2. Which of the following statements about RNA-Seq is wrong?
A. RNA-Seq cannot tell from which strand of the DNA each RNA is transcribed.
B. The efficiency and sensitivity of RNA-Seq depend on the sequencing depth.
C. Using RPKM to estimate gene expression levels can avoid artifacts of gene expression analysis caused by transcript length and sequencing depth.
D. RNA-Seq can be used to identify alternative splicing isoforms in the transcriptome.

3. Assume we use RNA-Seq to get 1,000 reads that are mapped back to a gene with a length of 10,000 bp. If the total number of reads mapped back to the genome is 10,000,000, what is the expression level (in RPKM) of this gene?
A. ?
B. ?0
C. ?00
D. ?000
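
RPKM stands for reads per kilobase of transcript per million mapped reads, so the calculation in this question can be sketched as below. The function name and argument names are illustrative choices.

```python
def rpkm(reads_on_gene, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    length_kb = gene_length_bp / 1_000
    mapped_millions = total_mapped_reads / 1_000_000
    return reads_on_gene / (length_kb * mapped_millions)


# Numbers taken from the question: 1,000 reads, 10,000 bp gene, 10,000,000 mapped reads.
example_rpkm = rpkm(1_000, 10_000, 10_000_000)
```

Dividing by both the gene length and the library size is what makes values comparable across genes of different lengths and across runs of different depths.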

4. Let's sequence the human transcriptome. Assume that the total number of reads mapped back to the genome is 10,000,000, among which 20,000 are mapped to gene A and 40,000 are mapped to gene B. If gene A has a length of 1,000 bp and gene B has a length of 3,000 bp, which gene has the higher expression level?
A. The expression level of A is higher.
B. The expression level of B is higher.
C. A and B have the same expression level.
D. It cannot be determined.

5. Which of the following statements is true with respect to the join-exon strategy used for read mapping in RNA-Seq?
A. It can discover new exons.
B. It can discover new splicing isoforms.
C. It can discover new genes.
D. It runs more slowly.

6. Which of the following statements is wrong with respect to the split-reads strategy used for read mapping in RNA-Seq?
A. It can discover new exons.
B. It can discover new splicing isoforms.
C. It can discover new genes.
D. It runs faster.

7. Assume that the RNA-Seq reads are mapped back to part of a gene as shown below. What is the minimum number of transcripts this gene could have?
A. ?
B. ?
C. ?
D. ?

8. In Question 7, what is the maximum number of transcripts this gene could have? Assume that all the transcripts of this gene have been sequenced.
A. ?
B. ?
C. ?
D. ?

8 Prediction and Analysis of Noncoding RNA

1. Which of the following statements is correct with respect to long non-coding RNAs (lncRNAs)?
A. There are no exons or reading frames on lncRNAs.
B. lncRNAs are byproducts of transcription and do not have any function.
C. No lncRNA has a polyA tail.
D. Variants on lncRNAs might lead to human disease.

2. Which of the following statements is wrong with respect to functional prediction?
A. Genes that are differentially expressed are very likely to be functionally related.
B. Co-expressed genes might have related functions.
C. Genes with similar expression levels have related functions.
D. Genes with similar sequences might have identical functions.

3. Assume that the probability that we make an error in each trial is 0.2. What is the probability that we make an error in at least one of three trials?
A. 0.512
B. 0.008
C. 0.992
D. 0.488
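
The quantity in this question is the complement of "no error in any trial". A minimal sketch, assuming the trials are independent (the question does not say so explicitly):

```python
def prob_at_least_one_error(per_trial_error, trials):
    """P(at least one error) = 1 - P(no error in every trial), for independent trials."""
    return 1 - (1 - per_trial_error) ** trials


three_trials = prob_at_least_one_error(0.2, 3)   # 1 - 0.8**3
```

The same growth of the overall error rate with the number of tests is the reason multiple-testing corrections are needed when many genes are screened at once.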

4. Which of the following statements is wrong with respect to non-coding RNAs?
A. Non-coding RNAs can regulate gene expression.
B. Non-coding RNAs play a role in protein translation.
C. Non-coding RNAs must be shorter than coding RNAs.
D. Non-coding RNAs cannot have any ORF (open reading frame).

5. Which of the following statements is wrong with respect to the identification of non-coding RNAs?
A. We can use RNA secondary structure features to identify microRNAs.
B. We can use sequence conservation information to identify non-coding RNAs.
C. It is impossible to identify non-coding RNAs using only features that come from the sequence itself.
D. We can now accurately identify all non-coding RNAs using computational methods.

9 Ontology and Identification of Molecular Pathways

1. Which of the following is not one of the three categories of Gene Ontology?
A. Molecular Function
B. Biological Process
C. Cellular Component
D. Biological Regulation

2. Which one of the following inferences is right?
A. If A regulates B, and B is part of C, then A regulates C.
B. If A is a B, and B regulates C, then A is a C.
C. If A regulates B, and B regulates C, then A regulates C.
D. If A is part of B, and B regulates C, then A regulates C.

3. Which of the following statements about KEGG ORTHOLOGY (KO) is right?
A. The terms in the first level of KO include Metabolism, Genetic Information Processing, and Cellular Processes.
B. The second level of KO contains ortholog groups (KO entries).
C. KO is a functional hierarchy of five levels.
D. An ortholog group contains only orthologs.

4. Which of the following pathways does CCR5 (hsa:1234) belong to? (Hint: search the KEGG website, http://www.genome.jp/kegg/)
A. Cytokine-cytokine receptor interaction
B. Chemokine signaling pathway
C. Wnt signaling pathway
D. p53 signaling pathway

5. Which of the following GO evidence codes belong to computational analysis evidence?
A. Sequence orthology (ISO)
B. Key residues (IKR)
C. Direct assay (IDA)
D. Expression pattern (IEP)

10 Bioinformatics Database and Software Resources

1. In which of the following databases does each record describe a cluster of proteins whose sequences are highly similar to one another?
A. UniProtKB
B. UniRef
C. UniParc
D. UniProt

2. From which of the following databases can we retrieve information about protein three-dimensional structure?
A. PDB
B. Rfam
C. BioPerl
D. IntAct

3. Which of the following UCSC Genome Browser items is NOT represented as a track?
A. Transcript structure
B. DNA methylation, histone modification, transcription factor binding site
C. Conservation and population variants
D. Protein structure

4. Which of the following tools is provided by UCSC Genome Browser?
A. BLAST
B. BLAT
C. Primer3
D. MEME Suite

5. What information does NCBI Gene integrate and present?
A. Gene names and Entrez Gene ID
B. RefSeq records of gene product mRNAs and proteins
C. Possible functional domains of proteins
D. Phenotypes and diseases related to a particular gene

11 Origination of New Genes

1. Which of the following computational methods cannot give useful information about a previously unknown de novo gene?
A. mRNA expression pattern inferred from RNA-Seq data
B. Protein secondary structure prediction
C. Protein physicochemical property (such as pI value) prediction
D. Homology annotation to genes with known function

2. Which one of the following sequence alignment methods is not used for searching for de novo genes?
A. BWA
B. Genome alignment
C. Codon-based alignment
D. BLAST

3. Given the species phylogeny and the existence of genes in the figure, which two of the following inferences are right according to the parsimony principle?
A. ABCD is a new gene, which originated after the divergence of Species 1 and 2.
B. EFGH is an old gene, which already existed in the ancestor of Species 1, 2, 3, 4, and 5.
C. IJKL originated 4 times independently in Species 2, 3, 4, 5.
D. MNOP is a new gene, which originated after the divergence of Species 5 and the ancestor of Species 1, 2, 3, 4.

4. Given the DNA sequence of an intact ORF, which two of the following mutations will cause ORF disruption?
A. ATG GGC CTG TCG ACC CTC G--CGG GAG CGA CTG TGA
B. ATG GGC CTG TAG ACC CTC GAG CGG GAG CGA CTG TGA
C. ATG GGC CTG TCG ACC CTC GA- --G GAG CGA CTG TGA
D. ATG GGC CTG TCG ACC CTC GAG CGG CAG CGA CTG TGA
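
One way to reason about this question is to test each mutated sequence for a frameshift or a premature stop codon. The sketch below is a simplified check under stated assumptions: the ORF must start with ATG, end with a stop codon, have a length divisible by three, and contain no internal stop codon; `-` marks a deleted base. The function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_is_intact(aligned_sequence):
    """Return True if the sequence still reads as one uninterrupted ORF."""
    seq = aligned_sequence.replace(" ", "").replace("-", "").upper()
    if not seq or len(seq) % 3 != 0:
        return False  # empty, or a frameshift changed the reading frame
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # A stop codon before the last position truncates the protein.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


two_base_deletion = orf_is_intact("ATG GGC CTG TCG ACC CTC G--CGG GAG CGA CTG TGA")
premature_stop = orf_is_intact("ATG GGC CTG TAG ACC CTC GAG CGG GAG CGA CTG TGA")
three_base_deletion = orf_is_intact("ATG GGC CTG TCG ACC CTC GA- --G GAG CGA CTG TGA")
substitution = orf_is_intact("ATG GGC CTG TCG ACC CTC GAG CGG CAG CGA CTG TGA")
```

Deleting a multiple of three bases removes codons without shifting the frame, which is why it is treated differently from a one- or two-base deletion.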

5. Which two of the following statements about the Pearson correlation coefficient (r) are right?
A. r is a measure of the linear correlation between two variables.
B. r=1 means total positive correlation; r=0 means no linear correlation; and r=-1 means total negative correlation.
C. r=0.8 means 80% of variability is accounted for by the linear model.
D. r=0.99 always means the data are fitted well by a linear model.

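
The properties listed in this question can be explored with a direct implementation of the Pearson formula. This is a plain-Python sketch with an illustrative function name; the sample data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


perfect_positive = pearson_r([1, 2, 3], [2, 4, 6])
perfect_negative = pearson_r([1, 2, 3], [6, 4, 2])
# y = x**2 is a strong but non-linear dependence, and r is still 0.
no_linear = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
variance_explained = 0.8 ** 2   # r squared, not r, is the share of variance explained
```
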
Mid-term Exam

1. UniProt is mainly a database for:
A. Protein sequences
B. Nucleotide sequences
C. Protein structures
D. Nucleotide structures

2. PDB is mainly a database for:
A. Protein structure
B. DNA structure
C. RNA structure
D. Genetic diseases

3. The idea of the molecular evolutionary clock was established in the:
A. 1950s
B. 1960s
C. 1970s
D. 1980s

4. Protein sequencing technology was established ____ nucleotide sequencing technology:
A. Earlier than
B. Later than
C. At the same time as
D. NA

5. Which of the following algorithms or software tools is not designed for sequence alignment?
A. Needleman-Wunsch
B. Smith-Waterman
C. SIFT
D. BWA

6. Why do we often assign negative scores to gaps in sequence alignment?
A. Because gaps are uninformative.
B. This assignment is arbitrarily determined without any proper reason.
C. Because gaps surely change the 3D structure of sequences.
D. Because insertions and deletions often affect the function of the sequence.

7. Which of the following statements can we DEDUCE from the fact that the substitution matrix in sequence alignment is symmetric with respect to its main diagonal?
A. Changing the direction of substitution does not change the alignment result.
B. The substitution is context-free (i.e. not depending on bases or alignments before or after the substitution).
C. The substitution matrix cannot describe insertions and deletions.
D. Changing the direction of the two sequences does not change the alignment result.

8. Which of the following problems is NOT addressed by the pairwise sequence alignment algorithms introduced in this MOOC?
A. Finding the globally optimal alignment between two sequences
B. Finding the alignment between the most similar parts of two sequences
C. Whether two sequences have different functional domains
D. Whether one sequence is part of another sequence

9. Which of the following statements is NOT true with respect to the modification of Needleman-Wunsch by the Smith-Waterman algorithm?
A. The modification changes the number of values to compare in each maximization step from 3 to 4.
B. The modification makes the scoring of later similar alignments not affected by earlier alignments that are not similar.
C. The modification does NOT increase or decrease the speed of the algorithm considerably.
D. The modification changes the number of optimal solutions from only one to one or more than one.

10. How many optimal alignments between the sequence "AAAAAA" and the sequence "A" can we get using the Smith-Waterman algorithm with a linear gap penalty? And how many can we get using the Needleman-Wunsch algorithm with a linear gap penalty?
A. ?, 6
B. ?, 2
C. ?, 1
D. ?, 6

11. We can extend pairwise sequence alignment to the case where three sequences are aligned at the same time. If we still use dynamic programming, how many values should we compare in the maximization step of each recursion?
A. ?
B. ?
C. ?
D. ?

12. The time complexity of the Needleman-Wunsch algorithm is O(m*n) for pairwise alignment of one sequence of length m and another of length n. For the three-sequence alignment problem in Question 11, if the lengths are m, n, and k, respectively, what is the time complexity?
A. O(m*n*k)
B. O(m*n + n*k + k*m)
C. O(max(m*n , n*k , k*m))
D. O(min(m*n , n*k , k*m))
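
Questions 11 and 12 both follow from how the dynamic-programming recursion generalizes to more sequences. In each step every sequence either contributes a residue or a gap, and the column of all gaps is excluded. The sketch below enumerates those choices and counts table cells; the function names are illustrative.

```python
from itertools import product


def dp_moves(num_sequences):
    """All predecessor moves of one cell: 1 = consume a residue, 0 = place a gap."""
    return [move for move in product((0, 1), repeat=num_sequences) if any(move)]


def table_cells(*lengths):
    """Number of cells in the DP table, one axis per sequence."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells


pairwise_moves = dp_moves(2)   # (0, 1), (1, 0), (1, 1)
triple_moves = dp_moves(3)     # 2**3 - 1 combinations
```

Because each cell needs a constant number of comparisons for a fixed number of sequences, the running time grows with the number of cells, that is, with the product of the sequence lengths.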

13. Given the following scoring matrix and gap penalty d=-5, use dynamic programming to do global sequence alignment for these two sequences: TGAA and TCGA. The value of the blue block is ______
A. ?1
B. ?2
C. ?4
D. ?6

14. What is the disadvantage of the currently used BLAST in comparison with the Smith-Waterman algorithm?
A. It cannot handle cases where there are gaps.
B. It cannot guarantee finding the globally optimal solution.
C. It is always slightly slower.
D. It cannot handle short query sequences such as primers.

15. Which of the following sequences is NOT a suitable seed for BLAST?
A. AGCTGC
B. TACGAC
C. GCAGCT
D. This depends on the query sequence and the database sequence.

16. Which of the following statements is NOT correct with respect to the standard used by BLAST to discard isolated hits?
A. Isolated hits can be parallel to the main diagonal, but cannot reside on the main diagonal.
B. Isolated hits can run in a direction not parallel to the main diagonal AND reside on places other than the diagonal AT THE SAME TIME.
C. Isolated hits can reside on the main diagonal. They can also be parallel to the main diagonal.
D. All the other three statements are NOT correct.

17. Why does BLAST use dynamic programming to realign the sequence after extending the hit cluster?
A. To find the optimal alignment of the query sequence to the region around the hit cluster.
B. There is no reason, and this step is in fact optional.
C. Using dynamic programming, we can directly work out the p-value denoting the probability that this alignment is random.
D. All the other three statements are NOT correct.

18. If your protein sequence has 100 amino acids, what is the expected number x of perfect matches by chance if you search the Swiss-Prot database for this sequence?
A. 10E-90 < x < 10E-80
B. 10E-120 < x < 10E-110
C. 10E-140 < x < 10E-130
D. 10E-170 < x < 10E-160

19. Which of the following statements is NOT correct with respect to the BLAST E-value?
A. When the E-value is larger than 1, it is not equal to its corresponding p-value.
B. When the p-value is 0.05, the corresponding E-value is also 0.05.
C. If the E-value is 3.42, it means that there will be 3.42 alignments by chance whose scores are not lower than that of the current alignment.
D. The longer the query sequence is, the larger the E-value will be.

20. Which of the following statements is NOT correct with respect to the BLAST algorithm?
A. We need to mask low-complexity regions to avoid false positives.
B. We need to sort the results by E-value to avoid false positives.
C. We need to use longer seed words to find all real matches.
D. We need to consider neighbor words that are similar to the seed word to find all real matches.

21. Given the following state transition graph and emission probabilities for each state, what is the probability of observing "abccc" through the state transition path 1-2-2-2-3?
A. 0.000072
B. 0.00072
C. 0.00336
D. 0.0036

22. How many possible state paths can emit the token sequence "cbba", start from state 1, and end at state 3, given the state transition graph in Question 28?
A. ?
B. ?
C. ?
D. ?

23. Is an HMM guaranteed to find the state path that has the globally largest probability of occurrence?
A. Yes
B. No, but it is guaranteed to find the state path that has the locally largest probability of occurrence
C. No, and it is not guaranteed to find the state path that has the locally largest probability of occurrence either
D. NA (irrelevant option)

24. Which of the following is NOT correct if we do sequence alignment using an HMM?
A. State X or Y means a gap
B. The time complexity is lower than that of dynamic programming
C. We can get an alignment with maximal probability
D. State M means a match or mismatch

25. Using the HMM in 4.2 for sequence alignment, given that the probability from M to M is 0.8 and the probability from X to X is 0.4, the probability from M to X is ____ and the probability from X to M is ____.
A. 0.1, 0.3
B. 0.1, 0.6
C. 0.2, 0.3
D. 0.2, 0.6
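
The reasoning behind Question 25 is that the transition probabilities leaving each state must sum to 1. The sketch below assumes the common three-state pair HMM (states M, X, Y; the two gap states are symmetric; no direct X-to-Y or Y-to-X transition; no explicit end state). Whether the model in lecture 4.2 matches these assumptions exactly is not confirmed by the text here, and the function name is hypothetical.

```python
def pair_hmm_transitions(p_mm: float, p_xx: float) -> dict:
    """Fill in the remaining transitions of an assumed three-state pair HMM."""
    # Assumption: M->X and M->Y equally share the probability that M->M leaves over.
    p_open = (1.0 - p_mm) / 2.0
    transitions = {
        "M": {"M": p_mm, "X": p_open, "Y": p_open},
        # Assumption: a gap state either extends itself or returns to M.
        "X": {"M": 1.0 - p_xx, "X": p_xx, "Y": 0.0},
        "Y": {"M": 1.0 - p_xx, "X": 0.0, "Y": p_xx},
    }
    # Every row of a transition matrix must sum to 1.
    for state, row in transitions.items():
        assert abs(sum(row.values()) - 1.0) < 1e-9, state
    return transitions

t = pair_hmm_transitions(p_mm=0.8, p_xx=0.4)
print(round(t["M"]["X"], 3), round(t["X"]["M"], 3))
```

If the lecture's model allows other transitions (for example X to Y), the rows would have to be split differently, so the printed values hold only under the stated assumptions.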

26. Do all bases of the same read have identical base qualities in high-throughput sequencing?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

27. According to the video lectures in week 5, if the ASCII character encoding base quality score 0 is “!”, what is the sequencing error probability p for a base whose quality is encoded as “A”?
A. 0.00631
B. 0.000631
C. 0.001262
D. 0.0001262
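
The conversion behind Question 27 can be sketched in a few lines. This assumes the standard Phred+33 encoding, in which score 0 is the character "!" (ASCII 33) and the error probability is p = 10^(-Q/10); the function name is illustrative.

```python
def phred_error_probability(quality_char: str, offset: int = 33) -> float:
    """Error probability for one quality character: p = 10 ** (-Q / 10)."""
    q = ord(quality_char) - offset  # Phred score Q
    if q < 0:
        raise ValueError(f"{quality_char!r} is below the encoding offset {offset}")
    return 10 ** (-q / 10)

# Print character, Phred score, and error probability for a few examples.
for ch in "!5AI":
    print(ch, ord(ch) - 33, f"{phred_error_probability(ch):.6f}")
```

A higher character code means a higher Phred score and therefore a lower error probability.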

28. In read mapping, a read is often divided into several non-overlapping fragments, and the "seed" fragment(s) (i.e., those that match the reference genome perfectly) are located first in the reference genome. If there are at most 3 variants in a read, what is the minimum number of segments into which we should divide the read to guarantee finding a "seed"?
A. ?
B. ?
C. ?
D. ?
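
The guarantee in Question 28 rests on the pigeonhole principle: k variant positions can touch at most k segments. The brute-force check below is a minimal sketch, assuming each variant is a substitution at a single position; the read length of 20 and the helper names are hypothetical.

```python
from itertools import combinations

def split_read(length: int, n_segments: int) -> list:
    """Split positions 0..length-1 into n_segments contiguous, non-overlapping ranges."""
    bounds = [round(i * length / n_segments) for i in range(n_segments + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_segments)]

def always_has_clean_segment(length: int, n_segments: int, n_variants: int) -> bool:
    """True if every placement of n_variants positions leaves at least one segment untouched."""
    segments = split_read(length, n_segments)
    for positions in combinations(range(length), n_variants):
        hit = set(positions)
        if not any(hit.isdisjoint(segment) for segment in segments):
            return False  # every segment contains a variant, so no seed exists
    return True

k = 3
print(always_has_clean_segment(20, k, k))      # k segments: one variant per segment defeats it
print(always_has_clean_segment(20, k + 1, k))  # k + 1 segments: one segment is always clean
```

Checking exactly k variants is sufficient, since fewer variants can only touch fewer segments.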

29. What important technique, widely used today, did Illumina (Solexa) develop to deal with very short reads?
A. Paired-end sequencing
B. Emulsion PCR
C. Sequencing by synthesis
D. Sequencing by ligation

30. According to the lectures, next-generation sequencing is often called deep sequencing mainly because ___
A. The sequencing depth is deep
B. It deepens our understanding of genomics
C. It sequences deep into the tissue
D. None of the above

31. According to the lecture, given a next-generation sequencing result AAAa with an average error rate of 0.1, what are the probabilities of the genotypes (A, A), (A, a), and (a, a)?
A. 0.0729, 0.9262, 0.0009
B. 0.2916, 0.7048, 0.0036
C. 0.4, 0.596, 0.004
D. 0.1, 0.899, 0.001

32. Does EVERY variant in the genome have some effect on phenotype?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

33. Is it correct that variants occurring only in patients' genomes, but not in healthy people's genomes, must be pathogenic?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

34. Which of the following types of variant generally does not change the protein encoded by the gene?
A. Nonsense mutation
B. Synonymous mutation
C. Nonsynonymous mutation
D. Frameshift mutation

35. Does OMIM only record variants from the human genome?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

36. Are the variants recorded in dbSNP all neutral?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

37. Is it correct that variants in conserved regions must lead to phenotypic changes?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

38. Are all disordered regions in the 3D protein structure nonfunctional?
A. Yes
B. No
C. NA (irrelevant option)
D. NA (irrelevant option)

39. A benchmark test of a prediction method gave the following statistics: What is the false positive rate for this prediction method?
A. 0.2
B. 0.33
C. 0.4
D. 0.8

40. In Question 49, what is the precision of this method?
A. 0.1
B. 0.33
C. 0.67
D. 0.8
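
The two metrics asked about in Questions 39 and 40 follow directly from confusion-matrix counts. This is a minimal sketch; the counts below are hypothetical, chosen only for illustration, and are not the benchmark statistics from the question.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of actual negatives that were wrongly predicted positive: FP / (FP + TN)."""
    return fp / (fp + tn)

def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

# Hypothetical counts (not taken from the quiz).
tp, fp, tn, fn = 30, 20, 60, 10
print("FPR       =", false_positive_rate(fp, tn))
print("precision =", precision(tp, fp))
```

Note that the false positive rate is normalized by the actual negatives, whereas precision is normalized by the predicted positives, so the two are not complements of each other.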

41. In Question 13, if global alignment is used, the final alignment(s) is/are ______
A. T- GAA TCGA-
B. TGAA TCGA
C. TG-AA TCGA-
D. T- GAA TCG -A

42. Which of the following statements is NOT correct with respect to the BLAST algorithm?
A. We should use shorter seed words to speed up BLAST
B. We should use dynamic programming at the first stage in order to speed up BLAST
C. We should use fixed seed words in order to speed up BLAST
D. We should build indices for seed words in advance to speed up BLAST

43. Which of the following problems cannot be properly solved by BLAST?
A. Find out where the similar parts are between the genomes of two human individuals
B. Map NGS reads to a reference genome
C. Find possible mouse homologues of a human gene
D. Find out from which gene a newly discovered transcript is transcribed

44. Which of the following is correct with respect to a k-order Markov chain?
A. The probability distribution of the state at time m (m > k) is determined by, and only by, its preceding k states
B. The probability distribution of the state at time k is determined by, and only by, its immediately preceding state
C. The probability distribution of the state at time m (m > k) is independent of its succeeding k states
D. The probability distribution of the state at time m (m > k) is determined by, and only by, its succeeding k states

45. Which of the following is NOT correct with respect to the Hidden Markov Model (HMM)?
A. An HMM cannot be used for multiple sequence alignment
B. To use an HMM, we must ensure that the emission probabilities differ between states
C. An HMM cannot be used to predict introns
D. An HMM can be used to predict protein functional domains

46. Which of the following state paths cannot be an optimal solution under the linear gap penalty introduced for pairwise sequence alignment in this MOOC? (If you need additional information, please refer to the model in Slide 15 of the lecture in Week 4, Unit 1.)
A. MMMMMM
B. MMMMYX
C. XXXXXYMMM
D. YYMMMX

47. According to the lecture of week 4, which of the following statements is NOT correct with respect to the transition probability matrix?
A. The sum of elements must be 1 for each column.
B. The sum of elements must be 1 for each row.
C. The matrix is not necessarily symmetric with respect to its main diagonal.
D. The sum of all elements in the matrix must equal the number of observable states.

48. The Sanger sequencing technique is still widely used even after the development of high-throughput sequencing technology. Which of the following is/are advantage(s) of the Sanger sequencing technique?
A. Lower error rates
B. Lower costs
C. Longer reads
D. Faster sequencing

49. Which of the following is/are application(s) of high-throughput sequencing technology?
A. DNA variant analysis
B. Transcriptome analysis
C. DNA modification analysis
D. DNA-protein interaction analysis

50. Which of the following database(s) can be used to find a known pathogenic mutation in a gene?
A. dbSNP (Single Nucleotide Polymorphism Database)
B. 1000 Genomes dataset
C. OMIM (Online Mendelian Inheritance in Man)
D. HGMD (Human Gene Mutation Database)

Final Exam

1. To which of the following databases should you refer in order to find out whether a known protein has already had its 3D structure resolved?
A. PDB
B. RefSeq
C. dbSNP
D. BLAT

2. Which of the following sequencing qualities denotes the lowest sequencing error rate? (single character recorded in phred33)
A. ?
B. ?5
C. ?0
D. H

3. Which of the following information is NOT included in the BAM format?
A. The sequence of the read
B. The structure information of the read
C. The quality of the read
D. The name of the chromosome of the read alignment

4. The read mapping algorithm of high-throughput sequencing technology

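Xuexitong Bioinformatics: Introduction and Methods question bank

Bioinformatics (生物信息学) is an emerging interdisciplinary field at the intersection of biology, computer science, and mathematics. It studies how to use computational and mathematical methods to process, analyze, and interpret biological data of all kinds, and thereby deepens our understanding of biology and the life sciences.

This course starts from the basic concepts and foundational knowledge of bioinformatics and then introduces its related fields, techniques, and methods in detail.

1. Basic concepts of bioinformatics

Bioinformatics is an interdisciplinary field combining biology and information science. It carries out biological research by means of computers and information technology.

2. Related fields of bioinformatics

The related fields of bioinformatics include:

  • Genomics: the study of the structure, composition, function, and evolution of an organism's genome.
  • Transcriptomics: the study of gene expression patterns during transcription.
  • Proteomics: the study of the structure, composition, function, and interactions of the proteins in an organism.
  • Metabolomics: the study of the composition of an organism's metabolites and how they change.

3. Foundational knowledge for bioinformatics

The foundational knowledge for bioinformatics includes:

  • Biology: the basis of bioinformatics, covering basic biological concepts, the structure and function of organisms, and evolution.
  • Computer science: the key enabler of bioinformatics, covering basic computing concepts, operating systems, computer networks, and databases.
  • Statistics: another basis of bioinformatics, covering basic statistical concepts, statistical methods, and data processing and analysis.

4. Techniques and methods of bioinformatics

The techniques and methods of bioinformatics include:

  • Sequence analysis: studying gene structure and function and biological evolution by comparing, aligning, clustering, and performing evolutionary analysis on DNA or protein sequences.
  • Structural analysis: studying protein structure and function and supporting drug design by comparing, simulating, and analyzing protein 3D structures.
  • Functional analysis: studying gene function and regulatory mechanisms by analyzing data such as gene expression profiles, protein-protein interactions, and metabolic pathways.
  • Systems biology: integrating biological information at all levels to build system models of organisms and to study the composition, structure, and regulatory mechanisms of living systems.

5. Summary

Bioinformatics is an emerging interdisciplinary field that uses computers and information technology to carry out biological research. Its related fields include genomics, transcriptomics, proteomics, and metabolomics. Its foundations are biology, computer science, and statistics. Its techniques and methods include sequence analysis, structural analysis, functional analysis, and systems biology. Continued progress in these techniques and methods will drive rapid advances in the life sciences and medicine.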
