Published: 2007-07-12
CR), care should be taken not to include the cloning vector sequence when performing similarity searches. Plasmids, cosmids, phagemids, YACs and PACs are examples of cloning vectors.
Cluster analysis (聚类分析) A method for grouping together the objects that are most similar within a larger group of related objects. The relationships are based on some criterion of similarity or difference. For sequences, a similarity or distance score, or a statistical evaluation of those scores, is used.
Cobbler A single sequence that represents the most conserved regions in a multiple sequence alignment. The BLOCKS server uses the cobbler sequence to perform a database similarity search as a way to reach sequences that are more divergent than would be found by searching with the single sequences in the alignment.
Coding system (neural networks) For neural networks, a coding system needs to be designed for representing input and output. The success of training the model depends in part on the quality of the coding system chosen.
Codon usage Analysis of the codons used in a particular gene or organism.
COG (直系同源簇) Clusters of orthologous groups: a set of groups of related sequences in microorganisms and yeast (S. cerevisiae). These groups are found by whole-proteome comparisons and include orthologs and paralogs. See also Orthologs and Paralogs.
Comparative genomics (比较基因组学) A comparison of gene numbers, gene locations, and biological functions of genes in the genomes of diverse organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.
Complexity (of an algorithm) (算法的复杂性) Describes the number of steps required by the algorithm to solve a problem as a function of the amount of data; for example, the length of the sequences to be aligned.
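As a rough illustration of the Codon usage entry above, the following Python sketch counts codons in a single coding sequence. The function name `codon_usage` and the example sequence are hypothetical, and the sketch assumes the sequence is already in frame starting at its first base; real analyses usually also normalize the counts per amino acid.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count each complete codon in a coding sequence.

    Assumption: the sequence is read in frame from position 0.
    Trailing bases that do not fill a whole codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


# Hypothetical example sequence: ATG GCT GCT GCA TAA
usage = codon_usage("ATGGCTGCTGCATAA")
for codon, count in sorted(usage.items()):
    print(codon, count)
```

Dividing each count by the total number of codons would give relative frequencies, which are easier to compare between genes of different lengths.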
Conditional probability (条件概率) The probability of a particular result (or of a particular value of a variable) given one or more events or conditions (or values of other variables).
Conservation (保守) Changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
Consensus (一致序列) A single sequence that represents, at each successive position, the variation found within the corresponding column of a multiple sequence alignment.
Context-free grammars A recursive set of production rules for generating patterns of strings. These consist of a set of terminal characters that are used to create strings, a set of nonterminal symbols that correspond to rules and act as placeholders for patterns that can be generated using terminal characters, a set of rules for replacing nonterminal symbols with terminal characters, and a start symbol.
Contig (序列重叠群/拼接序列) A set of clones that can be assembled into a linear order; also, a DNA sequence that overlaps with another contig. The full set of overlapping sequences (contigs) can be put together to obtain the sequence of a long region of DNA that cannot be sequenced in a single run of a sequencing assay. Important in genetic mapping at the molecular level.
CORBA (国际对象管理协作组制定的使OOP对象与网络接口统一起来的一套跨计算机、操作系统、程序语言和网络的共同标准) The Common Object Request Broker Architecture (CORBA) is an open industry standard for working with distributed objects, developed by the Object Management Group. CORBA allows the interconnection of objects and applications regardless of computer language, machine architecture, or geographic location of the computers.
Correlation coefficient (相关系数) A numerical measure, falling between -1 and 1, of the degree of the linear relationship between two variables. A positive value indicates a direct relationship, a negative value indicates an inverse relationship, and the distance of the value from zero indicates the strength of the relationship. A value near zero indicates no linear relationship between the variables.
Covariation (in sequences) (共变) Coincident change at two or more sequence positions in related sequences that may influence the secondary structures of RNA or protein molecules.
Coverage (or depth) (覆盖率/厚度) The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).
Database (数据库) A computerized storehouse of data that provides a standardized way of locating, adding, removing, and changing data. See also Object-oriented database, Relational database.
Dendrogram A form of tree that lists the compared objects (e.g., sequences, or genes in a microarray analysis) in a vertical order and joins related ones by levels of branches extending to one side of the list.
Depth (厚度) See Coverage.
Dirichlet mixtures Defined as the conjugate prior of a multinomial distribution. One use is for predicting the expected pattern of amino acid variation found in the match state of a hidden Markov model (representing one column of a multiple sequence alignment of proteins), based on prior distributions found in conserved protein domains (blocks).
Distance in sequence analysis (序列距离) The number of observed changes in an optimal alignment of two sequences, usually not counting gaps.
DNA Sequencing (DNA测序) The experimental process of determining the nucleotide sequence of a region of DNA. This is done by labelling each nucleotide (A, C, G or T) with either a radioactive or a fluorescent marker that identifies it. There are several methods of applying this technology, each with its own advantages and disadvantages. For more information, refer to a current textbook. High-throughput laboratories frequently use automated sequencers, which are capable of rapidly reading large numbers of templates. Sometimes the sequences may be generated more quickly than they can be characterised.
Domain (功能域) A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.
Dot matrix (点标矩阵图) Dot matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. A dot is placed in the graph wherever the same letter appears in both sequences. Diagonal lines of dots in the graph indicate regions of alignment. The matrix may be filtered to reveal the most-alike regions by requiring a minimum threshold number of matches within a sequence window.
Draft genome sequence (基因组序列草图) The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.
DUST (一种低复杂性区段过滤程序) A program for filtering low-complexity regions from nucleic acid sequences.
Dynamic programming (动态规划法) A dynamic programming algorithm solves a problem by combining solutions to sub-problems that are computed once and saved in a table or matrix. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found. This algorithm is used for producing sequence alignments, given a scoring system for sequence comparisons.
EMBL (欧洲分子生物学实验室,EMBL数据库是主要公共核酸序列数据库之一) European Molecular Biology Laboratory. Maintains the EMBL database, one of the major public sequence databases.
EMBnet (欧洲分子生物学网络) European Molecular Biology Network: http://www.embnet.org/ was established in 1988, and provides services including local molecular databases and software for molecular biologists in Europe. There are several large outposts of EMBnet, including EXPASY.
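The Dynamic programming entry above can be made concrete with a small Python sketch that fills a score table for the global alignment of two sequences. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values given in this glossary. The sketch returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Optimal global alignment score via a dynamic programming table.

    table[i][j] holds the best score for aligning a[:i] with b[:j].
    Each cell is computed once from three previously saved sub-problems.
    The scoring scheme is an illustrative assumption.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    # First column and row: one sequence aligned entirely against gaps.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
    return table[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
```

The table has (len(a) + 1) × (len(b) + 1) cells, so the number of steps grows with the product of the two sequence lengths, which is the kind of relationship the Complexity (of an algorithm) entry describes.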
Entropy (熵) From information theory, a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.
Erdos and Renyi law In a series of tosses of a “fair” coin, the length of the longest run of heads that can be expected is the logarithm of the number of tosses to the base 2. The law may be generalized to more than two possible outcomes by changing the base of the logarithm to the number of outcomes. This law was used to analyze the number of matches and mismatches that can be expected between random sequences as a basis for scoring the statistical significance of a sequence alignment.
EST (表达序列标签的缩写) See Expressed Sequence Tag.
Expect value (E) (E值) E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, E is the number of alignments scoring as well as the one found between a query sequence and a database sequence that would be expected in as many comparisons between random sequences as were done to find the matching sequence. In other types of sequence analysis, E has a similar meaning.
Expectation maximization (sequence analysis) An algorithm for locating similar sequence patterns in a set of sequences. A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment. This pattern is then matched to each sequence, and the scoring matrix values are updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.
Exon (外显子)
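As an illustration of the Entropy entry above, the following Python sketch computes the Shannon entropy, in bits, of the symbols in one alignment column. The function name `shannon_entropy` and the example columns are hypothetical, and the sketch assumes probabilities are estimated directly from observed frequencies with no pseudocounts.

```python
import math
from collections import Counter


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy in bits of the symbol frequencies in a string.

    Assumption: probabilities are the raw observed frequencies.
    An empty string yields 0.
    """
    total = len(symbols)
    counts = Counter(symbols)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical alignment columns: fully conserved vs. fully variable.
print(shannon_entropy("AAAA"))
print(shannon_entropy("ACGT"))
```

A fully conserved column has zero entropy, while a column in which all four nucleotides are equally frequent reaches the maximum of 2 bits for a four-letter alphabet.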