Sequencing terminology explained, all in one place

A collected glossary of sequencing terms: Contig, Assembly, and more are all here.

Alignment 

Similarity-based arrangement of DNA, RNA or protein sequences. In this context, subject and query sequences should be orthologous and reflect evolutionary, not functional or structural, relationships.

Annotation 

Computational process of attaching biologically relevant information to genome sequence data.

Assembly

Computational reconstruction of a longer sequence from smaller sequence reads.

Barcode 

Short-sequence identifier for individual labelling (barcoding) of sequencing libraries.
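
As a rough illustration of how barcodes are used, the sketch below assigns pooled reads to samples by their leading bases. The barcode table, the sample names and the `demultiplex` function are hypothetical, and exact matching at the read start is an assumption; real demultiplexers usually tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by an exact-match barcode at the start of each read.

    `barcodes` maps barcode sequence -> sample name (hypothetical values).
    The barcode is trimmed from each assigned read; reads that match no
    barcode are collected under 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTGACC", "TGCAGGATCA", "ACGTCCATGA", "GGGGTTTTAA"]
result = demultiplex(reads, barcodes)
```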

BAC

(Bacterial artificial chromosome) DNA construct of variable length (150–350 kb) used to clone large DNA fragments in bacteria.

cDNA 

Complementary DNA synthesized from an mRNA template

Contig

A contiguous linear stretch of DNA or RNA consensus sequence. Constructed from a number of smaller, partially overlapping, sequence fragments (reads).
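
To make the idea of partially overlapping reads concrete, here is a minimal sketch that merges two reads when a suffix of one exactly matches a prefix of the other. The function names and the `min_overlap` threshold are illustrative assumptions; real assemblers allow mismatches and derive a consensus from many reads.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads into one contiguous sequence, or return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append the non-overlapping tail of `right`.
    return left + right[size:]


contig = merge_reads("ATGCGTAC", "GTACCTGA")
```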

Coverage 

Also known as ‘sequencing depth’. Sequence coverage refers to the average number of reads per locus and differs from physical coverage, a term often used in genome assembly referring to the cumulative length of reads or read pairs expressed as a multiple of genome size.
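
The two notions of coverage can be sketched as simple ratios. This is a back-of-the-envelope estimate, assuming uniform read length and taking the span of each read pair to be its insert size; the function names and the example numbers are made up for illustration.

```python
def sequence_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base: N * L / G."""
    return num_reads * read_length / genome_size


def physical_coverage(num_pairs, insert_size, genome_size):
    """Cumulative span of read pairs as a multiple of genome size.

    Assumes each pair spans one full insert, including the unsequenced
    middle part of the fragment.
    """
    return num_pairs * insert_size / genome_size


# Hypothetical run: 1 million pairs of 150 bp reads, 500 bp inserts, 10 Mb genome.
seq_cov = sequence_coverage(num_reads=2_000_000, read_length=150, genome_size=10_000_000)
phys_cov = physical_coverage(num_pairs=1_000_000, insert_size=500, genome_size=10_000_000)
```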

De novo assembly

Refers to the reconstruction of contiguous sequences without making use of any reference sequence.

EST library 

Expressed sequence tag library. An EST is a short subsequence of a cDNA transcript sequence; an EST library is a collection of such tags.

Fosmid

A vector for bacterial cloning of genomic DNA fragments that usually holds inserts of around 40 kb.

GC content 

The proportion of guanine and cytosine bases in a DNA/RNA sequence
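
A minimal sketch of the calculation, assuming that ambiguous characters such as N are excluded from the denominator (conventions differ between tools); the `gc_content` helper is illustrative.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of a DNA/RNA sequence."""
    seq = sequence.upper()
    # Count only unambiguous bases, so N and other symbols are ignored.
    counted = [base for base in seq if base in "ACGTU"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)
```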

Gene ontology

(GO) Structured, controlled vocabularies and classifications of gene function across species and research areas.

InDel 

Insertion/deletion polymorphism.

Insert size

Length of randomly sheared fragments (from the genome or transcriptome) sequenced from both ends.

K-mer 

Short subsequence of length k taken from a DNA sequence. Many assembly algorithms decompose reads into k-mers; a given k-mer may occur more than once in a genome.
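
As an illustration, the sketch below counts every overlapping k-mer in a sequence with a sliding window. The `count_kmers` helper is hypothetical and ignores reverse complements, which many real k-mer counters merge with the forward k-mer.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping substrings of length k in `sequence`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n contains n - k + 1 k-mers (none if k > n).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATGATGA", 3)
```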

Library 

Collection of DNA (or RNA) fragments modified in a way that is appropriate for downstream analyses, such as high-throughput sequencing in this case.

Mapping 

A term routinely used to describe alignment of short sequence reads to a longer reference sequence

Masking 

Converting a DNA sequence [A,C,G,T] (usually repetitive or of low quality) to the uninformative character state N or to lower case characters [a,c,g,t] (soft masking).
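
The difference between the two masking styles can be sketched as follows. The interval format (0-based, half-open) and the `mask` function are assumptions made for this example; in practice the intervals would come from a repeat or quality annotation.

```python
def mask(sequence, intervals, soft=True):
    """Mask the given (start, end) intervals of a DNA sequence.

    Intervals are 0-based and half-open. Soft masking lowercases the
    bases; hard masking replaces them with N.
    """
    bases = list(sequence)
    for start, end in intervals:
        for i in range(start, min(end, len(bases))):
            bases[i] = bases[i].lower() if soft else "N"
    return "".join(bases)


soft_masked = mask("ACGTACGTAC", [(2, 6)], soft=True)
hard_masked = mask("ACGTACGTAC", [(2, 6)], soft=False)
```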

Massively parallel (or next generation) sequencing

High-throughput sequencing technology used to determine the base-pair sequence of DNA/RNA molecules in much larger quantities than earlier chain-termination techniques (e.g. Sanger sequencing).

Mate-pair

Sequence information from two ends of a DNA fragment, usually several thousand base-pairs long.

N50 

A statistic of a set of contigs (or scaffolds). It is defined as the length for which the collection of all contigs of that length or longer contains at least half of the total of the lengths of the contigs.

N90 

Analogous to the N50 statistic: the length for which the collection of all contigs of that length or longer contains at least 90% of the total of the lengths of the contigs.
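
Both statistics follow the same recipe: sort the contigs from longest to shortest and accumulate lengths until the chosen share of the total is reached. The sketch below is one straightforward way to compute them; the `nx` function and the example lengths are illustrative.

```python
def nx(lengths, percent):
    """Return the Nx statistic (N50 for percent=50, N90 for percent=90).

    Walk the contigs from longest to shortest and return the length at
    which the running sum first reaches `percent` of the total length.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return 0  # empty input


contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # total length 300
n50 = nx(contig_lengths, 50)
n90 = nx(contig_lengths, 90)
```

In this example the two longest contigs (80 + 70 = 150) already hold half of the 300 bases, so N50 is 70, while five contigs are needed to reach 270 bases, giving an N90 of 30.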

Optical map 

Genomewide, ordered, high-resolution restriction map derived from single, stained DNA molecules. It can be used to improve a genome assembly by matching it to the genomewide pattern of expected restriction sites, as inferred from the genome sequence.

Paired-end sequencing

Sequence information from two ends of a short DNA fragment, usually a few hundred base pairs long.

Read 

Short base-pair sequence inferred from the DNA/RNA template by sequencing.

RNA-Seq 

High-throughput shotgun transcriptome (cDNA) sequencing. The term is usually not used synonymously with RNA sequencing, which implies direct sequencing of RNA molecules, skipping the cDNA generation step.

Scaffold 

Two or more contigs joined together using read-pair information
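
A common way to represent a scaffold is to write the contigs in order, separated by runs of N whose length reflects the estimated gap. The sketch below assumes the contig order and gap sizes have already been inferred from read pairs; the `build_scaffold` function and its inputs are hypothetical.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs into one scaffold, filling gaps with N.

    `gap_sizes[i]` is the estimated gap between contigs[i] and
    contigs[i + 1], so there must be exactly one fewer gap than contigs.
    """
    if not contigs:
        raise ValueError("at least one contig is required")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # unknown bases between neighbouring contigs
        parts.append(contig)
    return "".join(parts)


scaffold = build_scaffold(["ACGT", "GGCC", "TTAA"], [3, 5])
```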

Transcriptome 

Set of all RNA molecules transcribed from a DNA template

Reference

A field guide to whole-genome sequencing, assembly and annotation

  • Published 2017-04-05 09:49
  • Category: Genomics

Author: SXR
