Alignment
Similarity-based arrangement of DNA, RNA or protein sequences. In this context, subject and query sequences should be orthologous and reflect evolutionary, not functional or structural, relationships.
Annotation
Computational process of attaching biologically relevant information to genome sequence data.
Assembly
Computational reconstruction of a longer sequence from smaller sequence reads.
Barcode
Short-sequence identifier for individual labelling (barcoding) of sequencing libraries.
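The following is a minimal sketch of how barcodes are used to split pooled reads back into per-sample bins (demultiplexing). It assumes fixed-length barcodes at the very start of each read and requires exact matches. The `demultiplex` function and the `"unassigned"` bin are hypothetical names for illustration. Real demultiplexers typically tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by an exact barcode match at its 5' end.

    reads: list of read sequences
    barcodes: dict mapping barcode sequence -> sample name (assumed fixed length)
    Returns a dict mapping sample name -> reads with the barcode trimmed off.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Trim the barcode so only the insert sequence remains
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read untouched in a separate bin
            bins["unassigned"].append(read)
    return dict(bins)


print(demultiplex(["ACGTTTTT", "TGCAGGGG", "NNNNAAAA"], {"ACGT": "s1", "TGCA": "s2"}))
```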
BAC
(Bacterial artificial chromosome) DNA construct used to clone large genomic fragments, with inserts typically 150–350 kb long.
cDNA
Complementary DNA synthesized from an mRNA template.
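As an illustration of what "complementary" means here, the sketch below derives the first-strand cDNA sequence from an mRNA by base pairing (A→T, C→G, G→C, U→A) and reversing the result so it reads 5'→3'. It models only the sequence logic of reverse transcription, not the laboratory protocol; `first_strand_cdna` is a hypothetical helper name.

```python
# Watson-Crick pairing from RNA bases to the DNA bases they template
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') for an mRNA given 5'->3'."""
    # Complement each base, then reverse so the cDNA is also written 5'->3'
    return mrna.upper().translate(RNA_TO_CDNA)[::-1]


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```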
Contig
A contiguous linear stretch of DNA or RNA consensus sequence, constructed from a number of smaller, partially overlapping sequence fragments (reads).
Coverage
Also known as ‘sequencing depth’. Sequence coverage refers to the average number of reads per locus and differs from physical coverage, a term often used in genome assembly referring to the cumulative length of reads or read pairs expressed as a multiple of genome size.
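The two notions of coverage can be contrasted with simple arithmetic. This sketch assumes all reads have the same length and uses the insert size as the span of each read pair for physical coverage. The function names and example numbers are illustrative only.

```python
def sequence_coverage(n_reads, read_length, genome_size):
    """Average sequencing depth: total sequenced bases / genome size (C = N * L / G)."""
    return n_reads * read_length / genome_size


def physical_coverage(n_pairs, insert_size, genome_size):
    """Genome length spanned by read pairs (pairs * insert size), as a multiple of genome size."""
    return n_pairs * insert_size / genome_size


# 1 million 100-bp reads on a 10-Mb genome give about 10x sequence coverage
print(sequence_coverage(1_000_000, 100, 10_000_000))  # 10.0
# 500,000 mate pairs with 3-kb inserts span the same genome about 150 times
print(physical_coverage(500_000, 3_000, 10_000_000))  # 150.0
```

The example shows why the two terms must not be confused: a modest number of long-insert pairs can yield a high physical coverage while contributing little sequence depth.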
De novo assembly
Refers to the reconstruction of contiguous sequences without making use of any reference sequence.
EST library
Expressed sequence tag library: a collection of expressed sequence tags (ESTs), each a short subsequence of a cDNA transcript sequence.
Fosmid
A vector for bacterial cloning of genomic DNA fragments that usually holds inserts of around 40 kb.
GC content
The proportion of guanine and cytosine bases in a DNA/RNA sequence.
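A minimal sketch of computing GC content. It assumes that ambiguous bases such as N should be excluded from the denominator, which is a common but not universal convention. `gc_content` is a hypothetical helper name.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T/U) in seq."""
    seq = seq.upper()
    # Count only unambiguous bases so that N runs do not dilute the estimate
    called = sum(seq.count(base) for base in "ACGTU")
    if called == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return (seq.count("G") + seq.count("C")) / called


print(gc_content("ATGC"))    # 0.5
print(gc_content("GGCCNN"))  # 1.0
```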
Gene ontology
(GO) Structured, controlled vocabularies and classifications of gene function across species and research areas.
InDel
Insertion/deletion polymorphism.
Insert size
Length of randomly sheared fragments (from the genome or transcriptome) sequenced from both ends.
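The insert size of a read pair can be estimated once both reads are mapped to a reference. This sketch assumes 0-based coordinates, a properly oriented pair (forward read on the left, reverse read on the right) and an exclusive end coordinate for the reverse read. The function names and example positions are hypothetical. A library's insert size is then usually summarised over many pairs.

```python
import statistics


def insert_size(fwd_start, rev_end):
    """Fragment length spanned by a properly oriented pair: [fwd_start, rev_end)."""
    if rev_end <= fwd_start:
        raise ValueError("reverse read must end to the right of the forward read start")
    return rev_end - fwd_start


def summarise_insert_sizes(pairs):
    """Median and mean insert size over (fwd_start, rev_end) tuples."""
    sizes = [insert_size(start, end) for start, end in pairs]
    return statistics.median(sizes), statistics.mean(sizes)


print(insert_size(1000, 1350))  # 350
print(summarise_insert_sizes([(0, 300), (100, 450), (200, 600)]))  # (350, 350)
```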
K-mer
Short, unique element of DNA sequence of length k, used by many assembly algorithms.
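To make the definition concrete, the sketch below enumerates every overlapping substring of length k in a sequence and counts occurrences. K-mer tables like this are the raw material for many assembly algorithms. It is a simple illustration, not the method of any particular assembler, and it ignores reverse complements.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count all overlapping substrings of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains n - k + 1 overlapping k-mers
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(count_kmers("ACGTACG", 3))
# Counter({'ACG': 2, 'CGT': 1, 'GTA': 1, 'TAC': 1})
```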
Library
Collection of DNA (or RNA) fragments modified in a way that is appropriate for downstream analyses, such as high-throughput sequencing in this case.
Mapping
A term routinely used to describe alignment of short sequence reads to a longer reference sequence.
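The idea behind read mapping can be sketched with a k-mer index of the reference: look up the read's first k bases (the seed) and verify each candidate position. This toy version assumes exact matches only, forward strand only, and reads at least k bases long. Production mappers allow mismatches and gaps and use far more compact indexes. The function names are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k):
    """Return reference positions where the read matches exactly (seed-and-verify)."""
    hits = []
    for pos in index.get(read[:k], []):  # candidate positions from the seed
        if reference[pos:pos + len(read)] == read:  # verify the full read
            hits.append(pos)
    return hits


ref = "ACGTACGTTT"
idx = build_index(ref, 4)
print(map_read("ACGTT", ref, idx, 4))  # [4]
```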
Masking
Converting a DNA sequence [A,C,G,T] (usually repetitive or of low quality) to the uninformative character state N (hard masking) or to lower-case characters [a,c,g,t] (soft masking).
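Both masking styles can be sketched as follows. The example assumes the regions to mask are already known as 0-based, half-open intervals (in practice they would come from a repeat or quality-filtering tool). `mask` is a hypothetical helper name.

```python
def mask(seq, intervals, soft=True):
    """Mask half-open [start, end) intervals: lower case (soft) or N (hard)."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, end):
            # Soft masking keeps the base but flags it; hard masking discards it
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


print(mask("ACGTACGT", [(2, 5)]))              # ACgtaCGT
print(mask("ACGTACGT", [(2, 5)], soft=False))  # ACNNNCGT
```

Soft masking is often preferred because downstream tools can still use the underlying bases if they choose to.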
Massively parallel (or next-generation) sequencing
High-throughput sequencing nanotechnology used to determine the base-pair sequence of DNA/RNA molecules in much larger quantities than earlier chain-termination-based sequencing techniques (e.g. Sanger sequencing).
Mate-pair
Sequence information from two ends of a DNA fragment, usually several thousand base-pairs long.
N50
A statistic of a set of contigs (or scaffolds). It is defined as the length for which all contigs of that length or longer together account for at least half of the total length of all contigs.
N90
Analogous to the N50 statistic: the length for which all contigs of that length or longer together account for at least 90% of the total length of all contigs.
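Both statistics follow from the same procedure: sort contig lengths in descending order, accumulate them, and report the length at which the running total first reaches x% of the total. The sketch below generalises this to any Nx. It uses integer arithmetic to avoid floating-point edge cases. `nx` is a hypothetical helper name, and the contig lengths are made-up example values.

```python
def nx(lengths, x=50):
    """Return the Nx statistic (e.g. x=50 for N50, x=90 for N90) of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Equivalent to running >= total * x / 100, without floating point
        if running * 100 >= total * x:
            return length


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # total length 54
print(nx(contigs, 50))  # 8  (10 + 9 + 8 = 27, half of 54)
print(nx(contigs, 90))  # 4  (10 + ... + 4 = 49 >= 48.6)
```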
Optical map
Genomewide, ordered, high-resolution restriction map derived from single, stained DNA molecules. It can be used to improve a genome assembly by matching it to the genomewide pattern of expected restriction sites, as inferred from the genome sequence.
Paired-end sequencing
Sequence information from two ends of a short DNA fragment, usually a few hundred base pairs long.
Read
Short base-pair sequence inferred from the DNA/RNA template by sequencing.
RNA-Seq
High-throughput shotgun transcriptome (cDNA) sequencing. Usually not used synonymously with "RNA sequencing", which implies direct sequencing of RNA molecules, skipping the cDNA synthesis step.
Scaffold
Two or more contigs joined together using read-pair information.
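Once read pairs have established the order, orientation and approximate spacing of contigs, the scaffold sequence is typically written out with runs of N standing in for the unknown gap sequence. The sketch below shows only this final joining step. It assumes the contigs are already ordered and oriented and that non-negative gap sizes have been estimated (for example from insert sizes). `build_scaffold` is a hypothetical name.

```python
def build_scaffold(contigs, gaps):
    """Join ordered, oriented contigs into one scaffold with N-filled gaps.

    contigs: contig sequences in scaffold order (orientation already resolved)
    gaps: estimated gap lengths between consecutive contigs (len(contigs) - 1 values)
    """
    if not contigs:
        raise ValueError("need at least one contig")
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap estimate between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        if gap < 0:
            raise ValueError("negative gaps (overlaps) are not handled in this sketch")
        parts.append("N" * gap)  # unknown sequence of estimated length
        parts.append(contig)
    return "".join(parts)


print(build_scaffold(["ACGT", "GGCC"], [3]))  # ACGTNNNGGCC
```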
Transcriptome
Set of all RNA molecules transcribed from a DNA template.
References
A field guide to whole-genome sequencing, assembly and annotation