Comparing expression microarrays from various platforms with mRNA-seq data


Paper: http://journals.plos.org/plosone ... ournal.pone.0078644
Cells studied: Human CCR6+ CD4 memory T cells, profiled at 6 time points, 12 samples in total
Microarray: Affymetrix GeneChip HT HG-U133+ PM arrays
Sequencing: Illumina HiSeq™ 2000 platform, paired-end (PE). "All reads were pair-end sequenced with an average insert size of 160 bp, and typical read-length of 90 bp."

About the array: 41,796 of the 54,714 probe sets were mapped to 20,741 genes, with 10,837 genes having more than one representative probe set.
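Since 10,837 genes have more than one probe set, the array has to be summarized to one value per gene before it can be matched against gene-level RPKM. The excerpt above does not say which rule the authors used, so the sketch below simply assumes the maximum across a gene's probe sets (the mean or median would be equally plausible). The function name, probe set IDs, gene names and intensities are all made up for illustration.

```python
from collections import defaultdict


def collapse_probe_sets(probe_to_gene, probe_values):
    """Summarize probe-set values to one value per gene.

    Assumption: the per-gene value is the maximum over its probe sets.
    This is one common convention, not necessarily the paper's method.
    """
    by_gene = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        # Skip probe sets that have no measured value in this sample.
        if probe in probe_values:
            by_gene[gene].append(probe_values[probe])
    return {gene: max(values) for gene, values in by_gene.items()}


# Hypothetical annotation: two probe sets for GENE_A, one each for the others.
probe_to_gene = {"ps1": "GENE_A", "ps2": "GENE_A", "ps3": "GENE_B", "ps4": "GENE_C"}
# Hypothetical log2 intensities for one sample (ps4 was not measured).
probe_values = {"ps1": 7.2, "ps2": 9.1, "ps3": 6.4}

gene_values = collapse_probe_sets(probe_to_gene, probe_values)
print(gene_values)
```

Whatever rule is chosen, it should be applied identically to all samples so that per-gene values stay comparable across time points.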


Before the comparison, the RPKM values and the array values are first brought onto a comparable scale:


In summary, RNA-Seq based transcriptome expression was measured as RPKM for 36,004 transcripts, representing 22,300 unique genes. The median RPKM in all 12 samples was 0.49, and 28.6% to 32.5% (average = 30.3%) of genes had RPKM value of 0 in each sample. In order to make the transcriptome profiling comparable between both platforms (RNA-Seq vs. Microarray), the RPKM values were floored at 0.047, followed by log2 transformation. After the transformation, the difference between the median expression and the floored (minimal) expression by RNA-Seq is equal to the difference between the median expression and the minimal expression by microarray.
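The floor-then-log2 step described in the quote can be sketched in a few lines. The floor of 0.047 and the median of 0.49 are taken from the passage; the sample RPKM values and the function name are invented for illustration, and this is not the authors' actual code.

```python
import math

RPKM_FLOOR = 0.047   # floor value quoted in the paper
MEDIAN_RPKM = 0.49   # median RPKM across the 12 samples, quoted in the paper


def floor_log2(rpkm_values, floor=RPKM_FLOOR):
    """Raise every RPKM below `floor` up to `floor`, then take log2.

    Flooring first avoids log2(0), which matters here because roughly
    30% of genes have an RPKM of exactly 0 in each sample.
    """
    return [math.log2(max(value, floor)) for value in rpkm_values]


# Made-up RPKM values: a zero, a value below the floor, the median, a high value.
sample = [0.0, 0.01, 0.49, 12.5]
transformed = floor_log2(sample)

# Distance between the median and the floored minimum on the log2 scale.
median_to_floor_gap = math.log2(MEDIAN_RPKM) - math.log2(RPKM_FLOOR)

print(transformed)
print(median_to_floor_gap)
```

With these numbers the gap between the median and the floor comes out at roughly 3.4 log2 units, which, according to the quote, is the value the floor was chosen to match on the microarray side.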


The paper is interesting and worth a close read.


RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays 
http://genome.cshlp.org/content/18/9/1509.full 

Another paper with a variety of comparisons between Affymetrix Exon arrays, custom NimbleGen arrays, and RNA-seq: Griffith, et al. Alternative expression analysis by RNA sequencing. Nature Methods. 2010 Oct;7(10):843-847.
http://www.nature.com/nmeth/journal/v7/n10/full/nmeth.1503.html 
The correlation figure in particular is very important:
https://www.researchgate.net/fig ... or-RNA-seq-the-LOG2
This is the first time I have seen a figure caption longer than the article itself!
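For readers who want to reproduce this kind of cross-platform correlation on their own data, here is a minimal sketch. It assumes both platforms have already been reduced to one log2 value per gene and uses Pearson correlation; which statistic the figure itself reports is not stated in these notes. Gene names and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene log2 values from each platform for one sample.
rnaseq_log2 = {"G1": -4.41, "G2": 0.5, "G3": 3.2, "G4": 6.0}
array_log2 = {"G1": 3.1, "G2": 5.0, "G3": 7.4, "G5": 9.0}

# Only genes measured on both platforms can be compared.
shared = sorted(set(rnaseq_log2) & set(array_log2))
r = pearson([rnaseq_log2[g] for g in shared], [array_log2[g] for g in shared])

print(shared, r)
```

Genes present on only one platform (G4 and G5 here) are dropped before the correlation is computed, so the size of the overlap is worth reporting next to the coefficient.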


Paper: https://genomebiology.biomedcent ... 6/s13059-015-0694-1
This one uses clinical samples: 498 primary neuroblastomas
Microarray: customized 4x44k oligonucleotide microarrays (Agilent Technologies)
Sequencing: Illumina HiSeq 2000 platform, TruSeq PE Cluster Kit v3
All of the data are available from NCBI:
Microarray and RNA-seq data can be accessed from the GEO database (www.ncbi.nlm.nih.gov/geo/) with accession numbers GSE49710 and GSE49711, respectively, which are included in SEQC Project SuperSeries GSE47792.




  • Published 2017-08-15 07:43
  • Category: Sequencing technology

Author: 不写代码的码农 (生信菜鸟团)
