ggalluvial:冲击图展示组间变化、时间序列和复杂多属性alluvial diagram

感谢“宏基因组0”群友李海敏、沈伟推荐此包绘制堆叠柱状图各成分连线:突出展示组间物种丰度变化。 冲击图(alluvial diagram)是流程图(flow diagram)的一种,最初开发用于代表网络结构的时间变...

感谢“宏基因组0”群友李海敏、沈伟推荐此包绘制堆叠柱状图各成分连线:突出展示组间物种丰度变化

冲击图(alluvial diagram)是流程图(flow diagram)的一种,最初开发用于代表网络结构的时间变化。

image

实例1. neuroscience coalesced from other related disciplines to form its own field. From PLoS ONE 5(1): e8694 (2010)

image

实例2. Sciences封面哈扎人肠道菌群 图1中的C/D就使用了3个冲击图。详见3分和30分文章差距在哪里?

ggalluvial是一个基于ggplot2的扩展包,专门用于快速绘制冲击图(alluvial diagram),有些人也叫它桑基图(Sankey diagram),但两者略有区别,将来我们会介绍riverplot包绘制桑基图。

软件源代码位于Github: https://github.com/corybrunson/ggalluvial

CRNA官方演示教程: https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html

安装

以下三种方装方式,三选1:

# 国内用户推荐清华镜像站
site="https://mirrors.tuna.tsinghua.edu.cn/CRAN"
# 安装稳定版(推荐)
install.packages("ggalluvial", repo=site)

# 安装开发版(github不稳定有时间下载失败,多试几次可以成功)
devtools::install_github("corybrunson/ggalluvial", build_vignettes = TRUE)

# 安装新功能最优版
devtools::install_github("corybrunson/ggalluvial", ref = "optimization")

显示帮助文档

使用vignette查看演示教程

# 查看教程
vignette(topic = "ggalluvial", package = "ggalluvial")

接下来我们的演示均基于此官方演示教程,我的主要贡献是翻译与代码注释。

基于ggplot2的冲击图

原作者:Jason Cory Brunson, 更新日期:2018-02-11

1. 最简单的示例

基于泰坦尼克事件人员统计绘制性别与舱位和年龄的关系。

# 加载包
library(ggalluvial)

# 转换内部数据为数据框,宽表格模式
titanic_wide <- data.frame(Titanic)

# 显示数据格式
head(titanic_wide)
#>   Class    Sex   Age Survived Freq
#> 1   1st   Male Child       No    0
#> 2   2nd   Male Child       No    0
#> 3   3rd   Male Child       No   35
#> 4  Crew   Male Child       No    0
#> 5   1st Female Child       No    0
#> 6   2nd Female Child       No    0

# 绘制性别与舱位和年龄的关系
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           weight = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")

具体参考说明:data设置数据源,axis设置显示的柱,weight为数值,geom_alluvium为冲击图组间面积连接并按生存率比填充分组,geom_stratum()每种有柱状图,geom_text()显示柱状图中标签,theme_minimal()主题样式的一种,ggtitle()设置图标题

image

图1. 展示性别与舱位和年龄的关系及存活率比例

我们发现上图居然画的是宽表格模式下的数据,而通常ggplot2处理都是长表格模式,如何转换呢?

to_loades转换为长表格

# 长表格模式,to_loades多组组合,会生成alluviumstratum列。主分组位于命名的key列中
titanic_long <- to_lodes(data.frame(Titanic),
                         key = "Demographic",
                         axes = 1:3)
head(titanic_long)
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = stratum, alluvium = alluvium,
           weight = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum") +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")

产生和上图一样的图,只是数据源格式不同。

2. 输入数据格式

定义一种Alluvial宽表格

# 显示数据格式
head(as.data.frame(UCBAdmissions), n = 12)
##       Admit Gender Dept Freq
## 1  Admitted   Male    A  512
## 2  Rejected   Male    A  313
## 3  Admitted Female    A   89
## 4  Rejected Female    A   19
## 5  Admitted   Male    B  353
## 6  Rejected   Male    B  207
## 7  Admitted Female    B   17
## 8  Rejected Female    B    8
## 9  Admitted   Male    C  120
## 10 Rejected   Male    C  205
## 11 Admitted Female    C  202
## 12 Rejected Female    C  391

# 判断数据格式
is_alluvial(as.data.frame(UCBAdmissions), logical = FALSE, silent = TRUE)
## [1] "alluvia"

查看性别与专业间关系,并按录取情况分组

ggplot(as.data.frame(UCBAdmissions),
       aes(weight = Freq, axis1 = Gender, axis2 = Dept)) +
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  geom_label(stat = "stratum", label.strata = TRUE) +
  scale_x_continuous(breaks = 1:2, labels = c("Gender", "Dept")) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("UC Berkeley admissions and rejections, by sex and department")

image

3. 三类型间关系,按重点着色

Titanic按生存,性别,舱位分类查看关系,并按舱位填充色

ggplot(as.data.frame(Titanic),
       aes(weight = Freq,
           axis1 = Survived, axis2 = Sex, axis3 = Class)) +
  geom_alluvium(aes(fill = Class),
                width = 0, knot.pos = 0, reverse = FALSE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  scale_x_continuous(breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
  coord_flip() +
  ggtitle("Titanic survival by class and sex")

image

4. 长表格数据

# to_lodes转换为长表格
UCB_lodes <- to_lodes(as.data.frame(UCBAdmissions), axes = 1:3)
head(UCB_lodes, n = 12)
##    Freq alluvium     x  stratum
## 1   512        1 Admit Admitted
## 2   313        2 Admit Rejected
## 3    89        3 Admit Admitted
## 4    19        4 Admit Rejected
## 5   353        5 Admit Admitted
## 6   207        6 Admit Rejected
## 7    17        7 Admit Admitted
## 8     8        8 Admit Rejected
## 9   120        9 Admit Admitted
## 10  205       10 Admit Rejected
## 11  202       11 Admit Admitted
## 12  391       12 Admit Rejected

# 判断是否符合格式要求
is_alluvial(UCB_lodes, logical = FALSE, silent = TRUE)
## [1] "alluvia"

主要列说明:

  • x, 主要的分类,即X轴上每个柱
  • stratum, 主要分类中的分组
  • alluvium, 连接图的索引

5. 绘制非等高冲击图

以各国难民数据为例,观察多国难民数量随时间变化

data(Refugees, package = "alluvial")
country_regions <- c(
  Afghanistan = "Middle East",
  Burundi = "Central Africa",
  `Congo DRC` = "Central Africa",
  Iraq = "Middle East",
  Myanmar = "Southeast Asia",
  Palestine = "Middle East",
  Somalia = "Horn of Africa",
  Sudan = "Central Africa",
  Syria = "Middle East",
  Vietnam = "Southeast Asia"
)
Refugees$region <- country_regions[Refugees$country]
ggplot(data = Refugees,
       aes(x = year, weight = refugees, alluvium = country)) +
  geom_alluvium(aes(fill = country, colour = country),
                alpha = .75, decreasing = FALSE) +
  scale_x_continuous(breaks = seq(2003, 2013, 2)) +
  theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  scale_color_brewer(type = "qual", palette = "Set3") +
  facet_wrap(~ region, scales = "fixed") +
  ggtitle("refugee volume by country and region of origin")

image

6. 等高非等量关系

不同学期学生学习科目的变化

data(majors)
majors$curriculum <- as.factor(majors$curriculum)
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft",
            color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("student curricula across several semesters")

image

7. 工作状态时间变化图

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           weight = freq,
           fill = response, label = response)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

image

8. 分类学门水平相对丰度实战

# 实战1. 组间丰度变化 

# 编写测试数据
df=data.frame(
  Phylum=c("Ruminococcaceae","Bacteroidaceae","Eubacteriaceae","Lachnospiraceae","Porphyromonadaceae"),
  GroupA=c(37.7397,31.34317,222.08827,5.08956,3.7393),
  GroupB=c(113.2191,94.02951,66.26481,15.26868,11.2179),
  GroupC=c(123.2191,94.02951,46.26481,35.26868,1.2179),
  GroupD=c(37.7397,31.34317,222.08827,5.08956,3.7393)
)

# 数据转换长表格

library(reshape2)

melt_df = melt(df)

# 绘制分组对应的分类学,有点像circos
ggplot(data = melt_df,
       aes(axis1 = Phylum, axis2 = variable,
           weight = value)) +
  scale_x_discrete(limits = c("Phylum", "variable"), expand = c(.1, .05)) +
  geom_alluvium(aes(fill = Phylum)) +
  geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("Phlyum abundance in each group")

image

绘制分组对应的分类学,有点像circos

# 组间各丰度变化 
ggplot(data = melt_df,
       aes(x = variable, weight = value, alluvium = Phylum)) +
  geom_alluvium(aes(fill = Phylum, colour = Phylum, colour = Phylum),
                alpha = .75, decreasing = FALSE) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
  ggtitle("Phylum change among groups")

image

组间各丰度变化,如果组为时间效果更好

Reference

# 如何引用
citation("ggalluvial")

Jason Cory Brunson (2017). ggalluvial: Alluvial Diagrams in 'ggplot2'. R package version 0.5.0.https://CRAN.R-project.org/package=ggalluvial

https://en.wikipedia.org/wiki/Alluvial_diagram

ggalluvial包源码:http://corybrunson.github.io/ggalluvial/index.html

官方示例 Alluvial Diagrams in ggplot2 https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html

猜你喜欢

写在后面

为鼓励读者交流、快速解决科研困难,我们建立了“宏基因组”专业讨论群,目前己有国内外1200+ 一线科研人员加入。参与讨论,获得专业解答,欢迎分享此文至朋友圈,并扫码加主编好友带你入群,务必备注“姓名-单位-研究方向-职称/年级”。技术问题寻求帮助,首先阅读《如何优雅的提问》学习解决问题思路,仍末解决群内讨论,问题不私聊,帮助同行。 image

学习扩增子、宏基因组科研思路和分析实战,关注“宏基因组” image

点击阅读原文,跳转最新文章目录阅读https://mp.weixin.qq.com/s/5jQspEvH5_4Xmart22gjMA

  • 发表于 2018-02-17 14:49
  • 阅读 ( 891 )
  • 分类:其他组学

2 条评论

请先 登录 后评论
不写代码的码农
刘永鑫

工程师

64 篇文章

作家榜 »

  1. 祝让飞 83 文章
  2. 刘永鑫 64 文章
  3. SXR 44 文章
  4. 张海伦 31 文章
  5. 爽儿 25 文章
  6. 调研图 24 文章
  7. 生信分析流 20 文章
  8. 生信菜鸟团 13 文章