Transcriptome Analysis of Sarcopyramis nepalensis via RNA-seq Technology

· Techniques and Methods ·
Biotechnology Bulletin, 2015, 31(5): 84-92
Jin Hong1 Jiao Genlin1 Chen Gang2
(1. Fairy Lake Botanical Garden, Shenzhen and Chinese Academy of Sciences, Shenzhen 518004; 2. College of Life Science, Zhaoqing University, Zhaoqing 526061)
Received: 2014-09-11
Funding: Shenzhen Urban Management Bureau project (201318)
Author: Jin Hong, PhD, senior engineer; research interests: conservation and utilization of plant diversity; E-mail: jinhong@szum.gov.cn

Abstract: The leaf transcriptome of Sarcopyramis nepalensis was sequenced with Illumina RNA-seq. After filtering and de novo assembly of the raw reads, 51 305 high-quality unigenes were obtained, with a mean length of 921 nt and an N50 of 1 490 nt. The unigenes were annotated with BLAST and Blast2GO against the NCBI non-redundant protein (Nr) and non-redundant nucleotide (Nt) databases, Swiss-Prot, Gene Ontology (GO), Clusters of Orthologous Groups (COG) and the Kyoto Encyclopedia of Genes and Genomes (KEGG); 40 532 unigenes were annotated in total. The proportions of unigenes annotated in Nr, Nt, Swiss-Prot, KEGG, COG and GO were 77.53%, 56.18%, 53.14%, 46.58%, 29.69% and 60.72%, respectively. BLAST searches against the protein databases identified 39 302 coding sequences (CDSs), and ESTScan predicted a further 2 065. KEGG pathway analysis showed that 2 323 unigenes (9.72% of the KEGG-annotated unigenes) were involved in the biosynthesis of secondary metabolites (ko01110), and 78 unigenes encoding cytochrome P450 family proteins were identified. These annotations provide a reference for mining key genes in the biosynthesis of secondary metabolites in medicinal plants.
Key words: Sarcopyramis nepalensis transcriptome; RNA-seq
DOI: 10.13560/j.cnki.biotech.bull.1985.2015.05.014

Sarcopyramis nepalensis Wall., known in folk usage as fengguidoucao, is a plant of the genus Sarcopyramis in the family Melastomataceae, with no subspecies. It is distributed mainly in Taiwan, Fujian, Jiangxi, Yunnan and Sichuan in China. The whole plant is used medicinally, and the main constituents of its extracts are flavonoids, polyphenols, fatty acids and phenolic acids [1,2]. It is commonly used to treat acute and chronic hepatitis, cough due to lung heat, rheumatic bone pain and unidentified swellings [3], and is a valuable and rare Chinese medicinal herb widely used in Taiwan and Fujian. However, research on S. nepalensis has so far focused mainly on pharmacological activity and chemical composition [4-6]. Molecular biology studies are lacking and very little genetic information is available, which greatly hinders research on its secondary metabolic pathways.
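The transcriptome is the complete set of RNAs transcribed at a particular developmental stage and under particular physiological conditions. Transcriptomics is essential for interpreting the functional elements of the genome and for revealing the molecular constituents of cells and tissues [7]. Because of current limitations in proteomic technology, transcriptomics has become one of the main approaches for studying the regulation of gene expression, and it bridges genomics and proteomics. High-throughput transcriptome analysis can reveal the gene expression patterns of an organism during its life processes. RNA-seq, established in 2008, is a deep-sequencing-based technology for transcriptome analysis that can measure overall transcriptional activity in any species at single-nucleotide resolution [8]. Compared with microarrays, which depend on prior genomic information, it is better suited to species whose genome has not been completed or whose genetic background is poorly characterized [9,10], and it is currently the leading technology for studying gene expression patterns at the whole-genome level. In this study, RNA-seq was used to sequence the leaf transcriptome of S. nepalensis and to analyze it globally. The aim was to lay a foundation for cloning and functional analysis of genes related to important traits and for identifying genes involved in the biosynthesis of secondary metabolites, which is important for future genetic engineering of the content of useful constituents and for the production of related drugs.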
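1 Materials and Methods
1.1 Materials
The material consisted of fresh, fully expanded leaves of S. nepalensis, about 1.5 cm wide, collected in November 2013 from Xinhua Village, Maoba, Lichuan City, Hubei Province. The leaves were flash-frozen in liquid nitrogen after collection and stored at -80℃ until use.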
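1.2 Methods
1.2.1 Extraction and quality check of total leaf RNA  Six fresh leaves were pooled for RNA extraction. Total RNA was extracted with Trizol (Invitrogen, USA) according to the manufacturer's instructions, and all instruments and consumables were treated to be RNase-free. RNA integrity was checked by agarose gel electrophoresis, and RNA purity and concentration were measured with a micro-volume UV spectrophotometer (NanoDrop 1000).
1.2.2 Transcriptome sequencing of S. nepalensis leaves  Transcriptome sequencing was carried out as a technical service by BGI-Shenzhen on the Illumina HiSeq™ 2000 platform.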
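1.2.3 Assembly and annotation of sequencing data  Raw reads containing adapters, duplicated reads and reads of very low sequencing quality were removed to obtain clean reads. The clean reads were assembled de novo with the short-read assembler Trinity [11]. The high-quality unigenes obtained after filtering and assembly were then annotated against the NCBI non-redundant protein database (Nr), the non-redundant nucleotide database (Nt), Gene Ontology (GO), Clusters of Orthologous Groups (COG) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) to obtain functional annotations. Based on the Nr annotations, Blast2GO [12] and WEGO [13] were used for GO annotation and functional classification of the unigenes.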
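Assembly contiguity in this paper is summarised by N50. As a minimal sketch of how that statistic is commonly defined (the function name `n50` and the toy lengths are illustrative assumptions, not output of the Trinity pipeline used here), it can be computed from a list of sequence lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is taken here as the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical unigene lengths (nt).
toy_lengths = [200, 300, 500, 800, 1200, 2000]
print(n50(toy_lengths))
```

Because N50 weights long sequences more heavily than the mean does, an assembly can have an N50 well above its mean length, as is the case for the unigenes reported here.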
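2 Results
2.1 Transcriptome sequencing and assembly
Sequencing on the Illumina HiSeq 2000 platform produced 58 187 602 raw reads. After removal of low-quality and adapter-containing reads, 54 795 874 clean reads remained, totalling 4 931 628 660 nucleotides (nt). The G+C content, Q20 and proportion of unknown bases (N) were 52.59%, 98.33% and 0.00%, respectively (Table 1).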
Table 1 Output statistics of Illumina sequencing
Total raw reads    Total clean reads    Clean-read bases/nt    Q20/%    N/%     G+C content/%
58 187 602         54 795 874           4 931 628 660          98.33    0.00    52.59
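The figures in Table 1 can be cross-checked with simple arithmetic. The sketch below (the function name `summarize_reads` is a hypothetical helper, not part of the sequencing pipeline) derives the fraction of reads retained after filtering and the implied mean clean-read length from the three reported counts:

```python
def summarize_reads(raw_reads, clean_reads, clean_nt):
    """Return (fraction of reads kept, mean clean-read length in nt)."""
    if raw_reads <= 0 or clean_reads <= 0:
        raise ValueError("read counts must be positive")
    kept_fraction = clean_reads / raw_reads
    mean_length = clean_nt / clean_reads
    return kept_fraction, mean_length


# Values as reported in Table 1.
kept, mean_len = summarize_reads(58_187_602, 54_795_874, 4_931_628_660)
print(f"clean reads retained: {kept:.2%}")
print(f"mean clean-read length: {mean_len:.1f} nt")
```

With the reported counts, about 94.17% of raw reads are retained and the mean clean-read length works out to 90.0 nt, which is consistent with fixed-length 90-nt reads.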
Assembly of these reads with Trinity yielded 120 934 contigs totalling 41 909 136 nt, with a mean length of 347 nt (N50 = 648 nt). After redundancy removal, 51 305 unigenes were obtained, with a mean length of 921 nt and an N50 of 1 490 nt. The length distribution of the unigenes (Fig. 1) shows that 16 999 unigenes, 33.13% of the total, were longer than 1 000 nt. This indicates that sequencing and assembly of the transcriptome library were both of good quality and suitable for subsequent bioinformatic analysis.

Fig. 1 Length distribution of unigenes
[Histogram. x-axis: sequence size/nt, in 100-nt bins from <200 to >=3000; y-axis: number of unigenes, log scale.]

2.2 Functional annotation of unigenes
Unigene sequences were aligned by blastx against the NCBI protein database Nr, Swiss-Prot, KEGG (http://www.genome.jp/kegg/pathway.html) and COG (http://www.ncbi.nlm.nih.gov/COG) (E-value < 1e-5), and by blastn against the nucleotide database Nt (E-value < 1e-5). The protein with the highest sequence similarity to each unigene was taken to provide its functional annotation.

In total, 40 532 unigenes (79.0%) were annotated (Table 2), of which 39 781 (77.53% of all unigenes) matched entries in Nr. The remaining 21% could not be annotated, probably because little genetic background information is available for S. nepalensis. The E-value distribution of the Nr hits (Fig. 2-A) shows that all aligned sequences had E-values below 1e-5 and that 32.8% had E-values below 1e-100, indicating that the alignments are highly reliable. The similarity distribution of the annotated unigenes (Fig. 2-B) shows sequence similarities between 15% and 100%, mostly between 40% and 80%, with 21.4% of sequences above 80% similarity, indicating that the functional annotation of the S. nepalensis transcriptome is of good quality.

The species distribution of the homologous sequences is shown in Fig. 2-C. Of the annotated sequences, 21.6% matched grape (Vitis vinifera), followed by castor bean (Ricinus communis) at 15.0%. This is because grape and castor bean have abundant genomic information, which provided reference sequences for annotating the transcriptome in this study [14,15].

Table 2 Annotation statistics of unigenes in the S. nepalensis leaf transcriptome
Annotation category          Number of unigenes    Percentage of all unigenes/%
All unigenes                 51 305                100
All annotated unigenes       40 532                79.0
Annotated in Nr              39 781                77.53
Annotated in Nt              28 825                56.18
Annotated in Swiss-Prot      27 263                53.14
Annotated in KEGG            23 898                46.58
Annotated in COG             15 234                29.69
Annotated in GO              31 153                60.72

Fig. 2 E-value distribution (A), similarity distribution (B) and species distribution (C) of unigenes in the Nr database
[Three pie charts; colour key available in the electronic version. Species shown in panel C: Vitis vinifera, Ricinus communis, Amygdalus persica, Populus balsamifera subsp. trichocarpa, Glycine max, Fragaria vesca subsp. vesca, Cucumis sativus, other.]
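The annotation rule described in Section 2.2 (keep hits with E-value < 1e-5, then take the most similar protein per unigene) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the function names and the use of percent identity as the similarity measure are all hypothetical choices.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment hit (hypothetical record layout)."""
    query: str       # unigene identifier
    subject: str     # database sequence identifier
    identity: float  # percent identity, used here as the similarity measure
    evalue: float


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep, per unigene, the most similar hit whose E-value is below the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # fails the E-value < 1e-5 filter
        current = best.get(hit.query)
        if current is None or hit.identity > current.identity:
            best[hit.query] = hit
    return best


def annotated_percent(n_annotated: int, n_total: int) -> float:
    """Percentage of unigenes with at least one accepted hit."""
    return 100.0 * n_annotated / n_total


# Toy hits with made-up identifiers.
toy = [
    Hit("unigene1", "protA", 55.0, 1e-20),
    Hit("unigene1", "protB", 80.0, 1e-8),
    Hit("unigene1", "protC", 99.0, 1e-3),  # rejected: E-value too high
    Hit("unigene2", "protD", 90.0, 0.5),   # rejected: E-value too high
]
print(best_hits(toy))
print(round(annotated_percent(40_532, 51_305), 1))
```

Applying `annotated_percent` to the counts in Table 2 reproduces the reported proportions to within rounding.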
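Gene Ontology (GO) is an internationally standardized gene function classification system that provides a dynamically updated controlled vocabulary for comprehensively describing the properties of genes and gene products in an organism. GO annotations of the unigenes were obtained with Blast2GO from the NCBI database annotations, and WEGO [13] was then used to produce GO functional classification statistics for all unigenes, giving an overview of the distribution of gene functions in S. nepalensis. GO analysis showed that 31 153 unigenes (60.72%) were annotated in the GO database. Biological process received the most annotations (129 573), followed by cellular component (96 967), while molecular function received the fewest (34 741). These three ontologies were further divided into 56 subcategories. In biological process, cellular process and metabolic process accounted for the largest proportions; in cellular component, cell and organelle part; and in molecular function, binding and catalytic activity (Fig. 3).

COG annotation showed that the 15 234 unigenes annotated in COG were distributed among 25 functional categories (Fig. 4), such as RNA processing and modification, chromatin structure and dynamics, energy production and conversion, and cell cycle control, cell division and chromosome partitioning. Among the 25 categories, the most frequently annotated was general function prediction only (R), followed by transcription (K). Notably, 4 564 unigenes were annotated to secondary metabolites biosynthesis, transport and catabolism, laying a good foundation for future studies of genes related to the useful chemical constituents of S. nepalensis.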
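2.3 Metabolic pathway analysis of unigenes
KEGG analysis of the S. nepalensis leaf transcriptome showed that 23 898 unigenes were annotated in the KEGG database and distributed among 128 known pathways, including flavonoid biosynthesis (169 unigenes, ko00941), carotenoid biosynthesis (160, ko00906) and iridoid biosynthesis (144, ko00900). The three pathways with the most annotated sequences were metabolic pathways, biosynthesis of secondary metabolites, and plant hormone signal transduction (Table 3). In addition, 78 unigenes encoding cytochrome P450 were identified from the KEGG analysis. These annotations provide valuable information for subsequent studies of the synthesis and metabolism of secondary metabolites.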
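The pathway percentages in Table 3 appear to be computed relative to the 23 898 KEGG-annotated unigenes (the denominator given in the table header) rather than to all 51 305 unigenes. The short sketch below, with hypothetical variable and function names, recomputes the share of the three largest pathways against both denominators so the difference is explicit:

```python
KEGG_ANNOTATED = 23_898  # unigenes with a KEGG annotation (Table 2)
ALL_UNIGENES = 51_305    # all assembled unigenes

# Unigene counts for the three largest pathways, as listed in Table 3.
TOP_PATHWAYS = {
    "ko01100": 5810,  # metabolic pathways
    "ko01110": 2323,  # biosynthesis of secondary metabolites
    "ko04075": 1532,  # plant hormone signal transduction
}


def percent(count, total):
    """Percentage of `total` represented by `count`, rounded to 2 decimals."""
    return round(100.0 * count / total, 2)


for pathway_id, count in TOP_PATHWAYS.items():
    of_kegg = percent(count, KEGG_ANNOTATED)
    of_all = percent(count, ALL_UNIGENES)
    print(f"{pathway_id}: {of_kegg}% of KEGG-annotated, {of_all}% of all unigenes")
```

Only the KEGG-annotated denominator reproduces the 24.31%, 9.72% and 6.41% listed in Table 3.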
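2.4 Prediction of protein-coding regions
The 51 305 unigenes of the S. nepalensis leaf transcriptome were aligned by blastx (E-value < 1e-5) against Nr, Swiss-Prot, KEGG and COG, in that order of priority, to determine the coding region of each unigene. The coding region was then translated into an amino acid sequence, giving the amino acid and nucleotide sequences of the coding region (in the 5'→3' direction). Finally, ESTScan was used to predict the coding regions of unigenes that could not be aligned to any of these protein databases, again yielding amino acid and nucleotide sequences (5'→3'). BLAST alignment of all unigenes against the protein databases identified 39 302 coding sequences (CDSs), of which 74.32% (29 211) were longer than 1 000 nt (Fig. 5). ESTScan predicted a further 2 065 CDSs (Fig. 6), with lengths ranging from 200 to 2 000 nt; 233 of these (about 11.28%) were 500-1 000 nt long. In total, 41 367 CDSs were predicted, indicating that the unigene sequences are of good quality.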
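The priority rule used for CDS prediction can be illustrated with a small sketch. It assumes each unigene is represented by the set of protein databases in which it has an accepted blastx hit; the function name `choose_cds_source` and this data layout are hypothetical, not taken from the original pipeline.

```python
# Databases in the priority order stated in Section 2.4.
PRIORITY = ("Nr", "Swiss-Prot", "KEGG", "COG")


def choose_cds_source(hit_databases):
    """Return the database that defines a unigene's coding region.

    `hit_databases` is the set of databases with a blastx hit at
    E-value < 1e-5. Unigenes without any hit fall back to ESTScan.
    """
    for database in PRIORITY:
        if database in hit_databases:
            return database
    return "ESTScan"


print(choose_cds_source({"KEGG", "Swiss-Prot"}))  # Swiss-Prot outranks KEGG
print(choose_cds_source(set()))                   # no hit: ESTScan prediction

# Totals as reported in the text.
blast_cds, estscan_cds = 39_302, 2_065
print(blast_cds + estscan_cds)
```

The two CDS counts reported in the text sum to the stated total of 41 367.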
Fig. 3 Distribution of unigenes in the GO database
[Bar chart of 56 GO terms grouped under biological_process, cellular_component and molecular_function; axes: percent of unigenes (log scale, 0.1 to 100) and number of unigenes (out of 31 153).]

Fig. 4 Distribution of unigenes in the COG database
[Bar chart. x-axis: function class (A to Z); y-axis: number of unigenes (0 to 5 000).]
A: RNA processing and modification
B: Chromatin structure and dynamics
C: Energy production and conversion
D: Cell cycle control, cell division, chromosome partitioning
E: Amino acid transport and metabolism
F: Nucleotide transport and metabolism
G: Carbohydrate transport and metabolism
H: Coenzyme transport and metabolism
I: Lipid transport and metabolism
J: Translation, ribosomal structure and biogenesis
K: Transcription
L: Replication, recombination and repair
M: Cell wall/membrane/envelope biogenesis
N: Cell motility
O: Posttranslational modification, protein turnover, chaperones
P: Inorganic ion transport and metabolism
Q: Secondary metabolites biosynthesis, transport and catabolism
R: General function prediction only
S: Function unknown
T: Signal transduction mechanisms
U: Intracellular trafficking, secretion, and vesicular transport
V: Defense mechanisms
W: Extracellular structures
Y: Nuclear structure
Z: Cytoskeleton

Table 3 The 20 pathways with the most annotated sequences
No.    Pathway                                        Unigenes (of 23 898)    Percentage/%    Pathway ID
1      Metabolic pathways                             5810                    24.31           ko01100
2      Biosynthesis of secondary metabolites          2323                    9.72            ko01110
3      Plant hormone signal transduction              1532                    6.41            ko04075
4      Endocytosis                                    1461                    6.11            ko04144
5      Plant-pathogen interaction                     1453                    6.08            ko04626
6      Glycerophospholipid metabolism                 1350                    5.65            ko00564
7      Ether lipid metabolism                         1193                    4.99            ko00565
8      RNA transport                                  1084                    4.54            ko03013
9      Spliceosome                                    844                     3.53            ko03040
10     Starch and sucrose metabolism                  800                     3.35            ko00500
11     Protein processing in endoplasmic reticulum    740                     3.1             ko04141
12     mRNA surveillance pathway                      681                     2.85            ko03015
13     Ribosome                                       650                     2.72            ko03010
14     Ubiquitin mediated proteolysis                 603                     2.52            ko04120
15     Pentose and glucuronate interconversions       511                     2.14            ko00040
16     Purine metabolism                              501                     2.1             ko00230
17     Pyrimidine metabolism                          434                     1.82            ko00240
18     Oxidative phosphorylation                      383                     1.6             ko00190
19     RNA degradation                                354                     1.48            ko03018
20     Ribosome biogenesis in eukaryotes              334                     1.4             ko03008

3 Discussion
In recent years, the establishment and development of next-generation high-throughput sequencing has provided an effective means of studying functional genes in non-model plants [16] and has facilitated plant genome research. However, compared with model plants and major crops, the genomes of medicinal plants have been studied relatively little [17], and RNA-seq offers a new opportunity for plant transcriptome research [18]. Research on medicinal plants can clarify basic information about their biological and biomedical functions [19,20]. This study is the first to sequence and analyze the leaf transcriptome of S. nepalensis using RNA-seq high-throughput sequencing. It provides basic information for mining genes related to secondary metabolite synthesis and for studying gene function, and lays a foundation for research on the biosynthetic pathways and regulatory mechanisms of its active constituents.

In this study, 51 305 high-quality unigenes were obtained, with a mean length of 921 nt and an annotation rate of 79.0%. The number of unigenes, mean length and annotation rate were all higher than those of the Roche 454-based transcriptomes of Salvia miltiorrhiza (46 722 unigenes, mean length 414 nt, annotation rate about 28.48%) [21] and American ginseng (31 088 sequences, mean length 427 nt, annotation rate 69.8%) [22]. The S. miltiorrhiza and American ginseng studies used the Roche 454 platform, whereas this study used Illumina. Roche 454 has longer average reads but lower accuracy. Illumina is more cost-effective and gives deeper coverage, and after technical upgrades its average read length has also improved, making it one of the most widely used sequencing platforms at present [23]. Unigene length also significantly affects annotation efficiency. In this study, the proportions of unigenes annotated in Nr, Nt, Swiss-Prot, KEGG, COG and GO were relatively high, at 77.53%, 56.18%, 53.14%, 46.58%, 29.69% and 60.72%, respectively. BLAST searches of all unigenes against the protein databases identified 39 302 CDSs, and ESTScan predicted a further 2 065. Comparison with the sequencing results for these two medicinal plants further indicates that the transcriptome sequencing in this study had a higher yield and better quality.
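Cytochrome P450 (CYP450) is a family of metabolic enzymes found in almost all organisms. It participates in the metabolism of endogenous physiological substances such as steroids, cytokines and fatty acids [24], is the largest protein family in plants, and is responsible for the oxidation steps of most secondary metabolites [25,26]. KEGG pathway analysis in this study showed that 2 323 unigenes (9.72% of the KEGG-annotated unigenes) were involved in the biosynthesis of secondary metabolites, of which 78 encoded cytochrome P450 family proteins. Unigenes involved in the synthesis of other secondary metabolites, such as diterpenoids, isoflavonoids and alkaloids, were also found.

Studies have shown that the main active substances in S. nepalensis are flavonoids, important plant secondary metabolites with broad pharmacological activities including antioxidant, free-radical-scavenging and antitumor effects [27]. Chalcone isomerase (CHI) is a key enzyme in the flavonoid metabolic pathway. It catalyzes an intramolecular cyclization that converts bicyclic chalcone into the biologically active tricyclic (2S)-flavanone [28]. CHI genes have been isolated from medicinal plants such as Erigeron breviscapus [29], Saussurea medusa [30] and golden buckwheat [31]. In this study, four unigenes encoded CHI, providing a reliable basis for regulating flavonoid synthesis in S. nepalensis.

Plant alkaloids are nitrogen-containing organic compounds widely present in higher and lower plants and are the active constituents of many Chinese herbal medicines and medicinal plants [32]. About 20% of flowering plants are reported to produce alkaloids [33]. Because the active constituents of S. nepalensis have been little studied, no alkaloids have yet been reported from it. However, using the transcriptome data, we found seven KEGG pathways (ko00901, ko00950, ko00960, ko00900, ko00904, ko00909 and ko00902) related to alkaloid synthesis, including terpenoid, indole and isoquinoline synthesis pathways, comprising 362 unigenes in total. This suggests indirectly that S. nepalensis may contain different types of alkaloids, and it lays a foundation for developing its medicinal constituents and broadening its applications.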
Fig. 5 Length distribution of CDSs of unigenes predicted by BLAST
Fig. 6 Length distribution of CDSs of unigenes predicted by ESTScan
[Histograms. x-axis: sequence size/nt, in 100-nt bins from <200 to >=3000; y-axis: number of unigene CDSs, log scale.]

4 Conclusion
The leaf transcriptome of S. nepalensis was sequenced and analyzed with Illumina technology, revealing its overall expression profile. In total, 51 305 high-quality unigenes were obtained, of which 40 532 (79%) were annotated. Sequence information was obtained for a number of genes involved in secondary metabolite synthesis, providing a rich data resource for in-depth studies on the synthesis and identification of bioactive constituents of medicinal plants and for screening functional genes. These data have important application value for using biotechnology to increase the content of active constituents in medicinal plants and for the development and production of related drugs.
References
[1] Wang XM, Wan CP, Zhou SR, Qiu Y. Two new flavonol glycosides from Sarcopyramis bodinieri var. delicata [J]. Molecules, 2008, 13(6): 1399-1405.
[2] Wan C, Zheng X, Chen H, et al. Flavonoid constituents from herbs of Sarcopyramis bodinieri var. delicata [J]. China Journal of Chinese Materia Medica, 2009, 34(2): 172-174.
[3] Chen Guifei, Ye Shuling. Discussion on the medicinal use of fengguidoucao (Sarcopyramis nepalensis) [J]. Lishizhen Medicine and Materia Medica Research, 2000, 11(2): 129. (in Chinese)
[4] Wei WB, Huang YJ, Cong HJ, et al. Chemical constituents from Sarcopyramis nepalensis Wall [J]. Phytochemistry Letters, 2014, 8: 101-104.
[5] Zhang JW, Liao M, Chen HD, Zhang YH. Structural elucidation of a new flavone from Sarcopyramis nepalensis [J]. Journal of Asian Natural Products Research, 2011, 13(3): 256-259.
[6] Chen Hao, He Lijun, Lin Lifang. Research progress on the folk herb fengguidoucao (Sarcopyramis nepalensis) [J]. Strait Pharmaceutical Journal, 2012, 24(10): 55-56. (in Chinese)
[7] Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics [J]. Nature Reviews Genetics, 2009, 10(1): 57-63.
[8] Costa V, Angelini C, De Feis I, Ciccodicola A. Uncovering the complexity of transcriptomes with RNA-Seq [J]. Journal of Biomedicine and Biotechnology, 2010, 2010, doi: 10.1155/2010/1853916.
[9] Birzele F, Schaub J, Rust W, et al. Into the unknown: expression profiling without genome sequence information in CHO by next generation sequencing [J]. Nucleic Acids Research, 2010, 38(12): 3999-4010.
[10] Wang ET, Sandberg R, Luo S, et al. Alternative isoform regulation in human tissue transcriptomes [J]. Nature, 2008, 456(7221): 470-476.
[11] Grabherr MG, Haas BJ, Yassour M, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome [J]. Nature Biotechnology, 2011, 29(7): 644-652.
[12] Conesa A, Götz S, García-Gómez JM, et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research [J]. Bioinformatics, 2005, 21(18): 3674-3676.
[13] Ye J, Fang L, Zheng H, et al. WEGO: a web tool for plotting GO annotations [J]. Nucleic Acids Research, 2006, 34(suppl. 2): W293-W297.
[14] Chan AP, Crabtree J, Zhao Q, et al. Draft genome sequence of the oilseed species Ricinus communis [J]. Nature Biotechnology, 2010, 28(9): 951-956.
[15] Jaillon O, Aury JM, Noel B, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla [J]. Nature, 2007, 449(7161): 463-467.
[16] Liu Yulin, Li Wei, Zhang Zhixiang. Transcriptomics of Liaodong oak based on high-throughput sequencing [J]. Biotechnology Bulletin, 2014(7): 119-124. (in Chinese)
[17] Chen S, Xiang L, Guo X, Li Q. An introduction to the medicinal plant genome project [J]. Frontiers of Medicine, 2011, 5(2): 178-184.
[18] Fu Chang, Huang Yu. Transcriptomics platform technologies and their application in the molecular biology of plant stress resistance [J]. Biotechnology Bulletin, 2011(6): 40-46. (in Chinese)
[19] Collins FS, Green ED, Guttmacher AE, Guyer MS. A vision for the future of genomics research [J]. Nature, 2003, 422(6934): 835-847.
[20] Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays [J]. Nature, 2000, 405(6788): 827-836.
[21] Li Ying, Sun Chao, Luo Hongmei, et al. Transcriptome study of Salvia miltiorrhiza based on 454 GS FLX high-throughput sequencing [J]. Acta Pharmaceutica Sinica, 2010, 45(4): 524-529. (in Chinese)
[22] Sun C, Li Y, Wu Q, et al. De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis [J]. BMC Genomics, 2010, 11: 262.
[23] Zhang J, Chiodini R, Badr A, Zhang G. The impact of next-generation sequencing on genomics [J]. Journal of Genetics and Genomics, 2011, 38(3): 95-109.
[24] Jørgensen A, Giessing A, Rasmussen LJ, Andersen O. Biotransformation of polycyclic aromatic hydrocarbons in marine polychaetes [J]. Marine Environmental Research, 2008, 65(2): 171-186.
[25] Coon MJ. Cytochrome P450: nature's most versatile biological catalyst [J]. Annual Review of Pharmacology and Toxicology, 2005, 45: 1-25.
[26] Morant M, Bak S, Møller BL, Werck-Reichhart D. Plant cytochromes P450: tools for pharmacology, plant protection and phytoremediation [J]. Current Opinion in Biotechnology, 2003, 14(2): 151-162.
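[27] Cheng Qiuyue, Guo Jing, Zhang Chengyi. Research on the pharmacological effects of flavonoids [J]. Journal of Beihua University (Natural Science), 2012, 12(2): 180-183. (in Chinese)
[28] Li Linling, Cheng Hua, Xu Feng, Cheng Shuiyuan. Research progress on plant chalcone isomerase [J]. Letters in Biotechnology, 2008(6): 935-937. (in Chinese)
[29] Zhang YF, Zhang Y, Cui MK, Yan SQ. The Clone of cDNA encoding chalcone isomerases and its information analyze in E. breviscapus (vant) Hand.-Mazz [J]. Procedia Engineering, 2011, 18: 194-199.
[30] Li FX, Jin ZP, Zhao DX, et al. Overexpression of the Saussurea medusa chalcone isomerase gene in S. involucrata hairy root cultures enhances their biosynthesis of apigenin [J]. Phytochemistry, 2006, 67(6): 553-560.
[31] Luo Xiaopeng, Bai Yuechen, Gao Fei, et al. Cloning of the chalcone isomerase gene from golden buckwheat and analysis of its flowering-stage expression and flavonoid content [J]. Chinese Traditional and Herbal Drugs, 2013, 11: 1481-1485. (in Chinese)
[32] Kuete V. Health effects of alkaloids from African medicinal plants [M] // Toxicological Survey of African Medicinal Plants. 2014: 611-633.
[33] Xing Shihai, Wang Quan, Pan Qifang, et al. Biosynthetic pathway of terpenoid indole alkaloids in Catharanthus roseus [J]. Acta Botanica Boreali-Occidentalia Sinica, 2012, 32(9): 1917-1127. (in Chinese)
(Editor in charge: Li Nan)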
