Examples:
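Starting from the consideration that, in addition to BioB and other known cofactors, the conversion of DTB (dethiobiotin) into biotin may involve an Fe-S cluster-regenerating enzyme, an attempt was made to identify and clone such a hypothetical gene.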
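The NifS gene product is able to regenerate the sulfur atoms in the Fe-S clusters of proteins involved in nitrogen fixation (Zheng et al., Proc. Natl. Acad. Sci. USA 90, 1993: 2754-2758; Biochemistry 33, 1994: 4714-4720). By comparing all known NifS-like proteins in the Swiss-Prot and PIR databases with the Lasergene package (DNA-Star Inc.) in megalign mode, a strictly conserved region with the amino acid sequence HK(I,L)xGPxG (x = an amino acid that is less conserved or not conserved) could be identified in these proteins. The fact that this sequence is completely conserved indicates that these amino acids (= Aa) are important for the function of these proteins. These conserved amino acids are referred to below as motif I.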
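The motif I pattern HK(I,L)xGPxG maps directly onto a regular expression. The Python sketch below is only an illustration (the search described above was done with the DNA-Star programs, not with this code); the helper names are hypothetical, and the example fragment is taken from SEQ ID NO: 2.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Motif I, HK(I,L)xGPxG: "." stands for x (any residue), [IL] for Ile or Leu.
MOTIF_I = re.compile(r"HK[IL].GP.G")


def to_one_letter(three_letter: str) -> str:
    """Convert space-separated three-letter codes (as in the sequence listing)."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in three_letter.split())


def find_motif_i(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every motif I occurrence."""
    return [(m.start(), m.group()) for m in MOTIF_I.finditer(protein)]


if __name__ == "__main__":
    # BioS1 residues 209-234 from SEQ ID NO: 2.
    fragment = to_one_letter(
        "Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His Lys "
        "Leu Tyr Gly Pro Thr Gly Ile Gly Val Leu Tyr Gly"
    )
    for start, match in find_motif_i(fragment):
        print(209 + start, match)  # residue number in SEQ ID NO: 2
```

The same function applied to BioS2 (SEQ ID NO: 4) finds HKIYGPKG, which also fits the pattern.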
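The Aa motif defined in this way was used as the query sequence for a further analysis of the Swiss-Prot/PIR databases (release 93 or 94), in order to find proteins or ORFs (= open reading frames) in which this NifS functional sequence is completely conserved. The sequence analysis was performed with the Geneman program from the DNA-Star package, with the following analysis parameter: the motif was entered in the consensus sequence menu with 80% conservation.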
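Besides proteins already identified as NifS homologs, this search revealed further proteins or ORFs carrying this sequence motif. Among these other database sequences, an open reading frame from E. coli could be identified that shows marked conservation of the consensus sequence and, as our studies show, is involved in biotin synthesis. This open reading frame, designated ECU29581_24 (= SEQ ID No. 1 = ORF401), encodes a hypothetical protein of 401 Aa derived from this sequence. Work by F. Blattner and co-workers (DNA Research, 1996) shows that the sequence was determined as part of the E. coli genome sequencing project, but without its function being identified. This sequence (SEQ ID No. 1) is referred to below as BioS1.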
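A comparison of the BioS1 protein sequence with the NifS protein sequence from Azotobacter vinelandii (program: DNA-Star "megalign" mode, pairwise Lipman-Pearson alignment; analysis parameters: k-tuple 2, gap penalty 4, gap length penalty 12) showed that ECU29581_24 also shares 27.6% homology with NifS from Azotobacter vinelandii in other regions of the sequence, over a stretch of 218 Aa. The homology with the protein identified as NifS from Rhodobacter capsulatus is 25.3% over a stretch of 376 Aa.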
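Further sequences homologous to ECU29581_24 could be identified in the SwissProt/PIR database (Geneman program, sequence similarity mode, default settings). The highest similarity to the ECU29581_24 sequence was found for an ORF (= open reading frame) translated from Haemophilus influenzae (database name HIU00082_62). Over the full length of the two proteins, BioS1 and HIU00082_62 share 45.5% sequence homology. The similarity or homology between these two proteins is thus markedly higher than that between ECU29581_24 (= BioS1) and NifS from Rhodobacter capsulatus or Azotobacter vinelandii. This protein is therefore probably the Haemophilus influenzae homolog of BioS1.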
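Percent homology values such as 27.6% over 218 Aa are ratios of identical aligned positions to the length of the aligned stretch. A minimal sketch, assuming identity is counted over the ungapped columns of an existing alignment; megalign's Lipman-Pearson output may use a different denominator, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over the alignment columns in which neither
    sequence has a gap (one common convention; other programs may differ)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment: 5 of the 6 ungapped columns are identical.
    print(round(percent_identity("MKLP-IY", "MRLPAIY"), 1))
```

Counting gap columns in the denominator instead would give a lower figure for the same alignment, which is one reason values from different programs are not directly comparable.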
On the basis of similarity to NifS sequences, Fleischmann et al. (Science 269, 1995: 495-512) found in Haemophilus influenzae not only ORF HIU00082_62 but also a further ORF (HIU00072_10).
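From this description by Fleischmann et al., it was concluded that, in addition to bioS1, a further NifS-like gene is present in E. coli. This hypothetical gene was named bioS2.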
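1. Construction of vectors pHS1 and pHS2:
Plasmids pHS1 and pHS2 are built from separate cassettes carrying the origin of replication, the resistance marker, the promoter, the cloning site and the terminators. The plasmids were assembled from these DNA fragments, which were prepared by PCR using various plasmids as templates.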
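a.) Preparation of the origin-of-replication cassette:
To provide the origin of replication of a plasmid containing the P15A replicon as a clonable cassette, a 919-base DNA fragment was isolated by PCR from plasmid pRep4 (Qiagen) using oligonucleotides P15A,1 (5'-GGCCCCTAGGGGATATATTCCGCTTCCTCGC-3') and P15A,2 (5'-GGCCACTAGTAACAACTTATATCGTATGGGG-3'). The fragment was cut with the restriction enzymes AvrII and SpeI in a suitable buffer.
PCR conditions:
The replication cassette was isolated from plasmid pRep4 in a volume of 100 μl using 2.5 U of Taq polymerase and 15 pmol of the oligonucleotides. The oligonucleotides were annealed at 50 °C. Chain extension was carried out at 72 °C for 1 minute, for 30 cycles.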
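b.) Preparation of the kanamycin resistance cassette:
To provide a kanamycin resistance cassette as a cloning cassette, a 952-base DNA fragment was isolated by PCR from a plasmid containing a kanamycin resistance cassette (pRep4, Qiagen) using oligonucleotides Kan-R,1 (5'-GGCCGAGCTCTCGAACCCCAGAGTCCCGCT-3') and Kan-R,2 (5'-GGCCGACGTCGGAATTGCCAGCTGGGGCGC-3'). The fragment was cut with AatII and SacI in a suitable buffer.
PCR conditions:
The kanamycin resistance cassette was isolated from plasmid pRep4 in a volume of 100 μl using 2.5 U of Taq polymerase and 15 pmol of the oligonucleotides. The oligonucleotides were annealed at 50 °C. Chain extension was carried out at 72 °C for 1 minute, for 30 cycles.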
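c.) Preparation of the terminator regions:
To provide terminator T0 from phage λ as a cloning cassette, a 120-base DNA fragment was isolated by PCR using plasmid pDS12-luzi (Schroder H. et al., EMBO Journal 12 (11), 1993: 4137-4144) as template together with oligonucleotides T0,1 (5'-GGCCGAGCTCGCTTGGACTCCTGTTGATAG-3') and T0,2 (5'-GGCCACTAGTGCTTGGATTCTCACCAATAAAAAACGCCC-3'). The fragment was cut with the enzymes SpeI and SacI in a suitable buffer.
Template for T0: pDS12-luzi
The terminator region was isolated from plasmid pDS12-luzi in a volume of 100 μl using 2.5 U of Taq polymerase and 15 pmol of the oligonucleotides. The oligonucleotides were annealed at 50 °C. Chain extension was carried out at 72 °C for 0.5 minutes, for 30 cycles. The 120-bp fragment was then isolated and purified and digested with 20 U each of SpeI and SacI.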
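To provide terminator T1 from the rrnB operon as a cloning cassette, a 120-bp DNA fragment was isolated by PCR using plasmid pDS12-luzi (Schroder H. et al., see above) as template and oligonucleotides T1,1 (5'-GGCCCCTAGGTCTAGGGCGGCGGATTTGTCC-3') and T1,2 (5'-GGCCTCTAGAGGCATCAAATAAAACGAAAGGC-3'). The fragment was cut with the enzymes XbaI and AvrII in a suitable buffer.
Template for T1: pDS12-luzi
The terminator region was isolated from plasmid pDS12-luzi in a volume of 100 μl using 2.5 U of Taq polymerase and 15 pmol of the oligonucleotides. The oligonucleotides were annealed at 50 °C. Chain extension was carried out at 72 °C for 0.5 minutes, for 30 cycles. The 120-bp fragment was then isolated and purified and digested with 20 U each of XbaI and AvrII.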
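Each cassette primer starts with GGCC followed by the recognition site later used to cut the PCR product. The sketch below, with hypothetical helper names, scans the eight cassette primers quoted above for the sites of AvrII, SpeI, SacI, AatII and XbaI; that the extra 5' bases are there to let the enzymes cut close to the fragment ends is an assumption, not something stated in the text.

```python
# Recognition sequences of the enzymes used for the cassettes. All five are
# palindromic, so scanning the top strand also covers the bottom strand.
SITES = {
    "AvrII": "CCTAGG",
    "SpeI": "ACTAGT",
    "SacI": "GAGCTC",
    "AatII": "GACGTC",
    "XbaI": "TCTAGA",
}

# Cassette primers as given in sections a.) to c.).
PRIMERS = {
    "P15A,1": "GGCCCCTAGGGGATATATTCCGCTTCCTCGC",
    "P15A,2": "GGCCACTAGTAACAACTTATATCGTATGGGG",
    "Kan-R,1": "GGCCGAGCTCTCGAACCCCAGAGTCCCGCT",
    "Kan-R,2": "GGCCGACGTCGGAATTGCCAGCTGGGGCGC",
    "T0,1": "GGCCGAGCTCGCTTGGACTCCTGTTGATAG",
    "T0,2": "GGCCACTAGTGCTTGGATTCTCACCAATAAAAAACGCCC",
    "T1,1": "GGCCCCTAGGTCTAGGGCGGCGGATTTGTCC",
    "T1,2": "GGCCTCTAGAGGCATCAAATAAAACGAAAGGC",
}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based position) for every recognition site in seq."""
    seq = seq.upper()
    hits = []
    for enzyme, site in sites.items():
        pos = seq.find(site)
        while pos != -1:
            hits.append((enzyme, pos))
            pos = seq.find(site, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

Each primer should report exactly one site, directly after the four-base GGCC extension.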
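d.) Preparation of the promoters for pHS1 and pHS2:
The oligonucleotides PPHS1,1 (5'-TCGAGATAGCATTTTTATCCATAAGATTAGCCGATCCTAAGGTTTACAATTGTGAGCGCTCACAATTATGATAGATTCAATTGTGAGCGGATAACAATTTCACACACGCTAGCGGTAC-3') and PPHS1,2 (5'-CGCTAGCGTGTGTGAAATTGTTATCCGCTCACAATTGAATCTATCATAATTGTGAGCGCTCACAATTGTAAACCTTAGGATCGGCTAATCTTATGGATAAAAATGCTATC-3'), as well as PPHS2,1 (5'-AATTCTCCCTATCAGTGATAGAGATTGACATCCCTATCAGTGATAGAGATACTGAGACATCACCAGGACGCACTGACCG-3') and PPHS2,2 (5'-AATTCGGTCAGTGCGTCCTGGTGATGTCTCAGTATCTCTATCACTGATAGGGATGTCAATCTCTATCACTGATAGGGAGG-3'), were prepared by chemical synthesis. Oligonucleotides PPHS1,1 and PPHS1,2, and oligonucleotides PPHS2,1 and PPHS2,2, were each mixed pairwise at a concentration of 1 μg/μl, incubated at 95 °C for 5 minutes and then cooled slowly. The annealed oligonucleotides were used in the ligation at a concentration of 10 ng/μl. Oligonucleotides PPHS1,1 and PPHS1,2 form the promoter for plasmid pHS1, and oligonucleotides PPHS2,1 and PPHS2,2 form the promoter for plasmid pHS2.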
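e.) Preparation of the cloning site:
To construct the cloning site, two oligonucleotides were synthesized: PMCS1,1 (5'-GTACCGGGCCCCCCCTCGAGGTCGACGGTATCGATAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCCATGGTA-3') and PMCS1,2 (5'-ACGCGTACCATGGGATCCCCCGGGCTGCAGGAATTCGATATCAAGCTTATCGATACCGTCGACCTCGAGGGGGGGCCCGGTACC-3'). The two oligonucleotides were mixed at a concentration of 1 μg/μl, incubated at 95 °C for 5 minutes and then cooled slowly. The annealed oligonucleotides were used in the ligation at a concentration of 10 ng/μl.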
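Annealing two complementary oligonucleotides, as done for the promoter and cloning-site oligonucleotides, gives a duplex whose single-stranded ends must fit the vector ends. A minimal sketch for checking which ends a pair leaves, assuming perfect base pairing over one contiguous core; the function names and the toy oligonucleotides in the example are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneal(top: str, bottom: str, min_core: int = 8):
    """Find the longest perfectly paired region of two oligos (both given 5'->3').

    Returns (core_length, left_end, right_end); each end is a (description,
    single-stranded sequence) tuple, read 5'->3' on the strand that sticks out.
    """
    top, bottom = top.upper(), bottom.upper()
    rc = revcomp(bottom)  # bottom strand rewritten in top-strand orientation
    best = None
    # shift = position in `top` opposite the first base of `rc`
    for shift in range(-len(rc) + 1, len(top)):
        start, end = max(0, shift), min(len(top), shift + len(rc))
        if end - start >= min_core and top[start:end] == rc[start - shift:end - shift]:
            if best is None or end - start > best[1]:
                best = (shift, end - start)
    if best is None:
        raise ValueError("no duplex of at least min_core bp")
    shift, core = best
    rc_end = shift + len(rc)
    if shift > 0:
        left = ("top 5'", top[:shift])
    elif shift < 0:
        left = ("bottom 3'", bottom[len(bottom) + shift:])
    else:
        left = ("blunt", "")
    if len(top) > rc_end:
        right = ("top 3'", top[rc_end:])
    elif rc_end > len(top):
        right = ("bottom 5'", bottom[:rc_end - len(top)])
    else:
        right = ("blunt", "")
    return core, left, right


if __name__ == "__main__":
    # Toy self-complementary adaptor with AATT (EcoRI-type) ends on both strands.
    print(anneal("AATTCGGATCCG", "AATTCGGATCCG"))
```

A 5' TCGA end matches an XhoI-cut vector and a 3' GTAC end matches a KpnI-cut vector, so the reported ends can be compared directly with the digests used in section f.).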
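f.) Procedure for cloning pHS1 and pHS2
Starting from pDS12-luzi, the ampR cassette of the plasmid was excised by SacI/AatII digestion and replaced by the corresponding SacI/AatII fragment carrying the kanamycin resistance cassette. The vector obtained after transformation and isolation of positive clones was digested with SpeI/SacI, and the PCR-amplified terminator T0 was inserted as a SpeI/SacI fragment downstream of the kanamycin resistance cassette. The vector obtained after transformation and isolation of positive clones was digested with XbaI/AvrII, and the PCR-amplified terminator T1 was inserted as an XbaI/AvrII fragment. The resulting vector was digested with XhoI/EcoRI and ligated with the annealed promoter oligonucleotides (PPHS1,1 and PPHS1,2, or PPHS2,1 and PPHS2,2, respectively). The resulting vectors were digested with XbaI and filled in with Klenow fragment; after KpnI digestion, the vector band lacking the luciferase fragment was isolated. The two further annealed oligonucleotides containing the cloning site (PMCS1,1/PMCS1,2) were ligated into the vector digested in this way.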
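2. Cloning of bioS1 (ECU29581_24, SEQ ID No. 1):
The gene encoding BioS1 was amplified by PCR from the E. coli chromosome with optimized translation signals and cloned into a vector that allows overexpression of the gene in E. coli strains.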
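a.) Design of the oligonucleotides for amplifying the bioS1 gene from the E. coli chromosome:
bioS1 was amplified as an expression cassette consisting of the ribosome binding site, the start codon of the coding sequence and the stop codon, flanked by two restriction enzyme recognition sites. The MluI recognition sequence was chosen for both restriction sites. The bioS1 gene was cloned with oligonucleotides PbioS1,1 (5'-CGCACGCGTGAGGAGTACCATGAACGT-3') and PbioS1,2 (5'-CGCACGCGTTTAATCCACCAATAATT-3').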
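Both bioS1 primers consist of a short 5' tail ending in the MluI site ACGCGT, followed by the part that anneals to the gene. A sketch that splits a primer at its MluI site and compares the annealing part with the ends of SEQ ID No. 1 as printed in the sequence listing; the helper names are hypothetical, and treating everything 3' of the site as the annealing region is an assumption.

```python
MLUI = "ACGCGT"

PBIOS1_1 = "CGCACGCGTGAGGAGTACCATGAACGT"
PBIOS1_2 = "CGCACGCGTTTAATCCACCAATAATT"

# First and last bases of SEQ ID No. 1 as printed in the sequence listing.
BIOS1_5_END = "CGAGGAGTACCATGAACGTTTTTAATCCC"
BIOS1_3_END = "CTGGAATTATTGGTGGATTA"


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_primer(primer: str, site: str = MLUI) -> tuple[str, str]:
    """Split a primer into its 5' tail (up to and including the site) and the rest."""
    primer = primer.upper()
    pos = primer.find(site)
    if pos == -1:
        raise ValueError(f"{site} not found in primer")
    cut = pos + len(site)
    return primer[:cut], primer[cut:]


if __name__ == "__main__":
    _, fwd = split_primer(PBIOS1_1)
    _, rev = split_primer(PBIOS1_2)
    # Forward primer: annealing part occurs verbatim at the 5' end of the gene.
    print(fwd, fwd in BIOS1_5_END)
    # Reverse primer: its reverse complement matches the 3' end of the gene.
    print(revcomp(rev))
```

The reverse complement of the PbioS1,2 annealing part ends in TAA, so the stop codon is included in the amplified cassette.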
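PCR procedure:
Conditions:
0.5 μg of chromosomal DNA from E. coli W3110 was used as template. Oligonucleotides PbioS1,1 and PbioS1,2 were each used at a concentration of 15 pM. The dNTP concentration was 200 μM. 2.5 U of Pwo DNA polymerase (Boehringer Mannheim) in the reaction buffer supplied by the manufacturer was used as polymerase. The PCR volume was 100 μl.
Amplification conditions:
The DNA was denatured at 94 °C for 2 minutes. The oligonucleotides were then annealed at 55 °C for 30 seconds. Chain extension was carried out at 72 °C for 45 seconds. The PCR was run for 30 cycles.
The resulting DNA product of 1200 bp was purified and digested with MluI in a suitable buffer.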
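5 μg of vector pHS1 was digested with MluI and dephosphorylated with shrimp alkaline phosphatase (SAP) (Boehringer Mannheim). After inactivation of the SAP, the vector was ligated with the fragment at a molar ratio of 1:3 using a rapid DNA ligation kit according to the manufacturer's instructions. The ligation mixture was transformed into E. coli strain XL1-blue. Positive clones were identified by plasmid preparation and restriction analysis. The correct orientation of the bioS1 fragment in pHS1 was determined by restriction digestion and sequencing. The resulting construct was designated pHS1 bioS1 (Figure 1). The sequence of pHS1 bioS1 is given in SEQ ID No. 5, and the amino acid sequence of bioS1 deduced from the vector in SEQ ID No. 6. 2 μg of vector pHS1 bioS1 was digested with MluI, and the bioS1 gene was isolated by agarose gel electrophoresis. Vector pHS2 was digested with MluI, dephosphorylated with SAP and ligated with the bioS1 fragment via the cohesive ends. The ligation mixture was transformed into XL1-blue, and positive clones with the correct orientation were identified by plasmid isolation and restriction digestion. The resulting vector was designated pHS2 bioS1 (Figure 2). The sequence of pHS2 bioS1 is given in SEQ ID No. 9, and the amino acid sequence of bioS1 deduced from the vector in SEQ ID No. 10.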
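The 1:3 molar ratio of vector to fragment is converted into masses via the fragment lengths, because equal masses of DNA contain numbers of molecules inversely proportional to their length. A minimal sketch; the function name is hypothetical, and the vector and insert sizes in the example are illustrative values, not figures from this document.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_to_vector: float = 3.0) -> float:
    """Mass of insert (ng) needed for a given insert:vector molar ratio.

    ng_insert = ng_vector * (insert_bp / vector_bp) * insert_to_vector
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * insert_bp / vector_bp * insert_to_vector


if __name__ == "__main__":
    # Illustrative: 100 ng of a 3000-bp vector with a 1200-bp insert at 1:3.
    print(insert_mass_ng(100, 3000, 1200))
```

Because the insert here is shorter than the vector, a threefold molar excess needs less than three times the vector mass.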
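3. Cloning of bioS2 (SEQ ID No. 3):
A further E. coli gene encoding a gene product involved in Fe-S cluster assembly was cloned as follows:
A sequence comparison (Megalign program, group mode) with the Aa sequences of proteins in the SwissProt/PIR database that are highly homologous (>40%) to the NifS protein from Azotobacter vinelandii showed that, in addition to sequence motif I described above, the N-termini of these proteins are also markedly conserved. The Aa sequence MIYLDNXATT was identified as the conserved N-terminal sequence of typical NifS family proteins and was designated motif IIa.
An analysis of the SwissProt/PIR database for this sequence with more than 80% conservation revealed a further E. coli protein whose N-terminal sequence of 11 Aa is known from Edman degradation (database name: UP06_Ecoli). This protein was regarded as a putative NifS homolog and, by analogy with BioS1, is referred to below as BioS2.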
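The gene of protein BioS2 was cloned and sequenced as follows. Starting from protein sequence HIU00072_10, the conserved amino acid motif I on the one hand and the Aa sequence of UP06_Ecoli mentioned above on the other were used to prepare degenerate oligonucleotides capable of amplifying a bioS2 gene fragment. To this end, the two Aa motifs from HIU00072_10 (motif I) and UP06_Ecoli (motif IIb, MKLPIYLDYSAT) were back-translated into the corresponding DNA sequences. In this way, the degenerate oligonucleotide PbioS2,1 (5'-ATGAARYTNCCNATHTAYYTNGAYTAYWSNGCNAC-3') was synthesized from motif IIb, and the degenerate oligonucleotide PbioS2,2 (5'-cccaghggrccrtgcagyttrtgrccrga-3') from motif I.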
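Back-translation replaces each amino acid with a degenerate codon in IUPAC notation, and the degeneracy of a primer is the product of the number of bases each position allows. The codon choices below are an assumption reconstructed to reproduce PbioS2,1 (Leu as YTN and Ser as WSN also admit non-target codons); the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Degenerate codons matching PbioS2,1 (assumption). YTN covers all Leu codons
# plus TTY (Phe); WSN covers all Ser codons plus some Thr/Cys/Trp/Arg/stop.
CODON = {"M": "ATG", "K": "AAR", "L": "YTN", "P": "CCN", "I": "ATH",
         "Y": "TAY", "D": "GAY", "S": "WSN", "A": "GCN", "T": "ACN"}


def back_translate(peptide: str) -> str:
    """Degenerate DNA sequence for a peptide, using the CODON table above."""
    return "".join(CODON[aa] for aa in peptide.upper())


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """All concrete sequences of a degenerate primer (sensible only for short ones)."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


if __name__ == "__main__":
    motif_iib = back_translate("MKLPIYLDYSAT")
    print(motif_iib)  # PbioS2,1 is this sequence without the final N
    print(degeneracy("ATGAARYTNCCNATHTAYYTNGAYTAYWSNGCNAC"))
    print(degeneracy("cccaghggrccrtgcagyttrtgrccrga"))
```

The high degeneracy of PbioS2,1 is consistent with the low annealing temperature (45 °C) used in the PCR that follows.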
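PCR procedure:
Chromosomal DNA from E. coli W3110 was used as template. 0.5 μg each of oligonucleotides PbioS2,1 and PbioS2,2 were reacted with 15 pmol of the nucleotide mixture and 2.5 U of Pwo DNA polymerase (Boehringer Mannheim) in the reaction buffer supplied by the manufacturer. The PCR volume was 100 μl.
Amplification conditions:
Denaturation was carried out at 94 °C for 2 minutes. The oligonucleotides were annealed at 45 °C, and chain extension was carried out at 72 °C for 45 seconds. The PCR was run for 30 cycles. Three fragments could be selectively amplified, one of which had the size of 600 bp expected from the sequence comparison. This DNA fragment was isolated by agarose gel purification and sequenced with oligonucleotide PbioS2,2. The resulting DNA sequence was translated in all six possible reading frames. The translated Aa sequences were then compared with the Aa sequence of HIU00072_10 and with NifS from Azotobacter vinelandii. One of the translated reading frames showed high homology to the Aa sequence of ORF HIU00072_10 and was designated bioS2.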
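Translating a sequence in all six reading frames means reading three offsets on the sequenced strand and three on its reverse complement. A minimal sketch with the standard genetic code; codons containing ambiguous bases are shown as X and stops as *, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement; bases other than A, C, G, T are left unchanged."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    dna = dna.upper()
    rc = revcomp(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # 5' end of the bioS2 coding sequence (SEQ ID NO: 3, from position 19).
    for frame, protein in six_frames("ATGAAATTACCGATTTATCTCGACTACTCCGCA").items():
        print(frame, protein)
```

The frame to keep is the one that both lacks internal stop codons and aligns with the reference protein, which in the text above was HIU00072_10.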
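The complete DNA sequence of bioS2 (= SEQ ID No. 3) was determined and cloned as follows:
First, a labeled DNA probe homologous to bioS2 was prepared. This step was carried out with the PCR DIG labeling kit (Boehringer Mannheim). The template used to prepare the DIG-DNA probe was the PCR product obtained with oligonucleotides PbioS2,1 and PbioS2,2.
PCR conditions:
The following were used: 1 μl of PCR template, 5 μl of the DIG-dUTP nucleotide mixture, 15 pmol each of oligonucleotides PbioS2,1 and PbioS2,2, kit buffer containing 1.75 mM MgCl2, and 0.75 μl of Expand polymerase mix (Boehringer Mannheim).
Amplification conditions:
DNA melting at 94 °C for 2 minutes; then 10 cycles of DNA melting at 94 °C for 10 seconds, annealing at 45 °C for 30 seconds and chain extension at 68 °C for 3 min 30 s; then 20 cycles of DNA melting at 94 °C for 10 seconds, annealing at 45 °C for 30 seconds and chain extension at 68 °C for 3 min 30 s, with the extension time increased by 20 seconds per cycle. The DIG-labeled fragment was purified with a PCR purification kit.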
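The two-phase program above keeps the extension at 3 min 30 s for 10 cycles and then lengthens it by 20 s per cycle. A sketch that lists the per-cycle extension times, assuming the increment is cumulative and already applies to the first cycle of the second phase; the function name and that interpretation are assumptions.

```python
def extension_times(base_s: int = 210, fixed_cycles: int = 10,
                    ramp_cycles: int = 20, increment_s: int = 20) -> list[int]:
    """Extension time in seconds for each cycle of the two-phase program.

    Assumption: cycle 11 runs 230 s, cycle 12 runs 250 s, ..., cycle 30 runs 610 s.
    """
    times = [base_s] * fixed_cycles
    times += [base_s + increment_s * k for k in range(1, ramp_cycles + 1)]
    return times


if __name__ == "__main__":
    times = extension_times()
    print(len(times), times[10], times[-1], sum(times) / 60)  # cycles, s, s, minutes
```

The same schedule, with annealing at 61 °C instead of 45 °C, applies to the inverse PCR with PbioS2,3/PbioS2,4 described below.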
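4. Southern analysis of bioS2 using chromosomal DNA
In a further step, genomic DNA was digested with restriction enzymes and analyzed by Southern hybridization with the labeled DNA probe.
Chromosomal DNA of E. coli W3110 (10 μg) was completely digested with each of the following restriction enzymes: EcoRI, BamHI, Acc65I, HindIII, SalI. The reaction volume was 50 μl, and 30 U of each enzyme was used. The mixtures were incubated for 4 hours. The DNA digested in this way was fractionated on a 1% agarose gel in TBE buffer (Sambrook, J., Fritsch, E. F., Maniatis, T., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, ISBN 0-87969-373-8), transferred to a nylon membrane (Boehringer Mannheim) with a pressure blotter (Stratagene) and fixed covalently to the membrane by UV irradiation (Stratalinker, Stratagene). Hybridization with the DIG-labeled DNA probe was carried out at 65 °C in DIG Easy Hyb buffer (Boehringer Mannheim) for 15 hours. Development of the blot according to the manufacturer's instructions showed that the bioS2 DIG-DNA probe hybridized with bands whose size was determined to be about 3-4 kb for Acc65I, EcoRI and HindIII. With BamHI-digested DNA, the bioS2 probe hybridized with a fragment of considerably higher molecular weight. The 3-4 kb fragments were preferred for the further cloning steps.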
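Cloning of bioS2 by inverse PCR
In the EcoRI digest, a fragment of about 4 kb harboring the gene sought was identified. The complete gene was then amplified and cloned by inverse PCR. In the first step of inverse PCR, E. coli chromosomal DNA is completely digested with EcoRI. In the second step, the EcoRI-digested DNA from the preceding restriction digestion is joined covalently by ligase at low DNA concentration (about 20 ng/ml), under conditions in which intramolecular ligation is statistically favored. In the third step, PCR is carried out with oligonucleotides whose sequences are specific for the target gene sought.
The specifically amplified DNA fragment can be identified by its size, which follows from the size of the restriction fragment found in the Southern analysis and from the positions of the oligonucleotides in the known part of the gene. The fragment identified in this way is then cloned into a suitable vector such as pBS SK Bluescript/pCR Script (Stratagene) and sequenced.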
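In inverse PCR the two primers point outward from the known region, so the product runs from one primer around the circularized fragment, across the ligation junction, to the other. A simplified sketch on a toy fragment (hypothetical sequences and function names; the regenerated EcoRI site and the second strand are not modelled) that also shows why the product size follows from the fragment size and the primer positions.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverse_pcr_product(fragment: str, fwd: str, rev: str) -> str:
    """Predict the inverse-PCR product from a circularized restriction fragment.

    fragment: top strand of the linear fragment; joining its 3' end to its 5'
              end is taken to reproduce the ligated circle (simplification).
    fwd: outward primer identical to the top strand, downstream in the known region.
    rev: outward primer identical to the bottom strand, upstream in the known region.
    """
    fragment = fragment.upper()
    b = fragment.find(fwd.upper())
    rev_site = revcomp(rev)
    a = fragment.find(rev_site)
    if a == -1 or b == -1:
        raise ValueError("primer binding site not found")
    if a >= b:
        raise ValueError("reverse-primer site must lie upstream of the forward-primer site")
    # Extension from fwd runs to the fragment end, across the ligation junction,
    # and on through the site of the reverse primer.
    return fragment[b:] + fragment[:a + len(rev_site)]


if __name__ == "__main__":
    toy_fragment = "CCAAGCTAGTTTTCGGACTTGG"  # hypothetical 22-bp "restriction fragment"
    product = inverse_pcr_product(toy_fragment, fwd="CGGAC", rev="CTAGC")
    print(product, len(product))
```

The product length equals the fragment length minus the stretch between the two primer sites, so a product of about 3 kb from an EcoRI fragment of about 4 kb implies primer sites roughly 1 kb apart in the known sequence, under the simplifications above.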
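Experimental procedure:
1 μg of chromosomal DNA from strain W3110 was completely digested with 15 U of EcoRI (Boehringer Mannheim) in a volume of 50 μl. Complete digestion was checked by loading 30 μl onto an agarose gel. Fragments from this chromosomal DNA digest (10 μl of the digest = 200 ng) were incubated in a volume of 100 μl with 10 μl of ligation buffer and 2 U of T4 ligase (Boehringer Mannheim) at 15 °C for 15 hours (intramolecular ligation reaction). After the ligation, the T4 ligase was inactivated by incubation at 65 °C for 20 minutes. 5 μl of this ligation mixture was used as template for the PCR. Starting from the bioS2 sequence, primers PbioS2,3 (5'-GCGTGGGTAAACTGCCTATCGACCTGAGCC-3') and PbioS2,4 (5'-CTACGCTTCCTTCAGCCTGCCAGCCGAAA-3') were synthesized.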
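Oligonucleotide PbioS2,3 hybridizes at the 5' end of bioS2 and primes amplification outward from the coding sequence in the 3' direction, toward the 5' end of the complementary strand.
Oligonucleotide PbioS2,4 hybridizes at the 3' end of bioS2 and primes amplification outward from the coding sequence in the 3' direction, toward the 3' end of the coding strand.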
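The following were used: 5 μl of the ligation mixture, 1.75 μl of deoxynucleotide mixture (350 μmol, Boehringer Mannheim), 15 pmol each of oligonucleotides PbioS2,3 and PbioS2,4, buffer 1 from the kit containing 1.75 mM MgCl2, and 0.75 μl of Expand polymerase mix.
Conditions for amplifying the ligated E. coli DNA with primers PbioS2,3/PbioS2,4 (Expand kit, Boehringer Mannheim): DNA melting at 94 °C for 2 minutes; then 10 cycles of DNA melting at 94 °C for 10 seconds, annealing at 61 °C for 30 seconds and chain extension at 68 °C for 3 min 30 s; then 20 cycles of DNA melting at 94 °C for 10 seconds, annealing at 61 °C for 30 seconds and chain extension at 68 °C for 3 min 30 s, with the extension time increased by 20 seconds per cycle.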
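The amplification yielded a PCR product of about 3 kb. In Southern hybridization under stringent conditions, this DNA fragment gave a clear signal with the bioS2 DIG-DNA probe described above.
This DNA fragment was therefore assumed to contain DNA sequence highly homologous to bioS2 and was cloned into a vector for further characterization and sequencing. Using the pCR Script kit (Stratagene), the DNA fragment was first treated with Pfu polymerase according to the manufacturer's instructions and then ligated into the vector pCR Script. The ligation mixture was transformed into XL1-blue cells (Stratagene) and plated on LB-Amp. Positive clones harboring the fragment were identified by miniprep analysis. Sequencing revealed the complete sequence given in SEQ ID No. 3 (= bioS2).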
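bioS2 was then amplified and cloned as an expression cassette analogously to bioS1. To this end, an MluI recognition site and an optimized SD sequence were added to the 5' end of the gene, and an MluI recognition site to its 3' end, by PCR with oligonucleotides PbioS2,5 (5'-CATGACGCGTAAAGAGGAGAAATTAACTATGAAATTACCGATTTATTTGG-3') and PbioS2,6 (5'-GCGACGCGTGATTAATGATGAGCCCAT-3').
PCR procedure:
0.5 μg of chromosomal DNA from W3110 was used as template. Oligonucleotides PbioS2,5 and PbioS2,6 were each used at a concentration of 15 pM. The dNTP concentration was 200 μM. 2.5 U of Pwo DNA polymerase (Boehringer Mannheim) in the reaction buffer supplied by the manufacturer was used as polymerase. The PCR volume was 100 μl.
Amplification conditions:
Denaturation was carried out at 94 °C for 2 seconds. The oligonucleotides were annealed at 55 °C for 30 seconds, and chain extension was carried out at 72 °C for 45 seconds. The PCR was run for 30 cycles.
The resulting DNA product of the correct size of about 1200 bp was purified with a PCR purification kit (Boehringer Mannheim) and digested with MluI in a suitable buffer.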
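5 μg of vector pHS2 was digested with MluI and dephosphorylated with shrimp alkaline phosphatase (SAP) (Boehringer Mannheim). After inactivation of the SAP, the vector was ligated with the fragment at a molar ratio of 1:3 using a rapid DNA ligation kit according to the manufacturer's instructions. The ligation mixture was transformed into strain XL1-blue. Positive clones were identified by plasmid preparation and restriction analysis. The correct orientation of the bioS2 fragment in pHS2 was determined by restriction digestion and sequencing. The vector was designated pHS2 bioS2 (Figure 3). The sequence of pHS2 bioS2 is given in SEQ ID No. 11, and the amino acid sequence of bioS2 deduced from the vector in SEQ ID No. 12. bioS2 was cloned into vector pHS1 in a similar way. The sequence of pHS1 bioS2 is given in SEQ ID No. 7 (Figure 4), and the amino acid sequence of bioS2 deduced from the vector in SEQ ID No. 8.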
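5. Construction of plasmid pHBbio14:
The bio operon was cloned in vivo with a transducing λ phage. λbio+ phages were selected by transducing a bio-negative E. coli strain to bio+. The isolated bio+ transducing λ phage was propagated and the λ DNA purified. The 8.7 kb EcoRI/HindIII fragment carrying the complete biotin operon was then excised from the λ phage DNA and ligated into pBR322 cut with EcoRI/HindIII. Positive clones were identified by plasmid preparation and restriction analysis.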
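Deletion of the 1.2 kb 3' fragment of bioD
Non-essential sequence at the 3' end of the bioD gene was deleted. To this end, an EcoRI cleavage site was introduced by PCR immediately after the bioD stop codon. The oligonucleotides for this PCR, Pbio1,1 (5'-AATAAGGAATTCTTATGTACTTTCCGGTTGCCG-3') and Pbio1,2 (5'-AACAGCAGCCTGCAGCTGGATTA-3'), were designed on the basis of the operon sequence described by Otsuka et al. (J. Biol. Chem. 263, 1988: 19577-85).
PCR conditions:
2.5 U of Taq polymerase (Perkin Elmer) was reacted with 15 pmol of each primer in a volume of 100 μl. Annealing was carried out at 50 °C and chain extension at 72 °C for 1 minute, for 30 cycles. The 488 bp fragment was isolated and purified on an agarose gel. The resulting fragment was digested with EcoRI/PstI. pHBbio1 was digested with EcoRI/PstI, and the 9.5 kb fragment was isolated.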
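The 9.5 kb fragment was ligated with the 488 bp fragment and transformed into XL1-blue cells. The resulting clones were analyzed by plasmid preparation and by restriction analysis of the plasmid DNA with EcoRI and HindIII, and positive clones harboring a 5.9 kb fragment were identified. One clone was isolated and designated pHBbio2, and plasmid DNA was prepared from it. 5 μg of pHBbio2 was digested with EcoRI/HindIII, and the 5.9 kb fragment containing the complete biotin biosynthesis genes was isolated.
5 μg of plasmid pAT153 was digested with EcoRI and HindIII. The 5.9 kb fragment containing the biotin biosynthesis genes was ligated with the digested vector pAT153 and transformed into XL1-blue. The resulting clones were analyzed by plasmid preparation and by restriction analysis of the plasmid DNA with EcoRI and HindIII. Positive clones were identified, and one clone was isolated and designated pHBbio14.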
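6. Increasing biotin production by overexpression of bioS1
Strain BM4092 (Barker and Campbell) was made recA- by P1 transduction with a P1 lysate grown on a strain carrying recA::Tn10. Successful transduction was checked by the increased UV sensitivity of the positive transductants. The resulting strain LU8091 was then transformed with plasmid pHBbio14 by the CaCl2 method and grown on LB containing 100 μg/ml ampicillin. One clone was isolated, transformed by the CaCl2 method with plasmid pHS1 bioS1 or pHS2 bioS1, respectively, and selected on LB agar containing 100 μg/ml ampicillin and 25 μg/ml kanamycin.
One colony of each transformant was inoculated into DYT medium containing the appropriate antibiotics and grown for 12 hours. 10 ml of this overnight culture was used to inoculate TB medium containing the appropriate antibiotics (Sambrook, J., Fritsch, E. F., Maniatis, T., 2nd edition, Cold Spring Harbor Laboratory Press, 1989, ISBN 0-87969-373-8), which was grown for 24 hours. After growth was complete, the cells were removed from the culture supernatant by centrifugation, and the biotin and dethiobiotin concentrations in the supernatant were determined by ELISA using streptavidin and avidin. The results are given in Table I.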
Table I: Results of the biotin and dethiobiotin determinations

Strain   Plasmid I   Plasmid II   Biotin (mg/l)   Dethiobiotin (mg/l)
LU8091   pHBbio14    -             9.4            45.6
LU8091   pHBbio14    pHS1 bioS1   15.3            19.7
LU8091   pHBbio14    pHS2 bioS1   19.2            15.8
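The values in Table I can be summarized as the fold change in biotin relative to LU8091 with pHBbio14 alone and as the share of biotin in biotin plus dethiobiotin. A minimal sketch with hypothetical names; reading that share as a rough indicator of dethiobiotin-to-biotin conversion is an interpretation, not a claim made in the text.

```python
# Table I values in mg/l: (biotin, dethiobiotin).
RESULTS = {
    "pHBbio14": (9.4, 45.6),
    "pHBbio14 + pHS1 bioS1": (15.3, 19.7),
    "pHBbio14 + pHS2 bioS1": (19.2, 15.8),
}


def summarize(results: dict[str, tuple[float, float]],
              reference: str = "pHBbio14") -> dict[str, dict[str, float]]:
    """Fold change in biotin vs. the reference and biotin / (biotin + dethiobiotin)."""
    ref_biotin = results[reference][0]
    return {
        name: {
            "fold_biotin": biotin / ref_biotin,
            "biotin_fraction": biotin / (biotin + dtb),
        }
        for name, (biotin, dtb) in results.items()
    }


if __name__ == "__main__":
    for name, stats in summarize(RESULTS).items():
        print(f"{name}: {stats['fold_biotin']:.2f}x biotin, "
              f"{stats['biotin_fraction']:.1%} of biotin + dethiobiotin")
```

On this reading, the bioS1 plasmids raise biotin by roughly 1.6- to 2-fold while lowering dethiobiotin, which is consistent with the section heading.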
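SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: BASF Aktiengesellschaft
(B) STREET: Carl Bosch Strasse
(C) CITY: Ludwigshafen
(D) STATE: Rheinland-Pfalz
(E) COUNTRY: Germany
(F) POSTAL CODE: D-67056
(ii) TITLE OF INVENTION: Process for preparing biotin
(iii) NUMBER OF SEQUENCES: 12
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)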
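(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1217 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: Escherichia coli
(B) STRAIN: W3110
(ix) FEATURE:
(A) NAME/KEY: 5'UTR
(B) LOCATION: 1..11
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 12..1217
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: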
CGAGGAGTAC C ATG AAC GTT TTT AAT CCC GCG CAG TTT CGC GCC CAG TTT 50
Met Asn Val Phe Asn Pro Ala Gln Phe Arg Ala Gln Phe
1 5 10
CCC GCA CTA CAG GAT GCG GGC GTC TAT CTC GAC AGC GCC GCG ACC GCG 98
Pro Ala Leu Gln Asp Ala Gly Val Tyr Leu Asp Ser Ala Ala Thr Ala
15 20 25
CTT AAA CCT GAA GCC GTG GTT GAA GCC ACC CAA CAG TTT TAC AGT CTG 146
Leu Lys Pro Glu Ala Val Val Glu Ala Thr Gln Gln Phe Tyr Ser Leu
30 35 40 45
AGC GCC GGA AAC GTC CAT CGC AGC CAG TTT GCC GAA GCC CAA CGC CTG 194
Ser Ala Gly Asn Val His Arg Ser Gln Phe Ala Glu Ala Gln Arg Leu
50 55 60
ACC GCG CGT TAT GAA GCT GCA CGA GAG AAA GTG GCG CAA TTA CTG AAT 242
Thr Ala Arg Tyr Glu Ala Ala Arg Glu Lys Val Ala Gln Leu Leu Asn
65 70 75
GCA CCG GAT GAT AAA ACT ATC GTC TGG ACG CGC GGC ACC ACT GAA TCC 290
Ala Pro Asp Asp Lys Thr Ile Val Trp Thr Arg Gly Thr Thr Glu Ser
80 85 90
ATC AAC ATG GTG GCA CAA TGC TAT GCG CGT CCG CGT CTG CAA CCG GGC 338
Ile Asn Met Val Ala Gln Cys Tyr Ala Arg Pro Arg Leu Gln Pro Gly
95 100 105
GAT GAG ATT ATT GTC AGC GTG GCA GAA CAC CAC GCC AAC CTC GTC CCC 386
Asp Glu Ile Ile Val Ser Val Ala Glu His His Ala Asn Leu Val Pro
110 115 120 125
TGG CTG ATG GTC GCC CAA CAA ACT GGA GCC AAA GTG GTG AAA TTG CCG 434
Trp Leu Met Val Ala Gln Gln Thr Gly Ala Lys Val Val Lys Leu Pro
130 135 140
CTT AAT GCG CAG CGA CTG CCG GAT GTC GAT TTG TTG CCA GAA CTG ATT 482
Leu Asn Ala Gln Arg Leu Pro Asp Val Asp Leu Leu Pro Glu Leu Ile
145 150 155
ACT CCC CGT AGT CGG ATT CTG GCG TTG GGT CAG ATG TCG AAC GTT ACT 530
Thr Pro Arg Ser Arg Ile Leu Ala Leu Gly Gln Met Ser Asn Val Thr
160 165 170
GGC GGT TGC CCG GAT CTG GCG CGA GCG ATT ACC TTT GCT CAT TCA GCC 578
Gly Gly Cys Pro Asp Leu Ala Arg Ala Ile Thr Phe Ala His Ser Ala
175 180 185
GGG ATG GTG GTG ATG GTT GAT GGT GCT CAG GGG GCA GTG CAT TTC CCC 626
Gly Met Val Val Met Val Asp Gly Ala Gln Gly Ala Val His Phe Pro
190 195 200 205
GCG GAT GTT CAG CAA CTG GAT ATT GAT TTC TAT GCT TTT TCA GGT CAC 674
Ala Asp Val Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His
210 215 220
AAA CTG TAT GGC CCG ACA GGT ATC GGC GTG CTG TAT GGT AAA TCA GAA 722
Lys Leu Tyr Gly Pro Thr Gly Ile Gly Val Leu Tyr Gly Lys Ser Glu
225 230 235
CTG CTG GAG GCG ATG TCG CCC TGG CTG GGC GGC GGC AAA ATG GTT CAC 770
Leu Leu Glu Ala Met Ser Pro Trp Leu Gly Gly Gly Lys Met Val His
240 245 250
GAA GTG AGT TTT GAC GGC TTC ACG ACT CAA TCT GCG CCG TGG AAA CTG 818
Glu Val Ser Phe Asp Gly Phe Thr Thr Gln Ser Ala Pro Trp Lys Leu
255 260 265
GAA GCT GGA ACG CCA AAT GTC GCT GGT GTC ATA GGA TTA AGC GCG GCG 866
Glu Ala Gly Thr Pro Asn Val Ala Gly Val Ile Gly Leu Ser Ala Ala
270 275 280 285
CTG GAA TGG CTG GCA GAT TAC GAT ATC AAC CAG GCC GAA AGC TGG AGC 914
Leu Glu Trp Leu Ala Asp Tyr Asp Ile Asn Gln Ala Glu Ser Trp Ser
290 295 300
CGT AGC TTA GCA ACG CTG GCG GAA GAT GCG CTG GCG AAA CGT CCC GGC 962
Arg Ser Leu Ala Thr Leu Ala Glu Asp Ala Leu Ala Lys Arg Pro Gly
305 310 315
TTT CGT TCA TTC CGC TGC CAG GAT TCC AGC CTG CTG GCC TTT GAT TTT 1010
Phe Arg Ser Phe Arg Cys Gln Asp Ser Ser Leu Leu Ala Phe Asp Phe
320 325 330
GCT GGC GTT CAT CAT AGC GAT ATG GTG ACG CTG CTG GCG GAG TAC GGT 1058
Ala Gly Val His His Ser Asp Met Val Thr Leu Leu Ala Glu Tyr Gly
335 340 345
ATT GCC CTG CGG GCC GGG CAG CAT TGC GCT CAG CCG CTA CTG GCA GAA 1106
Ile Ala Leu Arg Ala Gly Gln His Cys Ala Gln Pro Leu Leu Ala Glu
350 355 360 365
TTA GGC GTA ACC GGC ACA CTG CGC GCC TCT TTT GCG CCA TAT AAT ACA 1154
Leu Gly Val Thr Gly Thr Leu Arg Ala Ser Phe Ala Pro Tyr Asn Thr
370 375 380
AAG AGT GAT GTG GAT GCG CTG GTG AAT GCC GTT GAC CGC GCG CTG GAA 1202
Lys Ser Asp Val Asp Ala Leu Val Asn Ala Val Asp Arg Ala Leu Glu
385 390 395
TTA TTG GTG GAT TA 1217
Leu Leu Val Asp
400
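(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 401 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: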
Met Asn Val Phe Asn Pro Ala Gln Phe Arg Ala Gln Phe Pro Ala Leu
1 5 10 15
Gln Asp Ala Gly Val Tyr Leu Asp Ser Ala Ala Thr Ala Leu Lys Pro
20 25 30
Glu Ala Val Val Glu Ala Thr Gln Gln Phe Tyr Ser Leu Ser Ala Gly
35 40 45
Asn Val His Arg Ser Gln Phe Ala Glu Ala Gln Arg Leu Thr Ala Arg
50 55 60
Tyr Glu Ala Ala Arg Glu Lys Val Ala Gln Leu Leu Asn Ala Pro Asp
65 70 75 80
Asp Lys Thr Ile Val Trp Thr Arg Gly Thr Thr Glu Ser Ile Asn Met
85 90 95
Val Ala Gln Cys Tyr Ala Arg Pro Arg Leu Gln Pro Gly Asp Glu Ile
100 105 110
Ile Val Ser Val Ala Glu His His Ala Asn Leu Val Pro Trp Leu Met
115 120 125
Val Ala Gln Gln Thr Gly Ala Lys Val Val Lys Leu Pro Leu Asn Ala
130 135 140
Gln Arg Leu Pro Asp Val Asp Leu Leu Pro Glu Leu Ile Thr Pro Arg
145 150 155 160
Ser Arg Ile Leu Ala Leu Gly Gln Met Ser Asn Val Thr Gly Gly Cys
165 170 175
Pro Asp Leu Ala Arg Ala Ile Thr Phe Ala His Ser Ala Gly Met Val
180 185 190
Val Met Val Asp Gly Ala Gln Gly Ala Val His Phe Pro Ala Asp Val
195 200 205
Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His Lys Leu Tyr
210 215 220
Gly Pro Thr Gly Ile Gly Val Leu Tyr Gly Lys Ser Glu Leu Leu Glu
225 230 235 240
Ala Met Ser Pro Trp Leu Gly Gly Gly Lys Met Val His Glu Val Ser
245 250 255
Phe Asp Gly Phe Thr Thr Gln Ser Ala Pro Trp Lys Leu Glu Ala Gly
260 265 270
Thr Pro Asn Val Ala Gly Val Ile Gly Leu Ser Ala Ala Leu Glu Trp
275 280 285
Leu Ala Asp Tyr Asp Ile Asn Gln Ala Glu Ser Trp Ser Arg Ser Leu
290 295 300
Ala Thr Leu Ala Glu Asp Ala Leu Ala Lys Arg Pro Gly Phe Arg Ser
305 310 315 320
Phe Arg Cys Gln Asp Ser Ser Leu Leu Ala Phe Asp Phe Ala Gly Val
325 330 335
His His Ser Asp Met Val Thr Leu Leu Ala Glu Tyr Gly Ile Ala Leu
340 345 350
Arg Ala Gly Gln His Cys Ala Gln Pro Leu Leu Ala Glu Leu Gly Val
355 360 365
Thr Gly Thr Leu Arg Ala Ser Phe Ala Pro Tyr Asn Thr Lys Ser Asp
370 375 380
Val Asp Ala Leu Val Asn Ala Val Asp Arg Ala Leu Glu Leu Leu Val
385 390 395 400
Asp
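(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1233 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: Escherichia coli
(B) STRAIN: W3110
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 19..1233
(ix) FEATURE:
(A) NAME/KEY: 5'UTR
(B) LOCATION: 1..18
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: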
AAAGAGGAGA AATTAACT ATG AAA TTA CCG ATT TAT CTC GAC TAC TCC GCA 51
Met Lys Leu Pro Ile Tyr Leu Asp Tyr Ser Ala
1 5 10
ACC ACG CCG GTG GAC CCG CGT GTT GCC GAG AAA ATG ATG CAG TTT ATG 99
Thr Thr Pro Val Asp Pro Arg Val Ala Glu Lys Met Met Gln Phe Met
15 20 25
ACG ATG GAC GGA ACC TTT GGT AAC CCG GCC TCC CGT TCT CAC CGT TTC 147
Thr Met Asp Gly Thr Phe Gly Asn Pro Ala Ser Arg Ser His Arg Phe
30 35 40
GGC TGG CAG GCT GAA GAA GCG GTA GAT ATC GCC CGT AAT CAG ATT GCC 195
Gly Trp Gln Ala Glu Glu Ala Val Asp Ile Ala Arg Asn Gln Ile Ala
45 50 55
GAT CTG GTC GGC GCT GAT CCG CGT GAA ATC GTC TTT ACC TCT GGT GCA 243
Asp Leu Val Gly Ala Asp Pro Arg Glu Ile Val Phe Thr Ser Gly Ala
60 65 70 75
ACC GAA TCT GAC AAC CTG GCG ATC AAA GGT GCA GCC AAC TTT TAT CAG 291
Thr Glu Ser Asp Asn Leu Ala Ile Lys Gly Ala Ala Asn Phe Tyr Gln
80 85 90
AAA AAA GGC AAG CAC ATC ATC ACC AGC AAA ACC GAA CAC AAA GCG GTA 339
Lys Lys Gly Lys His Ile Ile Thr Ser Lys Thr Glu His Lys Ala Val
95 100 105
CTG GAT ACC TGC CGT CAG CTG GAG CGC GAA GGT TTT GAA GTC ACC TAC 387
Leu Asp Thr Cys Arg Gln Leu Glu Arg Glu Gly Phe Glu Val Thr Tyr
110 115 120
CTG GCA CCG CAG CGT AAC GGC ATT ATC GAC CTG AAA GAA CTT GAA GCA 435
Leu Ala Pro Gln Arg Asn Gly Ile Ile Asp Leu Lys Glu Leu Glu Ala
125 130 135
GCG ATG CGT GAC GAC ACC ATC CTC GTG TCC ATC ATG CAC GTA AAT AAC 483
Ala Met Arg Asp Asp Thr Ile Leu Val Ser Ile Met His Val Asn Asn
140 145 150 155
GAA ATC GGC GTG GTG CAG GAT ATC GCG GCT ATC GGC GAA ATG TGC CGT 531
Glu Ile Gly Val Val Gln Asp Ile Ala Ala Ile Gly Glu Met Cys Arg
160 165 170
GCT CGT GGC ATT ATC TAT CAC GTT GAT GCA ACC CAG AGC GTG GGT AAA 579
Ala Arg Gly Ile Ile Tyr His Val Asp Ala Thr Gln Ser Val Gly Lys
175 180 185
CTG CCT ATC GAC CTG AGC CAG TTG AAA GTT GAC CTG ATG TCT TTC TCC 627
Leu Pro Ile Asp Leu Ser Gln Leu Lys Val Asp Leu Met Ser Phe Ser
190 195 200
GGT CAC AAA ATC TAT GGC CCG AAA GGT ATC GGT GCG CTG TAT GTA CGT 675
Gly His Lys Ile Tyr Gly Pro Lys Gly Ile Gly Ala Leu Tyr Val Arg
205 210 215
CGT AAA CCG CGC GTA CGC ATC GAA GCG CAA ATG CAC GGC GGC GGT CAC 723
Arg Lys Pro Arg Val Arg Ile Glu Ala Gln Met His Gly Gly Gly His
220 225 230 235
GAG CGC GGT ATG CGT TCC GGC ACT CTG CCT GTT CAC CAG ATC GTC GGA 771
Glu Arg Gly Met Arg Ser Gly Thr Leu Pro Val His Gln Ile Val Gly
240 245 250
ATG GGC GAG GCC TAT CGC ATC GCA AAA GAA GAG ATG GCG ACC GAG ATG 819
Met Gly Glu Ala Tyr Arg Ile Ala Lys Glu Glu Met Ala Thr Glu Met
255 260 265
GAA CGT CTG CGC GGC CTG CGT AAC CGT CTG TGG AAC GGC ATC AAA GAT 867
Glu Arg Leu Arg Gly Leu Arg Asn Arg Leu Trp Asn Gly Ile Lys Asp
270 275 280
ATC GAA GAA GTT TAC CTG AAC GGT GAC CTG GAA CAC GGT GCG CCG AAC 915
Ile Glu Glu Val Tyr Leu Asn Gly Asp Leu Glu His Gly Ala Pro Asn
285 290 295
ATT CTC AAC GTC AGC TTC AAC TAC GTT GAA GGT GAG TCG CTG ATT ATG 963
Ile Leu Asn Val Ser Phe Asn Tyr Val Glu Gly Glu Ser Leu Ile Met
300 305 310 315
GCG CTG AAA GAC CTC GCA GTT TCT TCA GGT TCC GCC TGT ACG TCA GCA 1011
Ala Leu Lys Asp Leu Ala Val Ser Ser Gly Ser Ala Cys Thr Ser Ala
320 325 330
AGC CTC GAA CCG TCC TAC GTG CTG CGC GCG CTG GGG CTG AAC GAC GAG 1059
Ser Leu Glu Pro Ser Tyr Val Leu Arg Ala Leu Gly Leu Asn Asp Glu
335 340 345
CTG GCA CAT AGC TCT ATC CGT TTC TCT TTA GGT CGT TTT ACT ACT GAA 1107
Leu Ala His Ser Ser Ile Arg Phe Ser Leu Gly Arg Phe Thr Thr Glu
350 355 360
GAA GAG ATC GAC TAC ACC ATC GAG TTA GTT CGT AAA TCC ATC GGT CGT 1155
Glu Glu Ile Asp Tyr Thr Ile Glu Leu Val Arg Lys Ser Ile Gly Arg
365 370 375
CTG CGT GAC CTT TCT CCG CTG TGG GAA ATG TAC AAG CAG GGC GTG GAT 1203
Leu Arg Asp Leu Ser Pro Leu Trp Glu Met Tyr Lys Gln Gly Val Asp
380 385 390 395
CTG AAC AGC ATC GAA TGG GCT CAT CAT TA 1233
Leu Asn Ser Ile Glu Trp Ala His His
400 405
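(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 404 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: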
Met Lys Leu Pro Ile Tyr Leu Asp Tyr Ser Ala Thr Thr Pro Val Asp
1 5 10 15
Pro Arg Val Ala Glu Lys Met Met Gln Phe Met Thr Met Asp Gly Thr
20 25 30
Phe Gly Asn Pro Ala Ser Arg Ser His Arg Phe Gly Trp Gln Ala Glu
35 40 45
Glu Ala Val Asp Ile Ala Arg Asn Gln Ile Ala Asp Leu Val Gly Ala
50 55 60
Asp Pro Arg Glu Ile Val Phe Thr Ser Gly Ala Thr Glu Ser Asp Asn
65 70 75 80
Leu Ala Ile Lys Gly Ala Ala Asn Phe Tyr Gln Lys Lys Gly Lys His
85 90 95
Ile Ile Thr Ser Lys Thr Glu His Lys Ala Val Leu Asp Thr Cys Arg
100 105 110
Gln Leu Glu Arg Glu Gly Phe Glu Val Thr Tyr Leu Ala Pro Gln Arg
115 120 125
Asn Gly Ile Ile Asp Leu Lys Glu Leu Glu Ala Ala Met Arg Asp Asp
130 135 140
Thr Ile Leu Val Ser Ile Met His Val Asn Asn Glu Ile Gly Val Val
145 150 155 160
Gln Asp Ile Ala Ala Ile Gly Glu Met Cys Arg Ala Arg Gly Ile Ile
165 170 175
Tyr His Val Asp Ala Thr Gln Ser Val Gly Lys Leu Pro Ile Asp Leu
180 185 190
Ser Gln Leu Lys Val Asp Leu Met Ser Phe Ser Gly His Lys Ile Tyr
195 200 205
Gly Pro Lys Gly Ile Gly Ala Leu Tyr Val Arg Arg Lys Pro Arg Val
210 215 220
Arg Ile Glu Ala Gln Met His Gly Gly Gly His Glu Arg Gly Met Arg
225 230 235 240
Ser Gly Thr Leu Pro Val His Gln Ile Val Gly Met Gly Glu Ala Tyr
245 250 255
Arg Ile Ala Lys Glu Glu Met Ala Thr Glu Met Glu Arg Leu Arg Gly
260 265 270
Leu Arg Asn Arg Leu Trp Asn Gly Ile Lys Asp Ile Glu Glu Val Tyr
275 280 285
Leu Asn Gly Asp Leu Glu His Gly Ala Pro Asn Ile Leu Asn Val Ser
290 295 300
Phe Asn Tyr Val Glu Gly Glu Ser Leu Ile Met Ala Leu Lys Asp Leu
305 310 315 320
Ala Val Ser Ser Gly Ser Ala Cys Thr Ser Ala Ser Leu Glu Pro Ser
325 330 335
Tyr Val Leu Arg Ala Leu Gly Leu Asn Asp Glu Leu Ala His Ser Ser
340 345 350
Ile Arg Phe Ser Leu Gly Arg Phe Thr Thr Glu Glu Glu Ile Asp Tyr
355 360 365
Thr Ile Glu Leu Val Arg Lys Ser Ile Gly Arg Leu Arg Asp Leu Ser
370 375 380
Pro Leu Trp Glu Met Tyr Lys Gln Gly Val Asp Leu Asn Ser Ile Glu
385 390 395 400
Trp Ala His His
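(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3794 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vii) IMMEDIATE SOURCE:
(B) CLONE: pHS1bioS1
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 601..1806
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: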
GACGTCTGTG TGGAATTGTG AGCGGATAAC AATTTCACAC AGGGCCCTCG GACACCGAGG 60
AGAATGTCAA GAGGCGAACA CACAACGTCT TGGAGCGCCA GAGGAGGAAC GAGCTAAAAC 120
GGAGCTTTTT TGCCCTGCGT GACCAGATCC CGGAGTTGGA AAACAATGAA AAGGCCCCCA 180
AGGTAGTTAT CCTTAAAAAA GCCACAGCAT ACATCCTGTC CGTCCAAGCA GAGGAGCAAA 240
AGCTCATTTC TGAAGAGGAC TTGTTGCGGA AACGACGAGA ACAGTTGAAA CACAAACTTG 300
AACAGCTACG GAACTCTTGT GCGTAAGGAA AAGTAAGGAA AACGATTCCT TCTAACAGAA 360
ATGTCCTGAG CAATCACCTA TGAACTGTCG ACTCGAGATA GCATTTTTAT CCATAAGATT 420
AGCCGATCCT AAGGTTTACA ATTGTGAGCG CTCACAATTA TGATAGATTC AATTGTGAGC 480
GGATAACAAT TTCACACACG CTAGCGGTAC CGGGCCCCCC CTCGAGGTCG ACGGTATCGA 540
TAAGCTTGAT ATCGAATTCC TGCAGCCCGG GGGATCCCAT GGTACGCGTC GAGGAGTACC 600
ATG AAC GTT TTT AAT CCC GCG CAG TTT CGC GCC CAG TTT CCC GCA CTA 648
Met Asn Val Phe Asn Pro Ala Gln Phe Arg Ala Gln Phe Pro Ala Leu
1 5 10 15
CAG GAT GCG GGC GTC TAT CTC GAC AGC GCC GCG ACC GCG CTT AAA CCT 696
Gln Asp Ala Gly Val Tyr Leu Asp Ser Ala Ala Thr Ala Leu Lys Pro
20 25 30
GAA GCC GTG GTT GAA GCC ACC CAA CAG TTT TAC AGT CTG AGC GCC GGA 744
Glu Ala Val Val Glu Ala Thr Gln Gln Phe Tyr Ser Leu Ser Ala Gly
35 40 45
AAC GTC CAT CGC AGC CAG TTT GCC GAA GCC CAA CGC CTG ACC GCG CGT 792
Asn Val His Arg Ser Gln Phe Ala Glu Ala Gln Arg Leu Thr Ala Arg
50 55 60
TAT GAA GCT GCA CGA GAG AAA GTG GCG CAA TTA CTG AAT GCA CCG GAT 840
Tyr Glu Ala Ala Arg Glu Lys Val Ala Gln Leu Leu Asn Ala Pro Asp
65 70 75 80
GAT AAA ACT ATC GTC TGG ACG CGC GGC ACC ACT GAA TCC ATC AAC ATG 888
Asp Lys Thr Ile Val Trp Thr Arg Gly Thr Thr Glu Ser Ile Asn Met
85 90 95
GTG GCA CAA TGC TAT GCG CGT CCG CGT CTG CAA CCG GGC GAT GAG ATT 936
Val Ala Gln Cys Tyr Ala Arg Pro Arg Leu Gln Pro Gly Asp Glu Ile
100 105 110
ATT GTC AGC GTG GCA GAA CAC CAC GCC AAC CTC GTC CCC TGG CTG ATG 984
Ile Val Ser Val Ala Glu His His Ala Asn Leu Val Pro Trp Leu Met
115 120 125
GTC GCC CAA CAA ACT GGA GCC AAA GTG GTG AAA TTG CCG CTT AAT GCG 1032
Val Ala Gln Gln Thr Gly Ala Lys Val Val Lys Leu Pro Leu Asn Ala
130 135 140
CAG CGA CTG CCG GAT GTC GAT TTG TTG CCA GAA CTG ATT ACT CCC CGT 1080
Gln Arg Leu Pro Asp Val Asp Leu Leu Pro Glu Leu Ile Thr Pro Arg
145 150 155 160
AGT CGG ATT CTG GCG TTG GGT CAG ATG TCG AAC GTT ACT GGC GGT TGC 1128
Ser Arg Ile Leu Ala Leu Gly Gln Met Ser Asn Val Thr Gly Gly Cys
165 170 175
CCG GAT CTG GCG CGA GCG ATT ACC TTT GCT CAT TCA GCC GGG ATG GTG 1176
Pro Asp Leu Ala Arg Ala Ile Thr Phe Ala His Ser Ala Gly Met Val
180 185 190
GTG ATG GTT GAT GGT GCT CAG GGG GCA GTG CAT TTC CCC GCG GAT GTT 1224
Val Met Val Asp Gly Ala Gln Gly Ala Val His Phe Pro Ala Asp Val
195 200 205
CAG CAA CTG GAT ATT GAT TTC TAT GCT TTT TCA GGT CAC AAA CTG TAT 1272
Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His Lys Leu Tyr
210 215 220
GGC CCG ACA GGT ATC GGC GTG CTG TAT GGT AAA TCA GAA CTG CTG GAG 1320
Gly Pro Thr Gly Ile Gly Val Leu Tyr Gly Lys Ser Glu Leu Leu Glu
225 230 235 240
GCG ATG TCG CCC TGG CTG GGC GGC GGC AAA ATG GTT CAC GAA GTG AGT 1368
Ala Met Ser Pro Trp Leu Gly Gly Gly Lys Met Val His Glu Val Ser
245 250 255
TTT GAC GGC TTC ACG ACT CAA TCT GCG CCG TGG AAA CTG GAA GCT GGA 1416
Phe Asp Gly Phe Thr Thr Gln Ser Ala Pro Trp Lys Leu Glu Ala Gly
260 265 270
ACG CCA AAT GTC GCT GGT GTC ATA GGA TTA AGC GCG GCG CTG GAA TGG 1464
Thr Pro Asn Val Ala Gly Val Ile Gly Leu Ser Ala Ala Leu Glu Trp
275 280 285
CTG GCA GAT TAC GAT ATC AAC CAG GCC GAA AGC TGG AGC CGT AGC TTA 1512
Leu Ala Asp Tyr Asp Ile Asn Gln Ala Glu Ser Trp Ser Arg Ser Leu
290 295 300
GCA ACG CTG GCG GAA GAT GCG CTG GCG AAA CGT CCC GGC TTT CGT TCA 1560
Ala Thr Leu Ala Glu Asp Ala Leu Ala Lys Arg Pro Gly Phe Arg Ser
305 310 315 320
TTC CGC TGC CAG GAT TCC AGC CTG CTG GCC TTT GAT TTT GCT GGC GTT 1608
Phe Arg Cys Gln Asp Ser Ser Leu Leu Ala Phe Asp Phe Ala Gly Val
325 330 335
CAT CAT AGC GAT ATG GTG ACG CTG CTG GCG GAG TAC GGT ATT GCC CTG 1656
His His Ser Asp Met Val Thr Leu Leu Ala Glu Tyr Gly Ile Ala Leu
340 345 350
CGG GCC GGG CAG CAT TGC GCT CAG CCG CTA CTG GCA GAA TTA GGC GTA 1704
Arg Ala Gly Gln His Cys Ala Gln Pro Leu Leu Ala Glu Leu Gly Val
355 360 365
ACC GGC ACA CTG CGC GCC TCT TTT GCG CCA TAT AAT ACA AAG AGT GAT 1752
Thr Gly Thr Leu Arg Ala Ser Phe Ala Pro Tyr Asn Thr Lys Ser Asp
370 375 380
GTG GAT GCG CTG GTG AAT GCC GTT GAC CGC GCG CTG GAA TTA TTG GTG 1800
Val Asp Ala Leu Val Asn Ala Val Asp Arg Ala Leu Glu Leu Leu Val
385 390 395 400
GAT TAAACGCGTG CTAGAGGCAT CAAATAAAAC GAAAGGCTCA GTCGAAAGAC 1853
Asp
TGGGCCTTTC GTTTTATCTG TTGTTTGTCG GTGAACGCTC TCCTGAGTAG GACAAATCCG 1913
CCGCCCTAGA CCTAGGGGAT ATATTCCGCT TCCTCGCTCA CTGACTCGCT ACGCTCGGTC 1973
GTTCGACTGC GGCGAGCGGA AATGGCTTAC GAACGGGGCG GAGATTTCCT GGAAGATGCC 2033
AGGAAGATAC TTAACAGGGA AGTGAGAGGG CCGCGGCAAA GCCGTTTTTC CATAGGCTCC 2093
GCCCCCCTGA CAAGCATCAC GAAATCTGAC GCTCAAATCA GTGGTGGCGA AACCCGACAG 2153
GACTATAAAG ATACCAGGCG TTTCCCCCTG GCGGCTCCCT CGTGCGCTCT CCTGTTCCTG 2213
CCTTTCGGTT TACCGGTGTC ATTCCGCTGT TATGGCCGCG TTTGTCTCAT TCCACGCCTG 2273
ACACTCAGTT CCGGGTAGGC AGTTCGCTCC AAGCTGGACT GTATGCACGA ACCCCCCGTT 2333
CAGTCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC GGAAAGACAT 2393
GCAAAAGCAC CACTGGCAGC AGCCACTGGT AATTGATTTA GAGGAGTTAG TCTTGAAGTC 2453
ATGCGCCGGT TAAGGCTAAA CTGAAAGGAC AAGTTTTGGT GACTGCGCTC CTCCAAGCCA 2513
GTTACCTCGG TTCAAAGAGT TGGTAGGTCA GAGAACCTTC GAAAAACCGC CCTGCAAGGC 2573
GGTTTTTTCG TTTTCAGAGC AAGAGATTAC GCGCAGACCA AAACGATCTC AAGAAGATCA 2633
TCTTATTAAT CAGATAAAAT ATTTCTAGAT TTCAGTGCAA TTTATCTCTT CAAATGTAGC 2693
ACCTGAAGTC AGCCCCATAC GATATAAGTT GTTACTAGTG CTTGGATTCT CACCAATAAA 2753
AAACGCCCGG CGGCAACCGA GCGTTCTGAA CAAATCCAGA TGGAGTTCTG AGGTCATTAC 2813
TGGATCTATC AACAGGAGTC CAAGCGAGCT CTCGAACCCC AGAGTCCCGC TCAGAAGAAC 2873
TCGTCAAGAA GGCGATAGAA GGCGATGCGC TGCGAATCGG GAGCGGCGAT ACCGTAAAGC 2933
ACGAGGAAGC GGTCAGCCCA TTCGCCGCCA AGCTCTTCAG CAATATCACG GGTAGCCAAC 2993
GCTATGTCCT GATAGCGGTC CGCCACACCC AGCCGGCCAC AGTCGATGAA TCCAGAAAAG 3053
CGGCCATTTT CCACCATGAT ATTCGGCAAG CAGGCATCGC CATGGGTCAC GACGAGATCC 3113
TCGCCGTCGG GCATGCGCGC CTTGAGCCTG GCGAACAGTT CGGCTGGCGC GAGCCCCTGA 3173
TGCTCTTCGT CCAGATCATC CTGATCGACA AGACCGGCTT CCATCCGAGT ACGTGCTCGC 3233
TCGATGCGAT GTTTCGCTTG GTGGTCGAAT GGGCAGGTAG CCGGATCAAG CGTATGCAGC 3293
CGCCGCATTG CATCAGCCAT GATGGATACT TTCTCGGCAG GAGCAAGGTG AGATGACAGG 3353
AGATCCTGCC CCGGCACTTC GCCCAATAGC AGCCAGTCCC TTCCCGCTTC AGTGACAACG 3413
TCGAGCACAG CTGCGCAAGG AACGCCCGTC GTGGCCAGCC ACGATAGCCG CGCTGCCTCG 3473
TCCTGCAGTT CATTCAGGGC ACCGGACAGG TCGGTCTTGA CAAAAAGAAC CGGGCGCCCC 3533
TGCGCTGACA GCCGGAACAC GGCGGCATCA GAGCAGCCGA TTGTCTGTTG TGCCCAGTCA 3593
TAGCCGAATA GCCTCTCCAC CCAAGCGGCC GGAGAACCTG CGTGCAATCC ATCTTGTTCA 3653
ATCATGCGAA ACGATCCTCA TCCTGTCTCT TGATCAGATC TTGATCCCCT GCGCCATCAG 3713
ATCCTTGGCG GCAAGAAAGC CATCCAGTTT ACTTTGCAGG GCTTCCCAAC CTTACCAGAG 3773
GGCGCCCCAG CTGGCAATTC C 3794
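(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 401 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: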
Met Asn Val Phe Asn Pro Ala Gln Phe Arg Ala Gln Phe Pro Ala Leu
1 5 10 15
Gln Asp Ala Gly Val Tyr Leu Asp Ser Ala Ala Thr Ala Leu Lys Pro
20 25 30
Glu Ala Val Val Glu Ala Thr Gln Gln Phe Tyr Ser Leu Ser Ala Gly
35 40 45
Asn Val His Arg Ser Gln Phe Ala Glu Ala Gln Arg Leu Thr Ala Arg
50 55 60
Tyr Glu Ala Ala Arg Glu Lys Val Ala Gln Leu Leu Asn Ala Pro Asp
65 70 75 80
Asp Lys Thr Ile Val Trp Thr Arg Gly Thr Thr Glu Ser Ile Asn Met
85 90 95
Val Ala Gln Cys Tyr Ala Arg Pro Arg Leu Gln Pro Gly Asp Glu Ile
100 105 110
Ile Val Ser Val Ala Glu His His Ala Asn Leu Val Pro Trp Leu Met
115 120 125
Val Ala Gln Gln Thr Gly Ala Lys Val Val Lys Leu Pro Leu Asn Ala
130 135 140
Gln Arg Leu Pro Asp Val Asp Leu Leu Pro Glu Leu Ile Thr Pro Arg
145 150 155 160
Ser Arg Ile Leu Ala Leu Gly Gln Met Ser Asn Val Thr Gly Gly Cys
165 170 175
Pro Asp Leu Ala Arg Ala Ile Thr Phe Ala His Ser Ala Gly Met Val
180 185 190
Val Met Val Asp Gly Ala Gln Gly Ala Val His Phe Pro Ala Asp Val
195 200 205
Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His Lys Leu Tyr
210 215 220
Gly Pro Thr Gly Ile Gly Val Leu Tyr Gly Lys Ser Glu Leu Leu Glu
225 230 235 240
Ala Met Ser Pro Trp Leu Gly Gly Gly Lys Met Val His Glu Val Ser
245 250 255
Phe Asp Gly Phe Thr Thr Gln Ser Ala Pro Trp Lys Leu Glu Ala Gly
260 265 270
Thr Pro Asn Val Ala Gly Val Ile Gly Leu Ser Ala Ala Leu Glu Trp
275 280 285
Leu Ala Asp Tyr Asp Ile Asn Gln Ala Glu Ser Trp Ser Arg Ser Leu
290 295 300
Ala Thr Leu Ala Glu Asp Ala Leu Ala Lys Arg Pro Gly Phe Arg Ser
305 310 315 320
Phe Arg Cys Gln Asp Ser Ser Leu Leu Ala Phe Asp Phe Ala Gly Val
325 330 335
His His Ser Asp Met Val Thr Leu Leu Ala Glu Tyr Gly Ile Ala Leu
340 345 350
Arg Ala Gly Gln His Cys Ala Gln Pro Leu Leu Ala Glu Leu Gly Val
355 360 365
Thr Gly Thr Leu Arg Ala Ser Phe Ala Pro Tyr Asn Thr Lys Ser Asp
370 375 380
Val Asp Ala Leu Val Asn Ala Val Asp Arg Ala Leu Glu Leu Leu Val
385 390 395 400
Asp
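Motif I, the conserved sequence HK(I,L)xGPxG identified in the examples (x standing for a less conserved or non-conserved residue), can be located in the listed proteins by writing it as a regular expression. The following is a minimal sketch, assuming the standard IUPAC three-letter codes; the helper names are hypothetical, and only residues 209-228 of SEQ ID NO:6 are scanned to keep the example short.

```python
import re

# Standard IUPAC three-letter -> one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Motif I, HK(I,L)xGPxG, as a regular expression ("." = any residue).
MOTIF_I = re.compile(r"HK[IL].GP.G")


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter codes."""
    # capitalize() normalises case variants such as "phe" or "PHE" to "Phe".
    return "".join(THREE_TO_ONE[res.capitalize()] for res in three_letter.split())


def find_motif_i(three_letter: str, first_residue: int = 1) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif I hit.

    first_residue is the listing position of the first residue passed in,
    so a fragment can be scanned without retyping the whole sequence.
    """
    seq = to_one_letter(three_letter)
    return [(m.start() + first_residue, m.group()) for m in MOTIF_I.finditer(seq)]


# Residues 209-228 of SEQ ID NO:6, copied from the listing.
FRAGMENT_SEQ6 = ("Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His Lys Leu Tyr "
                 "Gly Pro Thr Gly")
print(find_motif_i(FRAGMENT_SEQ6, first_residue=209))  # [(221, 'HKLYGPTG')]
```

Applied to SEQ ID NO:8 in the same way, the pattern matches HKIYGPKG at residues 205-212.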
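(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3810 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vii) IMMEDIATE SOURCE:
(B) CLONE: pHS1bioS2
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 608..1822
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: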
GACGTCTGTG TGGAATTGTG AGCGGATAAC AATTTCACAC AGGGCCCTCG GACACCGAGG 60
AGAATGTCAA GAGGCGAACA CACAACGTCT TGGAGCGCCA GAGGAGGAAC GAGCTAAAAC 120
GGAGCTTTTT TGCCCTGCGT GACCAGATCC CGGAGTTGGA AAACAATGAA AAGGCCCCCA 180
AGGTAGTTAT CCTTAAAAAA GCCACAGCAT ACATCCTGTC CGTCCAAGCA GAGGAGCAAA 240
AGCTCATTTC TGAAGAGGAC TTGTTGCGGA AACGACGAGA ACAGTTGAAA CACAAACTTG 300
AACAGCTACG GAACTCTTGT GCGTAAGGAA AAGTAAGGAA AACGATTCCT TCTAACAGAA 360
ATGTCCTGAG CAATCACCTA TGAACTGTCG ACTCGAGATA GCATTTTTAT CCATAAGATT 420
AGCCGATCCT AAGGTTTACA ATTGTGAGCG CTCACAATTA TGATAGATTC AATTGTGAGC 480
GGATAACAAT TTCACACACG CTAGCGGTAC CGGGCCCCCC CTCGAGGTCG ACGGTATCGA 540
TAAGCTTGAT ATCGAATTCC TGCAGCCCGG GGGATCCCAT GGTACGCGTA AAGAGGAGAA 600
ATTAACT ATG AAA TTA CCG ATT TAT CTC GAC TAC TCC GCA ACC ACG CCG 649
Met Lys Leu Pro Ile Tyr Leu Asp Tyr Ser Ala Thr Thr Pro
1 5 10
GTG GAC CCG CGT GTT GCC GAG AAA ATG ATG CAG TTT ATG ACG ATG GAC 697
Val Asp Pro Arg Val Ala Glu Lys Met Met Gln Phe Met Thr Met Asp
15 20 25 30
GGA ACC TTT GGT AAC CCG GCC TCC CGT TCT CAC CGT TTC GGC TGG CAG 745
Gly Thr Phe Gly Asn Pro Ala Ser Arg Ser His Arg Phe Gly Trp Gln
35 40 45
GCT GAA GAA GCG GTA GAT ATC GCC CGT AAT CAG ATT GCC GAT CTG GTC 793
Ala Glu Glu Ala Val Asp Ile Ala Arg Asn Gln Ile Ala Asp Leu Val
50 55 60
GGC GCT GAT CCG CGT GAA ATC GTC TTT ACC TCT GGT GCA ACC GAA TCT 841
Gly Ala Asp Pro Arg Glu Ile Val Phe Thr Ser Gly Ala Thr Glu Ser
65 70 75
GAC AAC CTG GCG ATC AAA GGT GCA GCC AAC TTT TAT CAG AAA AAA GGC 889
Asp Asn Leu Ala Ile Lys Gly Ala Ala Asn Phe Tyr Gln Lys Lys Gly
80 85 90
AAG CAC ATC ATC ACC AGC AAA ACC GAA CAC AAA GCG GTA CTG GAT ACC 937
Lys His Ile Ile Thr Ser Lys Thr Glu His Lys Ala Val Leu Asp Thr
95 100 105 110
TGC CGT CAG CTG GAG CGC GAA GGT TTT GAA GTC ACC TAC CTG GCA CCG 985
Cys Arg Gln Leu Glu Arg Glu Gly Phe Glu Val Thr Tyr Leu Ala Pro
115 120 125
CAG CGT AAC GGC ATT ATC GAC CTG AAA GAA CTT GAA GCA GCG ATG CGT 1033
Gln Arg Asn Gly Ile Ile Asp Leu Lys Glu Leu Glu Ala Ala Met Arg
130 135 140
GAC GAC ACC ATC CTC GTG TCC ATC ATG CAC GTA AAT AAC GAA ATC GGC 1081
Asp Asp Thr Ile Leu Val Ser Ile Met His Val Asn Asn Glu Ile Gly
145 150 155
GTG GTG CAG GAT ATC GCG GCT ATC GGC GAA ATG TGC CGT GCT CGT GGC 1129
Val Val Gln Asp Ile Ala Ala Ile Gly Glu Met Cys Arg Ala Arg Gly
160 165 170
ATT ATC TAT CAC GTT GAT GCA ACC CAG AGC GTG GGT AAA CTG CCT ATC 1177
Ile Ile Tyr His Val Asp Ala Thr Gln Ser Val Gly Lys Leu Pro Ile
175 180 185 190
GAC CTG AGC CAG TTG AAA GTT GAC CTG ATG TCT TTC TCC GGT CAC AAA 1225
Asp Leu Ser Gln Leu Lys Val Asp Leu Met Ser Phe Ser Gly His Lys
195 200 205
ATC TAT GGC CCG AAA GGT ATC GGT GCG CTG TAT GTA CGT CGT AAA CCG 1273
Ile Tyr Gly Pro Lys Gly Ile Gly Ala Leu Tyr Val Arg Arg Lys Pro
210 215 220
CGC GTA CGC ATC GAA GCG CAA ATG CAC GGC GGC GGT CAC GAG CGC GGT 1321
Arg Val Arg Ile Glu Ala Gln Met His Gly Gly Gly His Glu Arg Gly
225 230 235
ATG CGT TCC GGC ACT CTG CCT GTT CAC CAG ATC GTC GGA ATG GGC GAG 1369
Met Arg Ser Gly Thr Leu Pro Val His Gln Ile Val Gly Met Gly Glu
240 245 250
GCC TAT CGC ATC GCA AAA GAA GAG ATG GCG ACC GAG ATG GAA CGT CTG 1417
Ala Tyr Arg Ile Ala Lys Glu Glu Met Ala Thr Glu Met Glu Arg Leu
255 260 265 270
CGC GGC CTG CGT AAC CGT CTG TGG AAC GGC ATC AAA GAT ATC GAA GAA 1465
Arg Gly Leu Arg Asn Arg Leu Trp Asn Gly Ile Lys Asp Ile Glu Glu
275 280 285
GTT TAC CTG AAC GGT GAC CTG GAA CAC GGT GCG CCG AAC ATT CTC AAC 1513
Val Tyr Leu Asn Gly Asp Leu Glu His Gly Ala Pro Asn Ile Leu Asn
290 295 300
GTC AGC TTC AAC TAC GTT GAA GGT GAG TCG CTG ATT ATG GCG CTG AAA 1561
Val Ser Phe Asn Tyr Val Glu Gly Glu Ser Leu Ile Met Ala Leu Lys
305 310 315
GAC CTC GCA GTT TCT TCA GGT TCC GCC TGT ACG TCA GCA AGC CTC GAA 1609
Asp Leu Ala Val Ser Ser Gly Ser Ala Cys Thr Ser Ala Ser Leu Glu
320 325 330
CCG TCC TAC GTG CTG CGC GCG CTG GGG CTG AAC GAC GAG CTG GCA CAT 1657
Pro Ser Tyr Val Leu Arg Ala Leu Gly Leu Asn Asp Glu Leu Ala His
335 340 345 350
AGC TCT ATC CGT TTC TCT TTA GGT CGT TTT ACT ACT GAA GAA GAG ATC 1705
Ser Ser Ile Arg Phe Ser Leu Gly Arg Phe Thr Thr Glu Glu Glu Ile
355 360 365
GAC TAC ACC ATC GAG TTA GTT CGT AAA TCC ATC GGT CGT CTG CGT GAC 1753
Asp Tyr Thr Ile Glu Leu Val Arg Lys Ser Ile Gly Arg Leu Arg Asp
370 375 380
CTT TCT CCG CTG TGG GAA ATG TAC AAG CAG GGC GTG GAT CTG AAC AGC 1801
Leu Ser Pro Leu Trp Glu Met Tyr Lys Gln Gly Val Asp Leu Asn Ser
385 390 395
ATC GAA TGG GCT CAT CAT TAAACGCGTG CTAGAGGCAT CAAATAAAAC 1849
Ile Glu Trp Ala His His
400 405
GAAAGGCTCA GTCGAAAGAC TGGGCCTTTC GTTTTATCTG TTGTTTGTCG GTGAACGCTC 1909
TCCTGAGTAG GACAAATCCG CCGCCCTAGA CCTAGGGGAT ATATTCCGCT TCCTCGCTCA 1969
CTGACTCGCT ACGCTCGGTC GTTCGACTGC GGCGAGCGGA AATGGCTTAC GAACGGGGCG 2029
GAGATTTCCT GGAAGATGCC AGGAAGATAC TTAACAGGGA AGTGAGAGGG CCGCGGCAAA 2089
GCCGTTTTTC CATAGGCTCC GCCCCCCTGA CAAGCATCAC GAAATCTGAC GCTCAAATCA 2149
GTGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG GCGGCTCCCT 2209
CGTGCGCTCT CCTGTTCCTG CCTTTCGGTT TACCGGTGTC ATTCCGCTGT TATGGCCGCG 2269
TTTGTCTCAT TCCACGCCTG ACACTCAGTT CCGGGTAGGC AGTTCGCTCC AAGCTGGACT 2329
GTATGCACGA ACCCCCCGTT CAGTCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG 2389
AGTCCAACCC GGAAAGACAT GCAAAAGCAC CACTGGCAGC AGCCACTGGT AATTGATTTA 2449
GAGGAGTTAG TCTTGAAGTC ATGCGCCGGT TAAGGCTAAA CTGAAAGGAC AAGTTTTGGT 2509
GACTGCGCTC CTCCAAGCCA GTTACCTCGG TTCAAAGAGT TGGTAGCTCA GAGAACCTTC 2569
GAAAAACCGC CCTGCAAGGC GGTTTTTTCG TTTTCAGAGC AAGAGATTAC GCGCAGACCA 2629
AAACGATCTC AAGAAGATCA TCTTATTAAT CAGATAAAAT ATTTCTAGAT TTCAGTGCAA 2689
TTTATCTCTT CAAATGTAGC ACCTGAAGTC AGCCCCATAC GATATAAGTT GTTACTAGTG 2749
CTTGGATTCT CACCAATAAA AAACGCCCGG CGGCAACCGA GCGTTCTGAA CAAATCCAGA 2809
TGGAGTTCTG AGGTCATTAC TGGATCTATC AACAGGAGTC CAAGCGAGCT CTCGAACCCC 2869
AGAGTCCCGC TCAGAAGAAC TCGTCAAGAA GGCGATAGAA GGCGATGCGC TGCGAATCGG 2929
GAGCGGCGAT ACCGTAAAGC ACGAGGAAGC GGTCAGCCCA TTCGCCGCCA AGCTCTTCAG 2989
CAATATCACG GGTAGCCAAC GCTATGTCCT GATAGCGGTC CGCCACACCC AGCCGGCCAC 3049
AGTCGATGAA TCCAGAAAAG CGGCCATTTT CCACCATGAT ATTCGGCAAG CAGGCATCGC 3109
CATGGGTCAC GACGAGATCC TCGCCGTCGG GCATGCGCGC CTTGAGCCTG GCGAACAGTT 3169
CGGCTGGCGC GAGCCCCTGA TGCTCTTCGT CCAGATCATC CTGATCGACA AGACCGGCTT 3229
CCATCCGAGT ACGTGCTCGC TCGATGCGAT GTTTCGCTTG GTGGTCGAAT GGGCAGGTAG 3289
CCGGATCAAG CGTATGCAGC CGCCGCATTG CATCAGCCAT GATGGATACT TTCTCGGCAG 3349
GAGCAAGGTG AGATGACAGG AGATCCTGCC CCGGCACTTC GCCCAATAGC AGCCAGTCCC 3409
TTCCCGCTTC AGTGACAACG TCGAGCACAG CTGCGCAAGG AACGCCCGTC GTGGCCAGCC 3469
ACGATAGCCG CGCTGCCTCG TCCTGCAGTT CATTCAGGGC ACCGGACAGG TCGGTCTTGA 3529
CAAAAAGAAC CGGGCGCCCC TGCGCTGACA GCCGGAACAC GGCGGCATCA GAGCAGCCGA 3589
TTGTCTGTTG TGCCCAGTCA TAGCCGAATA GCCTCTCCAC CCAAGCGGCC GGAGAACCTG 3649
CGTGCAATCC ATCTTGTTCA ATCATGCGAA ACGATCCTCA TCCTGTCTCT TGATCAGATC 3709
TTGATCCCCT GCGCCATCAG ATCCTTGGCG GCAAGAAAGC CATCCAGTTT ACTTTGCAGG 3769
GCTTCCCAAC CTTACCAGAG GGCGCCCCAG CTGGCAATTC C 3810
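The CDS location in each feature table is a 1-based, inclusive range that also covers the stop codon (TAA at 1820..1822 here), so the length of the encoded protein follows directly from it. A small sketch of that arithmetic, checked against the protein lengths stated in this listing; the function name is hypothetical.

```python
def residues_encoded(start: int, end: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by a CDS given as a 1-based inclusive range start..end."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"CDS {start}..{end} is {length_nt} nt, not a whole number of codons")
    codons = length_nt // 3
    # The stop codon is counted in the range but encodes no residue.
    return codons - 1 if includes_stop else codons


# CDS location (nucleotide entry) and stated length (protein entry) from this listing.
LISTING = {
    "SEQ ID NO:7 -> NO:8": ((608, 1822), 404),
    "SEQ ID NO:9 -> NO:10": ((272, 1477), 401),
    "SEQ ID NO:11 -> NO:12": ((279, 1493), 404),
}

for name, ((start, end), stated) in LISTING.items():
    computed = residues_encoded(start, end)
    print(f"{name}: {start}..{end} -> {computed} aa (listed: {stated})")
```

All three entries agree: 1215, 1206 and 1215 nucleotides give 405, 402 and 405 codons, i.e. 404, 401 and 404 residues after the stop codon is excluded.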
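(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 404 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: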
Met Lys Leu Pro Ile Tyr Leu Asp Tyr Ser Ala Thr Thr Pro Val Asp
1 5 10 15
Pro Arg Val Ala Glu Lys Met Met Gln Phe Met Thr Met Asp Gly Thr
20 25 30
Phe Gly Asn Pro Ala Ser Arg Ser His Arg Phe Gly Trp Gln Ala Glu
35 40 45
Glu Ala Val Asp Ile Ala Arg Asn Gln Ile Ala Asp Leu Val Gly Ala
50 55 60
Asp Pro Arg Glu Ile Val Phe Thr Ser Gly Ala Thr Glu Ser Asp Asn
65 70 75 80
Leu Ala Ile Lys Gly Ala Ala Asn Phe Tyr Gln Lys Lys Gly Lys His
85 90 95
Ile Ile Thr Ser Lys Thr Glu His Lys Ala Val Leu Asp Thr Cys Arg
100 105 110
Gln Leu Glu Arg Glu Gly Phe Glu Val Thr Tyr Leu Ala Pro Gln Arg
115 120 125
Asn Gly Ile Ile Asp Leu Lys Glu Leu Glu Ala Ala Met Arg Asp Asp
130 135 140
Thr Ile Leu Val Ser Ile Met His Val Asn Asn Glu Ile Gly Val Val
145 150 155 160
Gln Asp Ile Ala Ala Ile Gly Glu Met Cys Arg Ala Arg Gly Ile Ile
165 170 175
Tyr His Val Asp Ala Thr Gln Ser Val Gly Lys Leu Pro Ile Asp Leu
180 185 190
Ser Gln Leu Lys Val Asp Leu Met Ser Phe Ser Gly His Lys Ile Tyr
195 200 205
Gly Pro Lys Gly Ile Gly Ala Leu Tyr Val Arg Arg Lys Pro Arg Val
210 215 220
Arg Ile Glu Ala Gln Met His Gly Gly Gly His Glu Arg Gly Met Arg
225 230 235 240
Ser Gly Thr Leu Pro Val His Gln Ile Val Gly Met Gly Glu Ala Tyr
245 250 255
Arg Ile Ala Lys Glu Glu Met Ala Thr Glu Met Glu Arg Leu Arg Gly
260 265 270
Leu Arg Asn Arg Leu Trp Asn Gly Ile Lys Asp Ile Glu Glu Val Tyr
275 280 285
Leu Asn Gly Asp Leu Glu His Gly Ala Pro Asn Ile Leu Asn Val Ser
290 295 300
Phe Asn Tyr Val Glu Gly Glu Ser Leu Ile Met Ala Leu Lys Asp Leu
305 310 315 320
Ala Val Ser Ser Gly Ser Ala Cys Thr Ser Ala Ser Leu Glu Pro Ser
325 330 335
Tyr Val Leu Arg Ala Leu Gly Leu Asn Asp Glu Leu Ala His Ser Ser
340 345 350
Ile Arg Phe Ser Leu Gly Arg Phe Thr Thr Glu Glu Glu Ile Asp Tyr
355 360 365
Thr Ile Glu Leu Val Arg Lys Ser Ile Gly Arg Leu Arg Asp Leu Ser
370 375 380
Pro Leu Trp Glu Met Tyr Lys Gln Gly Val Asp Leu Asn Ser Ile Glu
385 390 395 400
Trp Ala His His
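SEQ ID NO:8 is the translation of the CDS annotated in SEQ ID NO:7. A minimal sketch, assuming the standard genetic code (NCBI translation table 1), translates the first 14 codons of that CDS (positions 608..649, copied from the listing) and compares them with the start of SEQ ID NO:8. The function name and the compact codon-table encoding are illustrative choices, not part of the original document.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... (third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    seq = "".join(cds.split()).upper()  # drop the listing's spaces and line breaks
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break  # stop codon ends the reading frame
        protein.append(aa)
    return "".join(protein)


# Positions 608..649 of SEQ ID NO:7 (first 14 codons of the CDS).
CDS_START = "ATG AAA TTA CCG ATT TAT CTC GAC TAC TCC GCA ACC ACG CCG"
# First 14 residues of SEQ ID NO:8 (Met Lys Leu Pro ... Thr Pro) in one-letter code.
SEQ8_START = "MKLPIYLDYSATTP"
print(translate(CDS_START), translate(CDS_START) == SEQ8_START)  # MKLPIYLDYSATTP True
```

The same function applied to the last codons, ATC GAA TGG GCT CAT CAT TAA, gives IEWAHH, matching the C-terminus Ile Glu Trp Ala His His.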
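(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3465 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vii) IMMEDIATE SOURCE:
(B) CLONE: pHS2bioS1
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 272..1477
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: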
GACGTCTAAG AAACCATTAT TATCATGACA TTAACCTATA AAAATAGGCG TATCACGAGG 60
CCCTTTCGTC TTCACCTCGA GTCCCTATCA GTGATAGAGA TTGACATCCC TATCAGTGAT 120
AGAGATACTG AGCACATCAG CAGGACGCAC TGACCGAATT CATTAAAGAG GAGAAAGGTA 180
CCGGGCCCCC CCTCGAGGTC GACGGTATCG ATAAGCTTGA TATCGAATTC CTGCAGCCCG 240
GGGGATCCCA TGGTACGCGT CGAGGAGTAC C ATG AAC GTT TTT AAT CCC GCG 292
Met Asn Val Phe Asn Pro Ala
1 5
CAG TTT CGC GCC CAG TTT CCC GCA CTA CAG GAT GCG GGC GTC TAT CTC 340
Gln Phe Arg Ala Gln Phe Pro Ala Leu Gln Asp Ala Gly Val Tyr Leu
10 15 20
GAC AGC GCC GCG ACC GCG CTT AAA CCT GAA GCC GTG GTT GAA GCC ACC 388
Asp Ser Ala Ala Thr Ala Leu Lys Pro Glu Ala Val Val Glu Ala Thr
25 30 35
CAA CAG TTT TAC AGT CTG AGC GCC GGA AAC GTC CAT CGC AGC CAG TTT 436
Gln Gln Phe Tyr Ser Leu Ser Ala Gly Asn Val His Arg Ser Gln Phe
40 45 50 55
GCC GAA GCC CAA CGC CTG ACC GCG CGT TAT GAA GCT GCA CGA GAG AAA 484
Ala Glu Ala Gln Arg Leu Thr Ala Arg Tyr Glu Ala Ala Arg Glu Lys
60 65 70
GTG GCG CAA TTA CTG AAT GCA CCG GAT GAT AAA ACT ATC GTC TGG ACG 532
Val Ala Gln Leu Leu Asn Ala Pro Asp Asp Lys Thr Ile Val Trp Thr
75 80 85
CGC GGC ACC ACT GAA TCC ATC AAC ATG GTG GCA CAA TGC TAT GCG CGT 580
Arg Gly Thr Thr Glu Ser Ile Asn Met Val Ala Gln Cys Tyr Ala Arg
90 95 100
CCG CGT CTG CAA CCG GGC GAT GAG ATT ATT GTC AGC GTG GCA GAA CAC 628
Pro Arg Leu Gln Pro Gly Asp Glu Ile Ile Val Ser Val Ala Glu His
105 110 115
CAC GCC AAC CTC GTC CCC TGG CTG ATG GTC GCC CAA CAA ACT GGA GCC 676
His Ala Asn Leu Val Pro Trp Leu Met Val Ala Gln Gln Thr Gly Ala
120 125 130 135
AAA GTG GTG AAA TTG CCG CTT AAT GCG CAG CGA CTG CCG GAT GTC GAT 724
Lys Val Val Lys Leu Pro Leu Asn Ala Gln Arg Leu Pro Asp Val Asp
140 145 150
TTG TTG CCA GAA CTG ATT ACT CCC CGT AGT CGG ATT CTG GCG TTG GGT 772
Leu Leu Pro Glu Leu Ile Thr Pro Arg Ser Arg Ile Leu Ala Leu Gly
155 160 165
CAG ATG TCG AAC GTT ACT GGC GGT TGC CCG GAT CTG GCG CGA GCG ATT 820
Gln Met Ser Asn Val Thr Gly Gly Cys Pro Asp Leu Ala Arg Ala Ile
170 175 180
ACC TTT GCT CAT TCA GCC GGG ATG GTG GTG ATG GTT GAT GGT GCT CAG 868
Thr Phe Ala His Ser Ala Gly Met Val Val Met Val Asp Gly Ala Gln
185 190 195
GGG GCA GTG CAT TTC CCC GCG GAT GTT CAG CAA CTG GAT ATT GAT TTC 916
Gly Ala Val His Phe Pro Ala Asp Val Gln Gln Leu Asp Ile Asp Phe
200 205 210 215
TAT GCT TTT TCA GGT CAC AAA CTG TAT GGC CCG ACA GGT ATC GGC GTG 964
Tyr Ala Phe Ser Gly His Lys Leu Tyr Gly Pro Thr Gly Ile Gly Val
220 225 230
CTG TAT GGT AAA TCA GAA CTG CTG GAG GCG ATG TCG CCC TGG CTG GGC 1012
Leu Tyr Gly Lys Ser Glu Leu Leu Glu Ala Met Ser Pro Trp Leu Gly
235 240 245
GGC GGC AAA ATG GTT CAC GAA GTG AGT TTT GAC GGC TTC ACG ACT CAA 1060
Gly Gly Lys Met Val His Glu Val Ser Phe Asp Gly Phe Thr Thr Gln
250 255 260
TCT GCG CCG TGG AAA CTG GAA GCT GGA ACG CCA AAT GTC GCT GGT GTC 1108
Ser Ala Pro Trp Lys Leu Glu Ala Gly Thr Pro Asn Val Ala Gly Val
265 270 275
ATA GGA TTA AGC GCG GCG CTG GAA TGG CTG GCA GAT TAC GAT ATC AAC 1156
Ile Gly Leu Ser Ala Ala Leu Glu Trp Leu Ala Asp Tyr Asp Ile Asn
280 285 290 295
CAG GCC GAA AGC TGG AGC CGT AGC TTA GCA ACG CTG GCG GAA GAT GCG 1204
Gln Ala Glu Ser Trp Ser Arg Ser Leu Ala Thr Leu Ala Glu Asp Ala
300 305 310
CTG GCG AAA CGT CCC GGC TTT CGT TCA TTC CGC TGC CAG GAT TCC AGC 1252
Leu Ala Lys Arg Pro Gly Phe Arg Ser Phe Arg Cys Gln Asp Ser Ser
315 320 325
CTG CTG GCC TTT GAT TTT GCT GGC GTT CAT CAT AGC GAT ATG GTG ACG 1300
Leu Leu Ala Phe Asp Phe Ala Gly Val His His Ser Asp Met Val Thr
330 335 340
CTG CTG GCG GAG TAC GGT ATT GCC CTG CGG GCC GGG CAG CAT TGC GCT 1348
Leu Leu Ala Glu Tyr Gly Ile Ala Leu Arg Ala Gly Gln His Cys Ala
345 350 355
CAG CCG CTA CTG GCA GAA TTA GGC GTA ACC GGC ACA CTG CGC GCC TCT 1396
Gln Pro Leu Leu Ala Glu Leu Gly Val Thr Gly Thr Leu Arg Ala Ser
360 365 370 375
TTT GCG CCA TAT AAT ACA AAG AGT GAT GTG GAT GCG CTG GTG AAT GCC 1444
Phe Ala Pro Tyr Asn Thr Lys Ser Asp Val Asp Ala Leu Val Asn Ala
380 385 390
GTT GAC CGC GCG CTG GAA TTA TTG GTG GAT TAAACGCGTG CTAGAGGCAT 1494
Val Asp Arg Ala Leu Glu Leu Leu Val Asp
395 400
CAAATAAAAC GAAAGGCTCA GTCGAAAGAC TGGGCCTTTC GTTTTATCTG TTGTTTGTCG 1554
GTGAACGCTC TCCTGAGTAG GACAAATCCG CCGCCCTAGA CCTAGGGGAT ATATTCCGCT 1614
TCCTCGCTCA CTGACTCGCT ACGCTCGGTC GTTCGACTGC GGCGAGCGGA AATGGCTTAC 1674
GAACGGGGCG GAGATTTCCT GGAAGATGCC AGGAAGATAC TTAACAGGGA AGTGAGAGGG 1734
CCGCGGCAAA GCCGTTTTTC CATAGGCTCC GCCCCCCTGA CAAGCATCAC GAAATCTGAC 1794
GCTCAAATCA GTGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 1854
GCGGCTCCCT CGTGCGCTCT CCTGTTCCTG CCTTTCGGTT TACCGGTGTC ATTCCGCTGT 1914
TATGGCCGCG TTTGTCTCAT TCCACGCCTG ACACTCAGTT CCGGGTAGGC AGTTCGCTCC 1974
AAGCTGGACT GTATGCACGA ACCCCCCGTT CAGTCCGACC GCTGCGCCTT ATCCGGTAAC 2034
TATCGTCTTG AGTCCAACCC GGAAAGACAT GCAAAAGCAC CACTGGCAGC AGCCACTGGT 2094
AATTGATTTA GAGGAGTTAG TCTTGAAGTC ATGCGCCGGT TAAGGCTAAA CTGAAAGGAC 2154
AAGTTTTGGT GACTGCGCTC CTCCAAGCCA GTTACCTCGG TTCAAAGAGT TGGTAGCTCA 2214
GAGAACCTTC GAAAAACCGC CCTGCAAGGC GGTTTTTTCG TTTTCAGAGC AAGAGATTAC 2274
GCGCAGACCA AAACGATCTC AAGAAGATCA TCTTATTAAT CAGATAAAAT ATTTCTAGAT 2334
TTCAGTGCAA TTTATCTCTT CAAATGTAGC ACCTGAAGTC AGCCCCATAC GATATAAGTT 2394
GTTACTAGTG CTTGGATTCT CACCAATAAA AAACGCCCGG CGGCAACCGA GCGTTCTGAA 2454
CAAATCCAGA TGGAGTTCTG AGGTCATTAC TGGATCTATC AACAGGAGTC CAAGCGAGCT 2514
CTCGAACCCC AGAGTCCCGC TCAGAAGAAC TCGTCAAGAA GGCGATAGAA GGCGATGCGC 2574
TGCGAATCGG GAGCGGCGAT ACCGTAAAGC ACGAGGAAGC GGTCAGCCCA TTCGCCGCCA 2634
AGCTCTTCAG CAATATCACG GGTAGCCAAC GCTATGTCCT GATAGCGGTC CGCCACACCC 2694
AGCCGGCCAC AGTCGATGAA TCCAGAAAAG CGGCCATTTT CCACCATGAT ATTCGGCAAG 2754
CAGGCATCGC CATGGGTCAC GACGAGATCC TCGCCGTCGG GCATGCGCGC CTTGAGCCTG 2814
GCGAACAGTT CGGCTGGCGC GAGCCCCTGA TGCTCTTCGT CCAGATCATC CTGATCGACA 2874
AGACCGGCTT CCATCCGAGT ACGTGCTCGC TCGATGCGAT GTTTCGCTTG GTGGTCGAAT 2934
GGGCAGGTAG CCGGATCAAG CGTATGCAGC CGCCGCATTG CATCAGCCAT GATGGATACT 2994
TTCTCGGCAG GAGCAAGGTG AGATGACAGG AGATCCTGCC CCGGCACTTC GCCCAATAGC 3054
AGCCAGTCCC TTCCCGCTTC AGTGACAACG TCGAGCACAG CTGCGCAAGG AACGCCCGTC 3114
GTGGCCAGCC ACGATAGCCG CGCTGCCTCG TCCTGCAGTT CATTCAGGGC ACCGGACAGG 3174
TCGGTCTTGA CAAAAAGAAC CGGGCGCCCC TGCGCTGACA GCCGGAACAC GGCGGCATCA 3234
GAGCAGCCGA TTGTCTGTTG TGCCCAGTCA TAGCCGAATA GCCTCTCCAC CCAAGCGGCC 3294
GGAGAACCTG CGTGCAATCC ATCTTGTTCA ATCATGCGAA ACGATCCTCA TCCTGTCTCT 3354
TGATCAGATC TTGATCCCCT GCGCCATCAG ATCCTTGGCG GCAAGAAAGC CATCCAGTTT 3414
ACTTTGCAGG GCTTCCCAAC CTTACCAGAG GGCGCCCCAG CTGGCAATTC C 3465
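(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 401 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: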
Met Asn Val Phe Asn Pro Ala Gln Phe Arg Ala Gln Phe Pro Ala Leu
1 5 10 15
Gln Asp Ala Gly Val Tyr Leu Asp Ser Ala Ala Thr Ala Leu Lys Pro
20 25 30
Glu Ala Val Val Glu Ala Thr Gln Gln Phe Tyr Ser Leu Ser Ala Gly
35 40 45
Asn Val His Arg Ser Gln Phe Ala Glu Ala Gln Arg Leu Thr Ala Arg
50 55 60
Tyr Glu Ala Ala Arg Glu Lys Val Ala Gln Leu Leu Asn Ala Pro Asp
65 70 75 80
Asp Lys Thr Ile Val Trp Thr Arg Gly Thr Thr Glu Ser Ile Asn Met
85 90 95
Val Ala Gln Cys Tyr Ala Arg Pro Arg Leu Gln Pro Gly Asp Glu Ile
100 105 110
Ile Val Ser Val Ala Glu His His Ala Asn Leu Val Pro Trp Leu Met
115 120 125
Val Ala Gln Gln Thr Gly Ala Lys Val Val Lys Leu Pro Leu Asn Ala
130 135 140
Gln Arg Leu Pro Asp Val Asp Leu Leu Pro Glu Leu Ile Thr Pro Arg
145 150 155 160
Ser Arg Ile Leu Ala Leu Gly Gln Met Ser Asn Val Thr Gly Gly Cys
165 170 175
Pro Asp Leu Ala Arg Ala Ile Thr Phe Ala His Ser Ala Gly Met Val
180 185 190
Val Met Val Asp Gly Ala Gln Gly Ala Val His Phe Pro Ala Asp Val
195 200 205
Gln Gln Leu Asp Ile Asp Phe Tyr Ala Phe Ser Gly His Lys Leu Tyr
210 215 220
Gly Pro Thr Gly Ile Gly Val Leu Tyr Gly Lys Ser Glu Leu Leu Glu
225 230 235 240
Ala Met Ser Pro Trp Leu Gly Gly Gly Lys Met Val His Glu Val Ser
245 250 255
Phe Asp Gly Phe Thr Thr Gln Ser Ala Pro Trp Lys Leu Glu Ala Gly
260 265 270
Thr Pro Asn Val Ala Gly Val Ile Gly Leu Ser Ala Ala Leu Glu Trp
275 280 285
Leu Ala Asp Tyr Asp Ile Asn Gln Ala Glu Ser Trp Ser Arg Ser Leu
290 295 300
Ala Thr Leu Ala Glu Asp Ala Leu Ala Lys Arg Pro Gly Phe Arg Ser
305 310 315 320
Phe Arg Cys Gln Asp Ser Ser Leu Leu Ala Phe Asp Phe Ala Gly Val
325 330 335
His His Ser Asp Met Val Thr Leu Leu Ala Glu Tyr Gly Ile Ala Leu
340 345 350
Arg Ala Gly Gln His Cys Ala Gln Pro Leu Leu Ala Glu Leu Gly Val
355 360 365
Thr Gly Thr Leu Arg Ala Ser Phe Ala Pro Tyr Asn Thr Lys Ser Asp
370 375 380
Val Asp Ala Leu Val Asn Ala Val Asp Arg Ala Leu Glu Leu Leu Val
385 390 395 400
Asp
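(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3481 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vii) IMMEDIATE SOURCE:
(B) CLONE: pHS2bioS2
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 279..1493
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: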
GACGTCTAAG AAACCATTAT TATCATGACA TTAACCTATA AAAATAGGCG TATCACGAGG 60
CCCTTTCGTC TTCACCTCGA GTCCCTATCA GTGATAGAGA TTGACATCCC TATCAGTGAT 120
AGAGATACTG AGCACATCAG CAGGACGCAC TGACCGAATT CATTAAAGAG GAGAAAGGTA 180
CCGGGCCCCC CCTCGAGGTC GACGGTATCG ATAAGCTTGA TATCGAATTC CTGCAGCCCG 240
GGGGATCCCA TGGTACGCGT AAAGAGGAGA AATTAACT ATG AAA TTA CCG ATT 293
Met Lys Leu Pro Ile
1 5
TAT CTC GAC TAC TCC GCA ACC ACG CCG GTG GAC CCG CGT GTT GCC GAG 341
Tyr Leu Asp Tyr Ser Ala Thr Thr Pro Val Asp Pro Arg Val Ala Glu
10 15 20
AAA ATG ATG CAG TTT ATG ACG ATG GAC GGA ACC TTT GGT AAC CCG GCC 389
Lys Met Met Gln Phe Met Thr Met Asp Gly Thr Phe Gly Asn Pro Ala
25 30 35
TCC CGT TCT CAC CGT TTC GGC TGG CAG GCT GAA GAA GCG GTA GAT ATC 437
Ser Arg Ser His Arg Phe Gly Trp Gln Ala Glu Glu Ala Val Asp Ile
40 45 50
GCC CGT AAT CAG ATT GCC GAT CTG GTC GGC GCT GAT CCG CGT GAA ATC 485
Ala Arg Asn Gln Ile Ala Asp Leu Val Gly Ala Asp Pro Arg Glu Ile
55 60 65
GTC TTT ACC TCT GGT GCA ACC GAA TCT GAC AAC CTG GCG ATC AAA GGT 533
Val Phe Thr Ser Gly Ala Thr Glu Ser Asp Asn Leu Ala Ile Lys Gly
70 75 80 85
GCA GCC AAC TTT TAT CAG AAA AAA GGC AAG CAC ATC ATC ACC AGC AAA 581
Ala Ala Asn Phe Tyr Gln Lys Lys Gly Lys His Ile Ile Thr Ser Lys
90 95 100
ACC GAA CAC AAA GCG GTA CTG GAT ACC TGC CGT CAG CTG GAG CGC GAA 629
Thr Glu His Lys Ala Val Leu Asp Thr Cys Arg Gln Leu Glu Arg Glu
105 110 115
GGT TTT GAA GTC ACC TAC CTG GCA CCG CAG CGT AAC GGC ATT ATC GAC 677
Gly Phe Glu Val Thr Tyr Leu Ala Pro Gln Arg Asn Gly Ile Ile Asp
120 125 130
CTG AAA GAA CTT GAA GCA GCG ATG CGT GAC GAC ACC ATC CTC GTG TCC 725
Leu Lys Glu Leu Glu Ala Ala Met Arg Asp Asp Thr Ile Leu Val Ser
135 140 145
ATC ATG CAC GTA AAT AAC GAA ATC GGC GTG GTG CAG GAT ATC GCG GCT 773
Ile Met His Val Asn Asn Glu Ile Gly Val Val Gln Asp Ile Ala Ala
150 155 160 165
ATC GGC GAA ATG TGC CGT GCT CGT GGC ATT ATC TAT CAC GTT GAT GCA 821
Ile Gly Glu Met Cys Arg Ala Arg Gly Ile Ile Tyr His Val Asp Ala
170 175 180
ACC CAG AGC GTG GGT AAA CTG CCT ATC GAC CTG AGC CAG TTG AAA GTT 869
Thr Gln Ser Val Gly Lys Leu Pro Ile Asp Leu Ser Gln Leu Lys Val
185 190 195
GAC CTG ATG TCT TTC TCC GGT CAC AAA ATC TAT GGC CCG AAA GGT ATC 917
Asp Leu Met Ser Phe Ser Gly His Lys Ile Tyr Gly Pro Lys Gly Ile
200 205 210
GGT GCG CTG TAT GTA CGT CGT AAA CCG CGC GTA CGC ATC GAA GCG CAA 965
Gly Ala Leu Tyr Val Arg Arg Lys Pro Arg Val Arg Ile Glu Ala Gln
215 220 225
ATG CAC GGC GGC GGT CAC GAG CGC GGT ATG CGT TCC GGC ACT CTG CCT 1013
Met His Gly Gly Gly His Glu Arg Gly Met Arg Ser Gly Thr Leu Pro
230 235 240 245
GTT CAC CAG ATC GTC GGA ATG GGC GAG GCC TAT CGC ATC GCA AAA GAA 1061
Val His Gln Ile Val Gly Met Gly Glu Ala Tyr Arg Ile Ala Lys Glu
250 255 260
GAG ATG GCG ACC GAG ATG GAA CGT CTG CGC GGC CTG CGT AAC CGT CTG 1109
Glu Met Ala Thr Glu Met Glu Arg Leu Arg Gly Leu Arg Asn Arg Leu
265 270 275
TGG AAC GGC ATC AAA GAT ATC GAA GAA GTT TAC CTG AAC GGT GAC CTG 1157
Trp Asn Gly Ile Lys Asp Ile Glu Glu Val Tyr Leu Asn Gly Asp Leu
280 285 290
GAA CAC GGT GCG CCG AAC ATT CTC AAC GTC AGC TTC AAC TAC GTT GAA 1205
Glu His Gly Ala Pro Asn Ile Leu Asn Val Ser Phe Asn Tyr Val Glu
295 300 305
GGT GAG TCG CTG ATT ATG GCG CTG AAA GAC CTC GCA GTT TCT TCA GGT 1253
Gly Glu Ser Leu Ile Met Ala Leu Lys Asp Leu Ala Val Ser Ser Gly
310 315 320 325
TCC GCC TGT ACG TCA GCA AGC CTC GAA CCG TCC TAC GTG CTG CGC GCG 1301
Ser Ala Cys Thr Ser Ala Ser Leu Glu Pro Ser Tyr Val Leu Arg Ala
330 335 340
CTG GGG CTG AAC GAC GAG CTG GCA CAT AGC TCT ATC CGT TTC TCT TTA 1349
Leu Gly Leu Asn Asp Glu Leu Ala His Ser Ser Ile Arg Phe Ser Leu
345 350 355
GGT CGT TTT ACT ACT GAA GAA GAG ATC GAC TAC ACC ATC GAG TTA GTT 1397
Gly Arg Phe Thr Thr Glu Glu Glu Ile Asp Tyr Thr Ile Glu Leu Val
360 365 370
CGT AAA TCC ATC GGT CGT CTG CGT GAC CTT TCT CCG CTG TGG GAA ATG 1445
Arg Lys Ser Ile Gly Arg Leu Arg Asp Leu Ser Pro Leu Trp Glu Met
375 380 385
TAC AAG CAG GGC GTG GAT CTG AAC AGC ATC GAA TGG GCT CAT CAT TAAACGCGTG 1500
Tyr Lys Gln Gly Val Asp Leu Asn Ser Ile Glu Trp Ala His His
390 395 400 405
CTAGAGGCAT CAAATAAAAC GAAAGGCTCA GTCGAAAGAC TGGGCCTTTC GTTTTATCTG 1560
TTGTTTGTCG GTGAACGCTC TCCTGAGTAG GACAAATCCG CCGCCCTAGA CCTAGGGGAT 1620
ATATTCCGCT TCCTCGCTCA CTGACTCGCT ACGCTCGGTC GTTCGACTGC GGCGAGCGGA 1680
AATGGCTTAC GAACGGGGCG GAGATTTCCT GGAAGATGCC AGGAAGATAC TTAACAGGGA 1740
AGTGAGAGGG CCGCGGCAAA GCCGTTTTTC CATAGGCTCC GCCCCCCTGA CAAGCATCAC 1800
GAAATCTGAC GCTCAAATCA GTGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG 1860
TTTCCCCCTG GCGGCTCCCT CGTGCGCTCT CCTGTTCCTG CCTTTCGGTT TACCGGTGTC 1920
ATTCCGCTGT TATGGCCGCG TTTGTCTCAT TCCACGCCTG ACACTCAGTT CCGGGTAGGC 1980
AGTTCGCTCC AAGCTGGACT GTATGCACGA ACCCCCCGTT CAGTCCGACC GCTGCGCCTT 2040
ATCCGGTAAC TATCGTCTTG AGTCCAACCC GGAAAGACAT GCAAAAGCAC CACTGGCAGC 2100
AGCCACTGGT AATTGATTTA GAGGAGTTAG TCTTGAAGTC ATGCGCCGGT TAAGGCTAAA 2160
CTGAAAGGAC AAGTTTTGGT GACTGCGCTC CTCCAAGCCA GTTACCTCGG TTCAAAGAGT 2220
TGGTAGCTCA GAGAACCTTC GAAAAACCGC CCTGCAAGGC GGTTTTTTCG TTTTCAGAGC 2280
AAGAGATTAC GCGCAGACCA AAACGATCTC AAGAAGATCA TCTTATTAAT CAGATAAAAT 2340
ATTTCTAGAT TTCAGTGCAA TTTATCTCTT CAAATGTAGC ACCTGAAGTC AGCCCCATAC 2400
GATATAAGTT GTTACTAGTG CTTGGATTCT CACCAATAAA AAACGCCCGG CGGCAACCGA 2460
GCGTTCTGAA CAAATCCAGA TGGAGTTCTG AGGTCATTAC TGGATCTATC AACAGGAGTC 2520
CAAGCGAGCT CTCGAACCCC AGAGTCCCGC TCAGAAGAAC TCGTCAAGAA GGCGATAGAA 2580
GGCGATGCGC TGCGAATCGG GAGCGGCGAT ACCGTAAAGC ACGAGGAAGC GGTCAGCCCA 2640
TTCGCCGCCA AGCTCTTCAG CAATATCACG GGTAGCCAAC GCTATGTCCT GATAGCGGTC 2700
CGCCACACCC AGCCGGCCAC AGTCGATGAA TCCAGAAAAG CGGCCATTTT CCACCATGAT 2760
ATTCGGCAAG CAGGCATCGC CATGGGTCAC GACGAGATCC TCGCCGTCGG GCATGCGCGC 2820
CTTGAGCCTG GCGAACAGTT CGGCTGGCGC GAGCCCCTGA TGCTCTTCGT CCAGATCATC 2880
CTGATCGACA AGACCGGCTT CCATCCGAGT ACGTGCTCGC TCGATGCGAT GTTTCGCTTG 2940
GTGGTCGAAT GGGCAGGTAG CCGGATCAAG CGTATGCAGC CGCCGCATTG CATCAGCCAT 3000
GATGGATACT TTCTCGGCAG GAGCAAGGTG AGATGACAGG AGATCCTGCC CCGGCACTTC 3060
GCCCAATAGC AGCCAGTCCC TTCCCGCTTC AGTGACAACG TCGAGCACAG CTGCGCAAGG 3120
AACGCCCGTC GTGGCCAGCC ACGATAGCCG CGCTGCCTCG TCCTGCAGTT CATTCAGGGC 3180
ACCGGACAGG TCGGTCTTGA CAAAAAGAAC CGGGCGCCCC TGCGCTGACA GCCGGAACAC 3240
GGCGGCATCA GAGCAGCCGA TTGTCTGTTG TGCCCAGTCA TAGCCGAATA GCCTCTCCAC 3300
CCAAGCGGCC GGAGAACCTG CGTGCAATCC ATCTTGTTCA ATCATGCGAA ACGATCCTCA 3360
TCCTGTCTCT TGATCAGATC TTGATCCCCT GCGCCATCAG ATCCTTGGCG GCAAGAAAGC 3420
CATCCAGTTT ACTTTGCAGG GCTTCCCAAC CTTACCAGAG GGCGCCCCAG CTGGCAATTC 3480
C 3481
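(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 404 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: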
Met Lys Leu Pro Ile Tyr Leu Asp Tyr Ser Ala Thr Thr Pro Val Asp
1 5 10 15
Pro Arg Val Ala Glu Lys Met Met Gln Phe Met Thr Met Asp Gly Thr
20 25 30
Phe Gly Asn Pro Ala Ser Arg Ser His Arg Phe Gly Trp Gln Ala Glu
35 40 45
Glu Ala Val Asp Ile Ala Arg Asn Gln Ile Ala Asp Leu Val Gly Ala
50 55 60
Asp Pro Arg Glu Ile Val Phe Thr Ser Gly Ala Thr Glu Ser Asp Asn
65 70 75 80
Leu Ala Ile Lys Gly Ala Ala Asn Phe Tyr Gln Lys Lys Gly Lys His
85 90 95
Ile Ile Thr Ser Lys Thr Glu His Lys Ala Val Leu Asp Thr Cys Arg
100 105 110
Gln Leu Glu Arg Glu Gly Phe Glu Val Thr Tyr Leu Ala Pro Gln Arg
115 120 125
Asn Gly Ile Ile Asp Leu Lys Glu Leu Glu Ala Ala Met Arg Asp Asp
130 135 140
Thr Ile Leu Val Ser Ile Met His Val Asn Asn Glu Ile Gly Val Val
145 150 155 160
Gln Asp Ile Ala Ala Ile Gly Glu Met Cys Arg Ala Arg Gly Ile Ile
165 170 175
Tyr His Val Asp Ala Thr Gln Ser Val Gly Lys Leu Pro Ile Asp Leu
180 185 190
Ser Gln Leu Lys Val Asp Leu Met Ser Phe Ser Gly His Lys Ile Tyr
195 200 205
Gly Pro Lys Gly Ile Gly Ala Leu Tyr Val Arg Arg Lys Pro Arg Val
210 215 220
Arg Ile Glu Ala Gln Met His Gly Gly Gly His Glu Arg Gly Met Arg
225 230 235 240
Ser Gly Thr Leu Pro Val His Gln Ile Val Gly Met Gly Glu Ala Tyr
245 250 255
Arg Ile Ala Lys Glu Glu Met Ala Thr Glu Met Glu Arg Leu Arg Gly
260 265 270
Leu Arg Asn Arg Leu Trp Asn Gly Ile Lys Asp Ile Glu Glu Val Tyr
275 280 285
Leu Asn Gly Asp Leu Glu His Gly Ala Pro Asn Ile Leu Asn Val Ser
290 295 300
Phe Asn Tyr Val Glu Gly Glu Ser Leu Ile Met Ala Leu Lys Asp Leu
305 310 315 320
Ala Val Ser Ser Gly Ser Ala Cys Thr Ser Ala Ser Leu Glu Pro Ser
325 330 335
Tyr Val Leu Arg Ala Leu Gly Leu Asn Asp Glu Leu Ala His Ser Ser
340 345 350
Ile Arg Phe Ser Leu Gly Arg Phe Thr Thr Glu Glu Glu Ile Asp Tyr
355 360 365
Thr Ile Glu Leu Val Arg Lys Ser Ile Gly Arg Leu Arg Asp Leu Ser
370 375 380
Pro Leu Trp Glu Met Tyr Lys Gln Gly Val Asp Leu Asn Ser Ile Glu
385 390 395 400
Trp Ala His His