gpdat_1.seq    Genetic Sequence Data Bank
                      08-18-2011

                 GenPept Release 185.0
      Translated Protein-coding Sequences (Part 1)

     3780922 loci containing 1184902624 residues
===========================================================
gpdat_2.seq    Genetic Sequence Data Bank
                      08-18-2011

                 GenPept Release 185.0
      Translated Protein-coding Sequences (Part 2)

     3821378 loci containing 1099284392 residues
===========================================================
gpdat_3.seq    Genetic Sequence Data Bank
                      08-18-2011

                 GenPept Release 185.0
      Translated Protein-coding Sequences (Part 3)

     3832265 loci containing 1208008826 residues
===========================================================
gpdat_4.seq    Genetic Sequence Data Bank
                      08-18-2011

                 GenPept Release 185.0
      Translated Protein-coding Sequences (Part 4)

      692945 loci containing 178629010 residues
===========================================================
TOTAL 11758124 loci containing 3564154228 residues


Table of Contents

1. INTRODUCTION
   1.1 Release 185.0
   1.2 Organization of This Document
   1.3 Important Changes in Release 185.0
   1.4 Recent Changes in the Data Bank
       1.4.1 TSA Division Added (Release 173.0)
   1.5 Upcoming Changes

2. ORGANIZATION OF FILES
   2.1 File Descriptions
   2.2 Entries by division

3. FILE FORMAT
   3.1 File Header Information
   3.2 Sequence Entry Files
       3.2.1 Entry Organization
       3.2.2 Sample Sequence Data File

4. TRADEMARKS, CITATIONS, ETC.
   4.1 Registered Trademark Notices
   4.2 Citing GenPept
   4.3 GenPept Distribution Format
   4.4 Disclaimer

APPENDIX A - IUPAC-IUB AMINO ACID CODES

List Of Examples and Tables

   Example 1. Sample File Header
   Example 2. Sample Sequence Data File


This document describes the GenPept data bank available via anonymous
FTP from the Advanced Biomedical Computing Center (ftp.ncifcrf.gov).
GenPept is produced by parsing the corresponding GenBank release for
translated coding regions of GenBank as defined in the FEATURES section
of each sequence.
If you have any questions or comments about the data bank or this
document, please contact:

   Gary Smythers
   smytherg@mail.nih.gov
   301-846-5778

#-----------------------------
# Gary W. Smythers [Contractor]
# Programmer Analyst IV
# Advanced Biomedical Computing Center
# SAIC NCI-Frederick
# National Cancer Institute at Frederick
# Post Office Box B
# Frederick, MD 21702-1201 USA
# Phone: 301-846-5778
# FAX: 301-846-5762
# smytherg@mail.nih.gov
#-----------------------------


1. INTRODUCTION

1.1 Release 185.0

GenPept Release 185.0 includes the translations of all protein coding
regions in GenBank Release 185.0. GenPept Release 185.0 includes
12,127,510 loci representing 3,670,824,852 residues. Supplemental files
of daily updates, both cumulative and non-cumulative, are also
available.

1.2 Organization of This Document

This introduction notes changes to the GenPept data bank since the last
release. The next section describes the contents of the files. The
third section illustrates the formats of the files.

1.3 Important Changes in Release 185.0

NONE

1.4 Recent Changes in the Data Bank

1.4.1 TSA Division Added (Release 173.0)

The Transcriptome Shotgun Assembly (TSA) division was added. TSA is an
archive of computationally assembled sequences from primary data
submitted to dbEST, the Short Read Archive (SRA), or the Trace Archive.
The overlapping sequence reads from a complete transcriptome are
assembled into transcripts by computational methods instead of by
traditional cloning and sequencing of cloned cDNAs.

1.5 Upcoming Changes

NONE


2. ORGANIZATION OF FILES

2.1 File Descriptions

The GenPept release includes the following files:

   /pub/genpept/gprel.txt.gz    - Release Notes (this document).
   /pub/genpept/gpdat_1.seq.gz  - GenPept entries Part 1.
   /pub/genpept/gpdat_2.seq.gz  - GenPept entries Part 2.
   /pub/genpept/gpdat_3.seq.gz  - GenPept entries Part 3.
   /pub/genpept/gpdat_4.seq.gz  - GenPept entries Part 4.
   /pub/genpept/gpdat.fasta.gz  - All GenPept entries (fasta format).
Individual division files in /pub/genpept/divisions:

   gpbct1.seq.gz ... gpbct75.seq.gz   - Bacterial sequences.
   gpenv1.seq.gz ... gpenv42.seq.gz   - Environmental sequences.
   gpest1.seq.gz ... gpest447.seq.g   - Expressed sequence tags.
   gpgss1.seq.gz ... gpgss248.seq.gz  - Genome Survey Sequence.
   gphtc1.seq.gz ... gphtc15.seq.gz   - High Throughput cDNA.
   gphtg1.seq.gz ... gphtg135.seq.gz  - High Throughput Genome.
   gpinv1.seq.gz ... gpinv29.seq.gz   - Invertebrate sequences.
   gpmam1.seq.gz ... gpmam7.seq.gz    - Other mammalian sequences.
   gppat1.seq.gz ... gppat168.seq.gz  - Patent sequences.
   gpphg1.seq.gz                      - Phage sequences.
   gppln1.seq.gz ... gppln50.seq.gz   - Plant sequences.
   gppri1.seq.gz ... gppri42.seq.gz   - Primate sequences.
   gprod1.seq.gz ... gprod29.seq.gz   - Rodent sequences.
   gpsts1.seq.gz ... gpsts20.seq.gz   - STS sequences.
   gpsyn1.seq.gz ... gpsyn3.seq.gz    - Synthetic and chimeric sequences.
   gptsa1.seq.gz ... gptsa35.seq.gz   - Transcriptome Shotgun Assembly.
   gpuna1.seq.gz                      - Unannotated sequences.
   gpvrl1.seq.gz ... gpvrl18.seq.gz   - Viral sequences.
   gpvrt1.seq.gz ... gpvrt24.seq.gz   - Other vertebrate sequences.

Daily update files:

   /pub/genpept/updates/gpseq_updates.dat.gz - Daily cumulative updates.
   /pub/genpept/updates/gpncMMDD.seq.gz      - Daily non-cumulative updates.
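
The naming pattern in the lists above can be sketched in a few lines of
Python. This is only an illustration: the function names are
hypothetical, and it assumes that MMDD in gpncMMDD.seq.gz means a
zero-padded two-digit month followed by a zero-padded two-digit day,
which the notes do not state explicitly.

```python
from datetime import date


def division_filenames(division, count):
    """Build names such as gpsyn1.seq.gz ... gpsynN.seq.gz.

    `division` is the three-letter division code (for example "bct")
    and `count` is the number of files listed for that division.
    """
    return [f"gp{division}{i}.seq.gz" for i in range(1, count + 1)]


def daily_update_filename(day):
    """Build the non-cumulative update name for one day.

    Assumption: MMDD is zero-padded month then zero-padded day.
    """
    return f"gpnc{day:%m%d}.seq.gz"


if __name__ == "__main__":
    print(division_filenames("syn", 3))
    print(daily_update_filename(date(2011, 8, 19)))
```

Generating the names this way makes it easy to check a local mirror for
missing division files before processing a release.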
2.2 Entries by division: Filename Loci Residues gpbct1 56163 16307149 gpbct10 104754 33012068 gpbct11 105342 33480023 gpbct12 96558 30016758 gpbct13 1900 481913 gpbct14 68603 18648623 gpbct15 103161 31999382 gpbct16 83495 26561147 gpbct17 101713 31939520 gpbct18 98752 32703543 gpbct19 106239 33087760 gpbct2 103778 31794050 gpbct20 100119 32288705 gpbct21 100685 31593414 gpbct22 83081 27947490 gpbct23 96410 31002396 gpbct24 100712 30442272 gpbct25 101825 32213812 gpbct26 97783 31183496 gpbct27 99326 31004044 gpbct28 100347 31157176 gpbct29 99719 30780919 gpbct3 100959 31459328 gpbct30 99751 31423685 gpbct31 98372 30882552 gpbct32 101196 31703522 gpbct33 104259 31243013 gpbct34 97737 30248038 gpbct35 59601 18446206 gpbct36 103265 30594376 gpbct37 97671 31363746 gpbct38 94836 31094487 gpbct39 93758 30268952 gpbct4 102992 31100211 gpbct40 96313 30855188 gpbct41 96991 32228279 gpbct42 101519 31167371 gpbct43 101594 31711387 gpbct44 93195 29676172 gpbct45 99091 30787065 gpbct46 97263 30685214 gpbct47 36993 11901747 gpbct48 96144 30915786 gpbct49 90350 29557434 gpbct5 41077 11865115 gpbct50 98456 30554976 gpbct51 96014 30704384 gpbct52 98977 31907559 gpbct53 97501 30446128 gpbct54 94196 30388203 gpbct55 100036 31673342 gpbct56 107128 33147720 gpbct57 87607 27799750 gpbct58 76023 24044050 gpbct59 1483 459204 gpbct6 64420 18308173 gpbct60 3369 1019669 gpbct61 5173 1447126 gpbct62 9383 2787070 gpbct63 20856 5040982 gpbct64 39042 9708825 gpbct65 53972 13370240 gpbct66 53720 14521145 gpbct67 82869 25363415 gpbct68 84435 25685554 gpbct69 82916 26369741 gpbct7 85105 26812518 gpbct70 93795 28186933 gpbct71 95285 28743542 gpbct72 61568 15463907 gpbct73 45155 11767493 gpbct74 52069 14224735 gpbct75 27669 8125004 gpbct8 80414 24724612 gpbct9 86494 26536040 gpenv1 17161 3145761 gpenv10 7465 1359819 gpenv11 2732 593893 gpenv12 6690 1196638 gpenv13 3251 620509 gpenv14 5305 1034231 gpenv15 4915 863908 gpenv16 7689 1272828 gpenv17 19487 4627071 gpenv18 0 0 gpenv19 927 141663 gpenv2 19966 
4518172 gpenv20 10314 1875624 gpenv21 0 0 gpenv22 0 0 gpenv23 0 0 gpenv24 2773 608391 gpenv25 14286 2897807 gpenv26 2099 509998 gpenv27 1423 233055 gpenv28 7338 1554614 gpenv29 1825 329509 gpenv3 11603 2376880 gpenv30 0 0 gpenv31 2760 582094 gpenv32 1330 227626 gpenv33 2516 381044 gpenv34 384 72821 gpenv35 1201 190233 gpenv36 10651 1746110 gpenv37 4323 861866 gpenv38 0 0 gpenv39 497 119416 gpenv4 20522 4540572 gpenv40 1284 225506 gpenv41 5792 1134166 gpenv42 636 127364 gpenv5 8975 1618190 gpenv6 4317 810729 gpenv7 1454 261978 gpenv8 11858 2562846 gpenv9 9931 1925828 gpest1 0 0 gpest10 0 0 gpest100 0 0 gpest101 0 0 gpest102 0 0 gpest103 0 0 gpest104 0 0 gpest105 0 0 gpest106 0 0 gpest107 0 0 gpest108 0 0 gpest109 0 0 gpest11 0 0 gpest110 0 0 gpest111 0 0 gpest112 0 0 gpest113 0 0 gpest114 0 0 gpest115 0 0 gpest116 0 0 gpest117 0 0 gpest118 0 0 gpest119 0 0 gpest12 0 0 gpest120 0 0 gpest121 0 0 gpest122 0 0 gpest123 0 0 gpest124 0 0 gpest125 0 0 gpest126 0 0 gpest127 0 0 gpest128 0 0 gpest129 0 0 gpest13 0 0 gpest130 0 0 gpest131 0 0 gpest132 0 0 gpest133 0 0 gpest134 0 0 gpest135 0 0 gpest136 0 0 gpest137 0 0 gpest138 0 0 gpest139 0 0 gpest14 0 0 gpest140 0 0 gpest141 0 0 gpest142 0 0 gpest143 0 0 gpest144 0 0 gpest145 0 0 gpest146 0 0 gpest147 0 0 gpest148 0 0 gpest149 0 0 gpest15 0 0 gpest150 0 0 gpest151 0 0 gpest152 0 0 gpest153 0 0 gpest154 0 0 gpest155 0 0 gpest156 0 0 gpest157 0 0 gpest158 0 0 gpest159 0 0 gpest16 0 0 gpest160 0 0 gpest161 0 0 gpest162 0 0 gpest163 0 0 gpest164 0 0 gpest165 0 0 gpest166 0 0 gpest167 0 0 gpest168 0 0 gpest169 0 0 gpest17 0 0 gpest170 0 0 gpest171 0 0 gpest172 0 0 gpest173 0 0 gpest174 0 0 gpest175 0 0 gpest176 0 0 gpest177 0 0 gpest178 0 0 gpest179 0 0 gpest18 0 0 gpest180 0 0 gpest181 0 0 gpest182 0 0 gpest183 0 0 gpest184 0 0 gpest185 0 0 gpest186 0 0 gpest187 0 0 gpest188 0 0 gpest189 0 0 gpest19 0 0 gpest190 0 0 gpest191 0 0 gpest192 0 0 gpest193 0 0 gpest194 0 0 gpest195 0 0 gpest196 0 0 gpest197 0 0 gpest198 0 0 gpest199 
0 0 gpest2 0 0 gpest20 0 0 gpest200 0 0 gpest201 0 0 gpest202 0 0 gpest203 0 0 gpest204 0 0 gpest205 0 0 gpest206 0 0 gpest207 0 0 gpest208 0 0 gpest209 0 0 gpest21 0 0 gpest210 0 0 gpest211 0 0 gpest212 0 0 gpest213 0 0 gpest214 0 0 gpest215 0 0 gpest216 0 0 gpest217 0 0 gpest218 0 0 gpest219 0 0 gpest22 0 0 gpest220 0 0 gpest221 0 0 gpest222 0 0 gpest223 0 0 gpest224 0 0 gpest225 0 0 gpest226 0 0 gpest227 0 0 gpest228 0 0 gpest229 0 0 gpest23 0 0 gpest230 0 0 gpest231 0 0 gpest232 0 0 gpest233 0 0 gpest234 0 0 gpest235 0 0 gpest236 0 0 gpest237 0 0 gpest238 0 0 gpest239 0 0 gpest24 0 0 gpest240 0 0 gpest241 0 0 gpest242 0 0 gpest243 0 0 gpest244 0 0 gpest245 0 0 gpest246 0 0 gpest247 0 0 gpest248 0 0 gpest249 0 0 gpest25 0 0 gpest250 0 0 gpest251 0 0 gpest252 0 0 gpest253 0 0 gpest254 0 0 gpest255 0 0 gpest256 0 0 gpest257 0 0 gpest258 0 0 gpest259 0 0 gpest26 0 0 gpest260 0 0 gpest261 0 0 gpest262 0 0 gpest263 0 0 gpest264 0 0 gpest265 0 0 gpest266 0 0 gpest267 0 0 gpest268 0 0 gpest269 0 0 gpest27 0 0 gpest270 0 0 gpest271 0 0 gpest272 0 0 gpest273 0 0 gpest274 0 0 gpest275 0 0 gpest276 0 0 gpest277 0 0 gpest278 0 0 gpest279 0 0 gpest28 0 0 gpest280 0 0 gpest281 0 0 gpest282 0 0 gpest283 0 0 gpest284 0 0 gpest285 0 0 gpest286 0 0 gpest287 0 0 gpest288 0 0 gpest289 0 0 gpest29 0 0 gpest290 0 0 gpest291 0 0 gpest292 0 0 gpest293 0 0 gpest294 0 0 gpest295 0 0 gpest296 0 0 gpest297 0 0 gpest298 0 0 gpest299 0 0 gpest3 0 0 gpest30 0 0 gpest300 0 0 gpest301 0 0 gpest302 0 0 gpest303 0 0 gpest304 0 0 gpest305 0 0 gpest306 0 0 gpest307 0 0 gpest308 0 0 gpest309 0 0 gpest31 0 0 gpest310 0 0 gpest311 0 0 gpest312 0 0 gpest313 0 0 gpest314 0 0 gpest315 0 0 gpest316 0 0 gpest317 0 0 gpest318 0 0 gpest319 0 0 gpest32 0 0 gpest320 0 0 gpest321 0 0 gpest322 0 0 gpest323 0 0 gpest324 0 0 gpest325 0 0 gpest326 0 0 gpest327 0 0 gpest328 0 0 gpest329 0 0 gpest33 0 0 gpest330 0 0 gpest331 0 0 gpest332 0 0 gpest333 0 0 gpest334 0 0 gpest335 0 0 gpest336 0 0 gpest337 0 0 gpest338 0 
0 gpest339 0 0 gpest34 0 0 gpest340 0 0 gpest341 0 0 gpest342 0 0 gpest343 0 0 gpest344 0 0 gpest345 0 0 gpest346 0 0 gpest347 0 0 gpest348 0 0 gpest349 0 0 gpest35 0 0 gpest350 0 0 gpest351 0 0 gpest352 0 0 gpest353 0 0 gpest354 0 0 gpest355 0 0 gpest356 0 0 gpest357 0 0 gpest358 0 0 gpest359 0 0 gpest36 0 0 gpest360 0 0 gpest361 0 0 gpest362 0 0 gpest363 0 0 gpest364 0 0 gpest365 0 0 gpest366 0 0 gpest367 0 0 gpest368 0 0 gpest369 0 0 gpest37 0 0 gpest370 0 0 gpest371 0 0 gpest372 0 0 gpest373 0 0 gpest374 0 0 gpest375 0 0 gpest376 0 0 gpest377 0 0 gpest378 0 0 gpest379 0 0 gpest38 0 0 gpest380 0 0 gpest381 0 0 gpest382 0 0 gpest383 0 0 gpest384 0 0 gpest385 0 0 gpest386 0 0 gpest387 0 0 gpest388 0 0 gpest389 0 0 gpest39 0 0 gpest390 0 0 gpest391 0 0 gpest392 0 0 gpest393 0 0 gpest394 0 0 gpest395 0 0 gpest396 0 0 gpest397 0 0 gpest398 0 0 gpest399 0 0 gpest4 0 0 gpest40 0 0 gpest400 0 0 gpest401 0 0 gpest402 0 0 gpest403 0 0 gpest404 0 0 gpest405 0 0 gpest406 0 0 gpest407 0 0 gpest408 0 0 gpest409 0 0 gpest41 0 0 gpest410 0 0 gpest411 0 0 gpest412 0 0 gpest413 0 0 gpest414 0 0 gpest415 0 0 gpest416 0 0 gpest417 0 0 gpest418 0 0 gpest419 0 0 gpest42 0 0 gpest420 0 0 gpest421 0 0 gpest422 0 0 gpest423 0 0 gpest424 0 0 gpest425 0 0 gpest426 0 0 gpest427 0 0 gpest428 0 0 gpest429 0 0 gpest43 0 0 gpest430 0 0 gpest431 0 0 gpest432 0 0 gpest433 0 0 gpest434 0 0 gpest435 0 0 gpest436 0 0 gpest437 0 0 gpest438 0 0 gpest439 0 0 gpest44 0 0 gpest440 0 0 gpest441 0 0 gpest442 0 0 gpest443 0 0 gpest444 0 0 gpest445 0 0 gpest446 0 0 gpest447 0 0 gpest45 0 0 gpest46 0 0 gpest47 0 0 gpest48 0 0 gpest49 0 0 gpest5 0 0 gpest50 0 0 gpest51 0 0 gpest52 0 0 gpest53 0 0 gpest54 0 0 gpest55 0 0 gpest56 0 0 gpest57 0 0 gpest58 0 0 gpest59 0 0 gpest6 0 0 gpest60 0 0 gpest61 0 0 gpest62 0 0 gpest63 0 0 gpest64 0 0 gpest65 0 0 gpest66 0 0 gpest67 0 0 gpest68 0 0 gpest69 0 0 gpest7 0 0 gpest70 0 0 gpest71 0 0 gpest72 0 0 gpest73 0 0 gpest74 0 0 gpest75 0 0 gpest76 0 0 gpest77 0 0 gpest78 
0 0 gpest79 0 0 gpest8 0 0 gpest80 0 0 gpest81 0 0 gpest82 0 0 gpest83 0 0 gpest84 0 0 gpest85 0 0 gpest86 0 0 gpest87 0 0 gpest88 0 0 gpest89 0 0 gpest9 0 0 gpest90 0 0 gpest91 0 0 gpest92 0 0 gpest93 0 0 gpest94 0 0 gpest95 0 0 gpest96 0 0 gpest97 0 0 gpest98 0 0 gpest99 0 0 gpgss1 0 0 gpgss10 0 0 gpgss100 0 0 gpgss101 0 0 gpgss102 0 0 gpgss103 0 0 gpgss104 0 0 gpgss105 0 0 gpgss106 0 0 gpgss107 0 0 gpgss108 0 0 gpgss109 0 0 gpgss11 0 0 gpgss110 0 0 gpgss111 0 0 gpgss112 0 0 gpgss113 0 0 gpgss114 0 0 gpgss115 0 0 gpgss116 0 0 gpgss117 0 0 gpgss118 0 0 gpgss119 0 0 gpgss12 0 0 gpgss120 0 0 gpgss121 0 0 gpgss122 0 0 gpgss123 0 0 gpgss124 0 0 gpgss125 0 0 gpgss126 0 0 gpgss127 0 0 gpgss128 0 0 gpgss129 0 0 gpgss13 0 0 gpgss130 0 0 gpgss131 0 0 gpgss132 0 0 gpgss133 0 0 gpgss134 0 0 gpgss135 0 0 gpgss136 0 0 gpgss137 0 0 gpgss138 0 0 gpgss139 0 0 gpgss14 0 0 gpgss140 0 0 gpgss141 0 0 gpgss142 0 0 gpgss143 0 0 gpgss144 0 0 gpgss145 0 0 gpgss146 0 0 gpgss147 0 0 gpgss148 0 0 gpgss149 0 0 gpgss15 0 0 gpgss150 0 0 gpgss151 0 0 gpgss152 0 0 gpgss153 0 0 gpgss154 0 0 gpgss155 0 0 gpgss156 0 0 gpgss157 0 0 gpgss158 0 0 gpgss159 0 0 gpgss16 0 0 gpgss160 0 0 gpgss161 0 0 gpgss162 0 0 gpgss163 0 0 gpgss164 0 0 gpgss165 0 0 gpgss166 0 0 gpgss167 0 0 gpgss168 0 0 gpgss169 61 12916 gpgss17 0 0 gpgss170 0 0 gpgss171 0 0 gpgss172 0 0 gpgss173 0 0 gpgss174 0 0 gpgss175 0 0 gpgss176 0 0 gpgss177 0 0 gpgss178 0 0 gpgss179 0 0 gpgss18 0 0 gpgss180 0 0 gpgss181 0 0 gpgss182 0 0 gpgss183 0 0 gpgss184 0 0 gpgss185 0 0 gpgss186 0 0 gpgss187 0 0 gpgss188 0 0 gpgss189 0 0 gpgss19 0 0 gpgss190 0 0 gpgss191 0 0 gpgss192 0 0 gpgss193 0 0 gpgss194 0 0 gpgss195 0 0 gpgss196 0 0 gpgss197 0 0 gpgss198 0 0 gpgss199 0 0 gpgss2 0 0 gpgss20 0 0 gpgss200 0 0 gpgss201 0 0 gpgss202 0 0 gpgss203 0 0 gpgss204 0 0 gpgss205 0 0 gpgss206 0 0 gpgss207 0 0 gpgss208 0 0 gpgss209 0 0 gpgss21 0 0 gpgss210 0 0 gpgss211 0 0 gpgss212 0 0 gpgss213 0 0 gpgss214 0 0 gpgss215 0 0 gpgss216 0 0 gpgss217 0 0 gpgss218 0 0 
gpgss219 0 0 gpgss22 0 0 gpgss220 0 0 gpgss221 0 0 gpgss222 0 0 gpgss223 0 0 gpgss224 0 0 gpgss225 0 0 gpgss226 0 0 gpgss227 0 0 gpgss228 0 0 gpgss229 0 0 gpgss23 0 0 gpgss230 0 0 gpgss231 0 0 gpgss232 0 0 gpgss233 0 0 gpgss234 0 0 gpgss235 0 0 gpgss236 0 0 gpgss237 0 0 gpgss238 0 0 gpgss239 0 0 gpgss24 0 0 gpgss240 0 0 gpgss241 0 0 gpgss242 0 0 gpgss243 0 0 gpgss244 0 0 gpgss245 0 0 gpgss246 0 0 gpgss247 0 0 gpgss248 0 0 gpgss25 0 0 gpgss26 0 0 gpgss27 0 0 gpgss28 0 0 gpgss29 0 0 gpgss3 0 0 gpgss30 0 0 gpgss31 0 0 gpgss32 0 0 gpgss33 0 0 gpgss34 0 0 gpgss35 0 0 gpgss36 0 0 gpgss37 0 0 gpgss38 0 0 gpgss39 0 0 gpgss4 0 0 gpgss40 0 0 gpgss41 0 0 gpgss42 0 0 gpgss43 0 0 gpgss44 0 0 gpgss45 0 0 gpgss46 0 0 gpgss47 0 0 gpgss48 0 0 gpgss49 0 0 gpgss5 0 0 gpgss50 0 0 gpgss51 0 0 gpgss52 0 0 gpgss53 0 0 gpgss54 0 0 gpgss55 0 0 gpgss56 0 0 gpgss57 0 0 gpgss58 0 0 gpgss59 0 0 gpgss6 0 0 gpgss60 0 0 gpgss61 0 0 gpgss62 0 0 gpgss63 0 0 gpgss64 0 0 gpgss65 0 0 gpgss66 0 0 gpgss67 0 0 gpgss68 0 0 gpgss69 0 0 gpgss7 0 0 gpgss70 0 0 gpgss71 0 0 gpgss72 0 0 gpgss73 0 0 gpgss74 0 0 gpgss75 0 0 gpgss76 0 0 gpgss77 0 0 gpgss78 0 0 gpgss79 0 0 gpgss8 0 0 gpgss80 0 0 gpgss81 0 0 gpgss82 0 0 gpgss83 0 0 gpgss84 0 0 gpgss85 0 0 gpgss86 0 0 gpgss87 0 0 gpgss88 0 0 gpgss89 0 0 gpgss9 0 0 gpgss90 0 0 gpgss91 0 0 gpgss92 0 0 gpgss93 0 0 gpgss94 0 0 gpgss95 0 0 gpgss96 0 0 gpgss97 0 0 gpgss98 0 0 gpgss99 0 0 gphtc1 9654 2386776 gphtc10 13583 2749117 gphtc11 713 352883 gphtc12 0 0 gphtc13 105 56038 gphtc14 9599 3096078 gphtc15 14378 2668609 gphtc2 6112 2412864 gphtc3 5947 2345170 gphtc4 5750 2203727 gphtc5 7594 3385832 gphtc6 9233 3766862 gphtc7 3999 1626363 gphtc8 37 13171 gphtc9 5491 2233568 gphtg1 1 712 gphtg10 0 0 gphtg100 0 0 gphtg101 0 0 gphtg102 0 0 gphtg103 0 0 gphtg104 0 0 gphtg105 0 0 gphtg106 0 0 gphtg107 0 0 gphtg108 0 0 gphtg109 0 0 gphtg11 0 0 gphtg110 0 0 gphtg111 0 0 gphtg112 94 52971 gphtg113 5220 1344361 gphtg114 0 0 gphtg115 11939 3258405 gphtg116 1651 426915 gphtg117 12464 
5828662 gphtg118 16 7139 gphtg119 0 0 gphtg12 0 0 gphtg120 0 0 gphtg121 4268 1240469 gphtg122 0 0 gphtg123 0 0 gphtg124 0 0 gphtg125 0 0 gphtg126 0 0 gphtg127 0 0 gphtg128 0 0 gphtg129 0 0 gphtg13 0 0 gphtg130 282 88107 gphtg131 162 71573 gphtg132 0 0 gphtg133 0 0 gphtg134 70 33099 gphtg135 4262 1299881 gphtg136 0 0 gphtg14 0 0 gphtg15 0 0 gphtg16 7 3848 gphtg17 0 0 gphtg18 0 0 gphtg19 0 0 gphtg2 0 0 gphtg20 26 14030 gphtg21 0 0 gphtg22 0 0 gphtg23 0 0 gphtg24 0 0 gphtg25 0 0 gphtg26 0 0 gphtg27 23 8608 gphtg28 1 234 gphtg29 19 15452 gphtg3 0 0 gphtg30 0 0 gphtg31 0 0 gphtg32 0 0 gphtg33 23 11441 gphtg34 0 0 gphtg35 0 0 gphtg36 0 0 gphtg37 0 0 gphtg38 0 0 gphtg39 0 0 gphtg4 0 0 gphtg40 0 0 gphtg41 0 0 gphtg42 0 0 gphtg43 0 0 gphtg44 0 0 gphtg45 0 0 gphtg46 0 0 gphtg47 33 14631 gphtg48 1 548 gphtg49 0 0 gphtg5 0 0 gphtg50 0 0 gphtg51 0 0 gphtg52 0 0 gphtg53 0 0 gphtg54 0 0 gphtg55 1 243 gphtg56 0 0 gphtg57 0 0 gphtg58 0 0 gphtg59 0 0 gphtg6 0 0 gphtg60 0 0 gphtg61 0 0 gphtg62 0 0 gphtg63 0 0 gphtg64 0 0 gphtg65 0 0 gphtg66 0 0 gphtg67 0 0 gphtg68 0 0 gphtg69 0 0 gphtg7 0 0 gphtg70 0 0 gphtg71 0 0 gphtg72 0 0 gphtg73 316 99716 gphtg74 0 0 gphtg75 59 22011 gphtg76 0 0 gphtg77 0 0 gphtg78 71 22015 gphtg79 0 0 gphtg8 0 0 gphtg80 0 0 gphtg81 0 0 gphtg82 0 0 gphtg83 0 0 gphtg84 0 0 gphtg85 0 0 gphtg86 0 0 gphtg87 0 0 gphtg88 0 0 gphtg89 0 0 gphtg9 0 0 gphtg90 0 0 gphtg91 0 0 gphtg92 0 0 gphtg93 0 0 gphtg94 0 0 gphtg95 0 0 gphtg96 0 0 gphtg97 0 0 gphtg98 0 0 gphtg99 0 0 gpinv1 29723 8795288 gpinv10 50949 11340126 gpinv11 31095 7334502 gpinv12 51145 11162948 gpinv13 52394 12021524 gpinv14 56221 13328068 gpinv15 30319 6976925 gpinv16 8664 2933930 gpinv17 4786 2510211 gpinv18 21007 7327970 gpinv19 37413 20795345 gpinv2 11530 4240639 gpinv20 42804 17798632 gpinv21 65962 13974703 gpinv22 61624 13498560 gpinv23 39190 8507325 gpinv24 58831 16232007 gpinv25 67742 14630849 gpinv26 64693 13830299 gpinv27 63760 14315830 gpinv28 66651 14434328 gpinv29 67717 14593710 gpinv3 9671 
5419476 gpinv30 56643 14657804 gpinv31 19706 8446534 gpinv4 29773 15669371 gpinv5 46276 12338889 gpinv6 35584 10717604 gpinv7 35553 12426584 gpinv8 48777 11530534 gpinv9 44602 13324456 gpmam1 7780 2062080 gpmam2 11448 2677892 gpmam3 37327 10200926 gpmam4 5855 1408237 gpmam5 37805 9404629 gpmam6 21383 4683678 gpmam7 26487 6361529 gppat1 5126 1505189 gppat10 3915 1361347 gppat100 0 0 gppat101 0 0 gppat102 0 0 gppat103 0 0 gppat104 0 0 gppat105 0 0 gppat106 0 0 gppat107 2317 629273 gppat108 258 106444 gppat109 0 0 gppat11 3063 1205921 gppat110 1272 526233 gppat111 418 153327 gppat112 6421 2484390 gppat113 15834 6188151 gppat114 0 0 gppat115 0 0 gppat116 0 0 gppat117 0 0 gppat118 0 0 gppat119 0 0 gppat12 11551 2100922 gppat120 0 0 gppat121 0 0 gppat122 0 0 gppat123 0 0 gppat124 0 0 gppat125 0 0 gppat126 0 0 gppat127 0 0 gppat128 0 0 gppat129 0 0 gppat13 0 0 gppat130 0 0 gppat131 0 0 gppat132 0 0 gppat133 0 0 gppat134 0 0 gppat135 0 0 gppat136 0 0 gppat137 68 29006 gppat138 0 0 gppat139 0 0 gppat14 0 0 gppat140 2603 896222 gppat141 698 339174 gppat142 0 0 gppat143 57528 26706517 gppat144 61955 28459918 gppat145 19793 7620527 gppat146 907 421386 gppat147 11219 4270766 gppat148 46890 20122410 gppat149 19150 7368822 gppat15 0 0 gppat150 0 0 gppat151 0 0 gppat152 0 0 gppat153 0 0 gppat154 0 0 gppat155 3583 1335941 gppat156 3218 1058470 gppat157 0 0 gppat158 183 75356 gppat159 133 16947 gppat16 0 0 gppat160 1156 469337 gppat161 1050 448888 gppat162 0 0 gppat163 0 0 gppat164 0 0 gppat165 0 0 gppat166 107 100660 gppat167 986 437214 gppat168 983 369619 gppat17 0 0 gppat18 0 0 gppat19 0 0 gppat2 0 0 gppat20 0 0 gppat21 2751 1066337 gppat22 2071 732601 gppat23 3974 1283047 gppat24 5909 1976221 gppat25 3781 1503628 gppat26 192 63977 gppat27 0 0 gppat28 0 0 gppat29 0 0 gppat3 0 0 gppat30 0 0 gppat31 0 0 gppat32 0 0 gppat33 0 0 gppat34 0 0 gppat35 0 0 gppat36 0 0 gppat37 0 0 gppat38 0 0 gppat39 0 0 gppat4 0 0 gppat40 0 0 gppat41 0 0 gppat42 0 0 gppat43 0 0 gppat44 0 0 gppat45 0 0 
gppat46 0 0 gppat47 0 0 gppat48 0 0 gppat49 0 0 gppat5 0 0 gppat50 0 0 gppat51 0 0 gppat52 0 0 gppat53 0 0 gppat54 0 0 gppat55 0 0 gppat56 2655 873899 gppat57 1049 517498 gppat58 51261 23192917 gppat59 17402 8882251 gppat6 0 0 gppat60 0 0 gppat61 0 0 gppat62 0 0 gppat63 0 0 gppat64 0 0 gppat65 0 0 gppat66 0 0 gppat67 0 0 gppat68 0 0 gppat69 0 0 gppat7 7025 2311444 gppat70 0 0 gppat71 0 0 gppat72 0 0 gppat73 0 0 gppat74 0 0 gppat75 0 0 gppat76 0 0 gppat77 0 0 gppat78 0 0 gppat79 0 0 gppat8 3107 1159563 gppat80 0 0 gppat81 0 0 gppat82 0 0 gppat83 0 0 gppat84 0 0 gppat85 0 0 gppat86 0 0 gppat87 0 0 gppat88 0 0 gppat89 0 0 gppat9 3907 1645335 gppat90 0 0 gppat91 0 0 gppat92 0 0 gppat93 0 0 gppat94 0 0 gppat95 0 0 gppat96 0 0 gppat97 0 0 gppat98 0 0 gppat99 0 0 gpphg1 90857 18717529 gppln1 34711 11841572 gppln10 15712 4905879 gppln11 24696 8495276 gppln12 32758 13918217 gppln13 23815 9504917 gppln14 7246 3505419 gppln15 6091 2897504 gppln16 6114 2925188 gppln17 17141 6231978 gppln18 30192 10067914 gppln19 20679 5546716 gppln2 31904 11356409 gppln20 20033 7778788 gppln21 4572 1637895 gppln22 36406 10606201 gppln23 11526 3518220 gppln24 31277 8197131 gppln25 45188 13741222 gppln26 46059 16370712 gppln27 26210 11246265 gppln28 29432 12880296 gppln29 15841 7746971 gppln3 11303 5030394 gppln30 35629 15260080 gppln31 33270 7865824 gppln32 23258 6037921 gppln33 16389 3726622 gppln34 34154 7787538 gppln35 33351 8080098 gppln36 34868 7671918 gppln37 38716 8517389 gppln38 4331 918328 gppln39 16957 3339747 gppln4 5571 1830598 gppln40 38146 7541975 gppln41 32046 11324628 gppln42 32693 14798059 gppln43 27145 7482180 gppln44 4275 859362 gppln45 26697 5607224 gppln46 41392 10049331 gppln47 46863 9597901 gppln48 33442 7118829 gppln49 35362 11065309 gppln5 0 0 gppln50 17557 6957850 gppln6 0 0 gppln7 33772 13322122 gppln8 18291 4942158 gppln9 28061 8829231 gppri1 17560 6200608 gppri10 42 13381 gppri11 118 36898 gppri12 161 47964 gppri13 129 49780 gppri14 88 27447 gppri15 33 9722 gppri16 
9 1906 gppri17 0 0 gppri18 0 0 gppri19 0 0 gppri2 9799 2838519 gppri20 0 0 gppri21 0 0 gppri22 11554 3562740 gppri23 19853 5790995 gppri24 20245 5624450 gppri25 19436 8479493 gppri26 7708 3740377 gppri27 3171 1721163 gppri28 2472 1415775 gppri29 2679 1567434 gppri3 547 206971 gppri30 2788 1592966 gppri31 11979 2564760 gppri32 52 27372 gppri33 39221 10823053 gppri34 12113 2499527 gppri35 23339 9120225 gppri36 15047 7561981 gppri37 21280 5930353 gppri38 38701 8952067 gppri39 51518 13502281 gppri4 193 80046 gppri40 25602 6524804 gppri41 47331 12886868 gppri42 18025 5433996 gppri5 248 102405 gppri6 212 77202 gppri7 25 11228 gppri8 112 43482 gppri9 280 90874 gprod1 11448 3444203 gprod10 0 0 gprod11 0 0 gprod12 0 0 gprod13 0 0 gprod14 0 0 gprod15 0 0 gprod16 0 0 gprod17 0 0 gprod18 0 0 gprod19 11984 4134909 gprod2 0 0 gprod20 16590 6866428 gprod21 3625 1895300 gprod22 3002 1534452 gprod23 11380 4145647 gprod24 30836 12034768 gprod25 19318 8668100 gprod26 1833 883537 gprod27 7740 2190191 gprod28 28323 6577276 gprod29 24447 7063165 gprod3 4 2148 gprod4 1 38 gprod5 1 120 gprod6 0 0 gprod7 0 0 gprod8 0 0 gprod9 0 0 gpsts1 1 66 gpsts10 0 0 gpsts11 0 0 gpsts12 0 0 gpsts13 0 0 gpsts14 0 0 gpsts15 0 0 gpsts16 0 0 gpsts17 4 432 gpsts18 0 0 gpsts19 0 0 gpsts2 0 0 gpsts20 4 314 gpsts3 0 0 gpsts4 0 0 gpsts5 0 0 gpsts6 0 0 gpsts7 0 0 gpsts8 0 0 gpsts9 0 0 gpsyn1 39026 15872606 gpsyn2 29440 11169480 gpsyn3 8930 2589669 gptsa1 1038 194157 gptsa10 26 6242 gptsa11 1425 398554 gptsa12 0 0 gptsa13 0 0 gptsa14 0 0 gptsa15 294 55067 gptsa16 0 0 gptsa17 0 0 gptsa18 0 0 gptsa19 0 0 gptsa2 166 28119 gptsa20 87 14526 gptsa21 0 0 gptsa22 0 0 gptsa23 10094 5047472 gptsa24 121 8223 gptsa25 9 2903 gptsa26 515 140239 gptsa27 0 0 gptsa28 0 0 gptsa29 4 622 gptsa3 0 0 gptsa30 0 0 gptsa31 0 0 gptsa32 0 0 gptsa33 0 0 gptsa34 25 3514 gptsa35 253 25707 gptsa4 3284 894798 gptsa5 0 0 gptsa6 12 2307 gptsa7 0 0 gptsa8 0 0 gptsa9 60 14088 gpuna1 52 7632 gpvrl1 81196 21006293 gpvrl10 73452 22875011 gpvrl11 17129 
5678824 gpvrl12 71541 22336987 gpvrl13 70463 23463992 gpvrl14 73994 20700787 gpvrl15 72012 23570625 gpvrl16 70921 22986228 gpvrl17 72819 22521387 gpvrl18 64339 17407406 gpvrl2 84408 19701341 gpvrl3 77180 18832245 gpvrl4 78947 20402188 gpvrl5 50861 14606277 gpvrl6 63639 26313722 gpvrl7 66362 23284485 gpvrl8 82453 22189138 gpvrl9 76197 20508848 gpvrt1 24391 7144174 gpvrt10 1826 927136 gpvrt11 1982 947890 gpvrt12 909 423527 gpvrt13 2463 945725 gpvrt14 2551 1086818 gpvrt15 1383 602728 gpvrt16 20764 4829313 gpvrt17 49404 11424724 gpvrt18 52568 11912476 gpvrt19 20575 4387216 gpvrt2 3453 772827 gpvrt20 44377 10554664 gpvrt21 24072 5602161 gpvrt22 53500 11886428 gpvrt23 55047 12311926 gpvrt24 23757 5651775 gpvrt3 41255 10345755 gpvrt4 16467 4744200 gpvrt5 17787 3995479 gpvrt6 46034 12179296 gpvrt7 29121 11786332 gpvrt8 27317 8489514 gpvrt9 1697 796601 3. FILE FORMAT 3.1 File Header Information All of the files in the distribution begin with the same header, except for the first line, which contains the division name, the fifth line, which contains a description of the file contents, and the seventh line which contains the number of loci and residues in the file. The first line of the file contains the division name in character positions 1 to 9 and the full data bank name (Genetic Sequence Data Bank) starting in column 16. The brief names of the files in this release are listed in section 2.1. The second line contains the date of the current release in the form month-day-year, "MM-DD-YYYY". The fourth line contains the current GenPept release number. The release number consists of two numbers separated by a decimal point. The number to the left of the decimal is the major release number. The digit to the right of the decimal indicates the version of the major release; it is zero for the first version. The fifth line contains a title for the file. The seventh line lists the number of entries (loci) and the number of residues in this release of GenPept. 
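
The header layout described above can be read with a short Python
sketch. The function name and the returned dictionary keys are
hypothetical, and the sketch assumes that the third and sixth header
lines are blank, which is implied by the line numbering above but not
stated.

```python
import re


def parse_file_header(lines):
    """Parse the first seven lines of a GenPept sequence file.

    Line 1: division name in positions 1-9, data bank name from column 16.
    Line 2: release date (MM-DD-YYYY).
    Line 4: release number (major.minor).
    Line 5: file title.
    Line 7: number of loci and residues.
    """
    if len(lines) < 7:
        raise ValueError("a header needs at least seven lines")

    release = re.search(r"Release\s+(\d+)\.(\d+)", lines[3])
    counts = re.search(r"(\d+)\s+loci\s+containing\s+(\d+)\s+residues", lines[6])
    if release is None or counts is None:
        raise ValueError("header lines 4 or 7 are not in the expected form")

    return {
        "division": lines[0][:9].strip(),
        "databank": lines[0][15:].strip(),
        "date": lines[1].strip(),
        "release": (int(release.group(1)), int(release.group(2))),
        "title": lines[4].strip(),
        "loci": int(counts.group(1)),
        "residues": int(counts.group(2)),
    }
```

Comparing the parsed loci and residue counts with the figures in
section 2.2 is a quick way to confirm that a download is complete.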
1-------10--------20--------30--------40--------50--------60--------70------78
gpbct1         Genetic Sequence Data Bank
                      08-19-2011

                 GenPept Release 185.0
          Translated Protein-coding Sequences

     56163 loci containing 16307149 residues
1-------10--------20--------30--------40--------50--------60--------70------78

Example 1. Sample File Header

3.2 Sequence Entry Files

GenPept entries are derived from entries in the GenBank nucleotide
sequence data bank. They contain minimal annotation, primarily
extracted from the corresponding GenBank entries. For the complete
annotations, refer to the GenBank entry or entries referenced by the
accession number(s) in the GenPept entry.

3.2.1 Entry Organization

Each record (one record = one line) consists of two parts. The first
part is found in positions 1 to 10 and may contain:

1. A keyword, beginning in column 1 of the record (e.g., DEFINITION is
   a keyword).

2. Blank characters, indicating that this record is a continuation of
   the information under the keyword above it.

3. A number, ending in column 9 of the record. This number occurs in
   the portion of the entry containing the actual amino acid sequence
   and designates the numbering of sequence positions.

4. Two slashes (//) in positions 1 and 2, marking the end of an entry.

The second part of each sequence entry record (line) contains the
information appropriate to its keyword, in positions 13 to 80 for
keywords and positions 11 to 80 for the sequence.

The following is a brief description of each entry field.

LOCUS       The entry name. It consists of the accession number of the
            GenBank nucleotide sequence entry (or entries) from which
            this product was translated, followed by an underscore
            character ( _ ) and a number indicating which coding region
            (CDS) in the feature table of the original GenBank entry
            was used for this translation. The number is determined by
            assigning a number to each CDS according to its order of
            appearance in the original GenBank entry's feature table.
            Detailed format for the LOCUS line:

            Positions  Contents
            ---------  --------
            01-05      'LOCUS'
            06-12      spaces
            13-25      GenPept Locus name
            26-26      space
            27-42      GenBank Locus name
            43-43      space
            44-49      Length of peptide sequence
            50-50      space
            51-52      'aa'
            53-53      space
            54-56      'PEP'
            57-57      space
            58-63      'linear'
            64-64      space
            65-67      GenBank division code
            68-68      space
            69-79      Date, in format dd-mmm-yyyy

DEFINITION  This field contains the feature as it appeared in the
            original GenBank entry that was translated to produce the
            sequence in this GenPept entry. If the GenBank CDS had a
            "/note" qualifier, the text of this qualifier is placed on
            a continuation line as part of the GenPept DEFINITION
            record.

DATE        Entry date for the GenBank locus used to create this
            record.

ACCESSION   Primary accession numbers of all the GenBank entries from
            which this GenPept entry was created.

VERSION     A compound identifier consisting of the GenPept Locus and a
            numeric version number associated with the current version
            of the sequence data in the record. This is followed by an
            integer key (a "GI") assigned to the peptide sequence.
            Mandatory keyword/exactly one record.

KEYWORDS    Short phrases describing gene products and other
            information, taken directly from the corresponding GenBank
            entry. Mandatory keyword in all annotated entries/one or
            more records.

SOURCE      Common name of the organism or the name most frequently
            used in the literature. Mandatory keyword in all annotated
            entries/one or more records/includes one subkeyword.

ORGANISM    Source organism of the nucleic acid sequence.

COMMENT     This field identifies the coding regions translated to make
            this protein. It reproduces the relevant lines from the
            Feature tables of the GenBank data bank entries.

WEIGHT      Protein molecular weight calculated from the sequence.

PI          Isoelectric point. Mandatory keyword/exactly one record.

LENGTH      Protein length in amino acid residues.

ORIGIN      Indication of codon phase used in translation.

            The sequence immediately follows the ORIGIN line.
            It uses the IUPAC-IUB one-letter amino acid codes (see
            Appendix A). The first 9 columns in each line are reserved
            for a right-justified integer representing the residue
            number of the first amino acid on the line. Column 10 is
            blank and the sequence begins in column 11. The sequence is
            presented with up to 60 residues per line, in groups of 10
            residues separated by spaces. Note that "?"s in GenBank
            entries' /translation qualifier sequences are converted to
            "X"s in GenPept. Residues are in uppercase.

//          A double slash marks the end of each entry. The next entry
            begins on the following line.

3.2.2 Sample Sequence Data File

An example of a complete sequence entry follows.

1-------10--------20--------30--------40--------50--------60--------70------78
gpbct1         Genetic Sequence Data Bank
                      08-19-2011

                 GenPept Release 185.0
          Translated Protein-coding Sequences

     56163 loci containing 16307149 residues

LOCUS       AB000100_1    AB000100            263 aa PEP linear BCT 15-MAY-2009
DEFINITION  Synechococcus elongatus PCC 7942 genes for intrinsic membrane
            protein, malK-like protein, cyanase, complete cds.
DATE        15-MAY-2009
ACCESSION   AB000100
VERSION     AB000100_1.1 GI:2330515
KEYWORDS    .
SOURCE      Synechococcus elongatus PCC 7942
ORGANISM    Synechococcus elongatus PCC 7942
            Bacteria; Cyanobacteria; Chroococcales; Synechococcus.
COMMENT     CDS             121..912
                            /gene="cynB"
                            /transl_table=11
                            /product="intrinsic membrane protein"
                            /protein_id="BAA21794.1"
                            /db_xref="GI:2330515"
                            /NucGI="2330514"
WEIGHT      28647.67
PI          9.76
LENGTH      263
ORIGIN      Translated using phase 1
        1 MVRTPVPLYL RWAVSILSVL AFLAIWQIAA ASGFLGKTFP GSLRTLQDLF GWLSDPFFDN
       61 GPNDLGIGWN LLISLRRVAI GYLLATVVAI PLGIAIGMSA LASSIFSPFV QLLKPVSPLA
      121 WLPIGLFLFR DSELTGVFVI LISSLWPTLI NTAFGVANVN PDFLKVSQSL GASRWRTILK
      181 VILPAALPSI IAGMRISMGI AWLVIVAAEM LLGTGIGYFI WNEWNNLSLP NIFSAIIIIG
      241 IVGILLDQGF RFLENQFSYA GNR
//
1-------10--------20--------30--------40--------50--------60--------70------78

Example 2. Sample Sequence Data File


4. TRADEMARKS, CITATIONS, ETC.

4.1 Registered Trademark Notices

GenBank (R) is a registered trademark of the U.S. Department of Health
and Human Services for the Genetic Sequence Data Bank.

GenPept (R) is a registered trademark of the U.S. Department of Health
and Human Services for the GenBank Gene Products Data Bank.

4.2 Citing GenPept

If you have used GenPept in your research, please include a reference
to the database in all publications related to that research. For
instance:

1. GenPept (GenBank Gene Products) Database. Distributed on the
   Internet via anonymous FTP from ftp.ncifcrf.gov, under the auspices
   of the National Cancer Institute's Advanced Biomedical Computing
   Center.

When citing data in GenPept, it is appropriate to give the sequence
name, release number, and the publication in which the parent GenBank
sequence first appeared.

It is also appropriate to list a reference for GenBank itself, since
GenPept is derived from the GenBank data. The following publication,
which describes the GenBank data bank, should be cited:

   Burks, C., Cassidy, M., Cinkosky, M.J., Cumella, K.E., Gilna, P.,
   Hayden, J.E-D., Keen, G.M., Kelley, T.A., Kelly, M.,
   Kristofferson, D., and Ryals, J. GenBank. Nucl. Acids Res.
   19 (Suppl):2221-2225(1991)

4.3 GenPept Distribution Format

The GenPept data bank is available by anonymous FTP from
ftp.ncifcrf.gov.

4.4 Disclaimer

Science Applications International Corp. and the United States
Government make no representations or warranties regarding the content
or accuracy of this information. Science Applications International
Corp. and the United States Government also make no representations or
warranties of merchantability or fitness for a particular purpose and
accept no responsibility for any consequences of the receipt or use of
the information.


APPENDIX A - IUPAC-IUB AMINO ACID CODES

Code  Amino Acid
----  ----------
A     Alanine (ala)
R     Arginine (arg)
N     Asparagine (asn)
D     Aspartic acid (asp)
C     Cysteine (cys)
Q     Glutamine (gln)
E     Glutamic acid (glu)
G     Glycine (gly)
H     Histidine (his)
I     Isoleucine (ile)
L     Leucine (leu)
K     Lysine (lys)
M     Methionine (met)
F     Phenylalanine (phe)
P     Proline (pro)
S     Serine (ser)
T     Threonine (thr)
U     Selenocysteine
W     Tryptophan (trp)
Y     Tyrosine (tyr)
V     Valine (val)
B     Aspartic acid or Asparagine (asx)
Z     Glutamic acid or Glutamine (glx)
X     Any amino acid (xxx)
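
The codes in Appendix A, together with the sequence line layout given
in section 3.2.1, are enough to read and check the sequence portion of
an entry. The Python sketch below is one possible approach, not part of
the GenPept distribution; the names VALID_CODES and read_sequence are
hypothetical. It assumes the lines passed in are those that follow the
ORIGIN line, up to and including the closing "//".

```python
# One-letter codes listed in Appendix A.
VALID_CODES = set("ARNDCQEGHILKMFPSTUWYVBZX")


def read_sequence(lines):
    """Join the sequence lines of one entry into a single string.

    Each line carries a right-justified residue number in columns 1-9,
    a blank in column 10, and residues from column 11 in groups of 10
    separated by spaces. Reading stops at the "//" terminator.
    """
    sequence = ""
    for line in lines:
        if line.startswith("//"):
            break
        first_residue = int(line[:9])
        if first_residue != len(sequence) + 1:
            raise ValueError(f"unexpected residue number {first_residue}")
        chunk = line[10:].replace(" ", "").strip()
        unknown = set(chunk) - VALID_CODES
        if unknown:
            raise ValueError(f"codes not in Appendix A: {sorted(unknown)}")
        sequence += chunk
    return sequence
```

The length of the returned string can then be compared with the LENGTH
field of the same entry as a consistency check.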