History of bioinformatics
Bioinformatics has emerged as a scientific discipline that encompasses the application of computing science and technology to analyze and manage biological data.
The field traces back to Ingram's demonstration of homology between sickle-cell hemoglobin and normal hemoglobin. This led to comparisons of other biological functions, and computer databases were developed to manage the resulting data effectively. Examples include:
- GENBANK, USA
- EMBL, UK
- DDBJ, Japan.
A chronological history of events:
- 1951 - Pauling and Corey propose the structures of the alpha-helix and beta-sheet (Proc. Natl. Acad. Sci. USA, 37: 205-211, 1951; Proc. Natl. Acad. Sci. USA, 37: 729-740, 1951).
- 1953 - Watson & Crick propose the double helix model for DNA based on X-ray data obtained by Franklin & Wilkins (Nature, 171: 737-738, 1953).
- 1954 - Perutz's group develop heavy atom methods to solve the phase problem in protein crystallography.
- 1955 - The sequence of the first protein to be analysed, bovine insulin, is announced by F. Sanger.
- 1958 - The first integrated circuit is constructed by Jack Kilby at Texas Instruments.
The Advanced Research Projects Agency (ARPA) is formed in the US
- 1962 - Pauling's theory of molecular evolution
- 1965 - Margaret Dayhoff's Atlas of Protein Sequences
- 1968 - Packet-switching network protocols are presented to ARPA
- 1969 - The ARPANET is created by linking computers at Stanford, UCSB, The University of Utah and UCLA.
- 1970 - The details of the Needleman-Wunsch algorithm for sequence comparison are published.
- 1971 - Ray Tomlinson (BBN) invents the email program.
- 1972 - The first recombinant DNA molecule is created by Paul Berg and his group.
- 1973 - The Brookhaven Protein Data Bank is announced (Acta Cryst. B, 1973, 29: 1764).
Robert Metcalfe receives his Ph.D. from Harvard University. His thesis describes Ethernet.
- 1974 - Vint Cerf and Robert Kahn develop the concept of connecting networks of computers into an "internet" and develop the Transmission Control Protocol (TCP).
- 1975 - Microsoft Corporation is founded by Bill Gates and Paul Allen.
Two-dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel is combined with separation according to isoelectric points, is announced by P. H. O'Farrell (J. Biol. Chem., 250: 4007-4021, 1975).
- 1976 - The Unix-To-Unix Copy Protocol (UUCP) is developed at Bell Labs.
E. M. Southern publishes the experimental details of the Southern blot technique for detecting specific DNA sequences (J. Mol. Biol., 98: 503-517, 1975).
- 1977 - The full description of the Brookhaven PDB (http://www.pdb.bnl.gov) is published (Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M.; J. Mol. Biol., 1977, 112: 535).
Allan Maxam and Walter Gilbert (Harvard)
and Frederick Sanger (U.K. Medical Research Council), report methods
for sequencing DNA.
DNA sequencing and software to analyze it (Staden).
- 1978 - The first Usenet connection is established between Duke and the University of North Carolina at Chapel Hill by Tom Truscott, Jim Ellis and Steve Bellovin.
- 1980 - The first complete genome sequence for an organism (φX174) is published. The genome consists of 5,386 base pairs, which code for nine proteins.
Wüthrich et al. publish a paper detailing the use of multi-dimensional NMR for protein structure determination (Kumar, A.; Ernst, R.R.; Wüthrich, K.; Biochem. Biophys. Res. Comm., 1980, 95: 1).
IntelliGenetics, Inc. is founded in California. Its primary product is the IntelliGenetics Suite of programs for DNA and protein sequence analysis.
- 1981 - The Smith-Waterman algorithm for sequence alignment is published.
IBM introduces its Personal Computer to the market.
The concept of a sequence motif (Doolittle).
- 1982 - Genetics Computer Group (GCG) is created as a part of the University of Wisconsin Biotechnology Center. The company's primary product is the Wisconsin Suite of molecular biology tools.
GenBank Release 3 made public
Phage lambda genome sequenced
- 1983 - The Compact Disk (CD) is launched. Name servers are developed at the University of Wisconsin.
Sequence database searching algorithm (Wilbur-Lipman).
LANL (Los Alamos National Laboratory) and LLNL (Lawrence Livermore
National Laboratory) begin production of DNA clone (cosmid) libraries
representing single chromosomes.
DNA analysis becomes viable with the discovery of the polymerase chain reaction (PCR), which allows small samples of DNA to be multiplied to produce a sample large enough to analyse.
- 1984 - Jon Postel's Domain Name System (DNS) is placed on-line.
The Macintosh is announced by Apple Computer.
- 1985 - The FASTP/FASTN algorithm is published.
Robert Sinsheimer holds a meeting on human genome sequencing at the University of California, Santa Cruz.
At the DOE Office of Health and Environmental Research (OHER), Charles DeLisi and David A. Smith commission the first Santa Fe conference to assess the feasibility of a Human Genome Initiative.
- 1986 - Following the Santa Fe conference, DOE OHER announces the Human Genome Initiative. With $5.3 million, pilot projects begin at DOE national laboratories to develop critical resources and technologies.
The term "genomics" appears for the first time, describing the scientific discipline of mapping, sequencing, and analyzing genes. It was coined by Thomas Roderick as the name for a new journal.
Amoco Technology Corporation acquires IntelliGenetics.
The SWISS-PROT database is created by the Department of Medical
Biochemistry of the University of Geneva and the European Molecular
Biology Laboratory (EMBL).
The polymerase chain reaction (PCR) is described by Kary Mullis and co-workers.
- 1987 - The use of yeast artificial chromosomes (YAC) is described (David T. Burke et al., Science, 236: 806-812).
The physical map of E. coli is published (Y. Kohara et al., Cell, 51: 319-337).
Perl (Practical Extraction and Report Language) is released by Larry Wall.
Congressionally chartered DOE advisory committee, HERAC, recommends a
15-year, multidisciplinary, scientific, and technological undertaking
to map and sequence the human genome. DOE designates multidisciplinary
human genome centers.
NIH NIGMS begins funding of genome projects
- 1988 - National Center for Biotechnology Information (NCBI) created at NIH/NLM
EMBnet network for database distribution
The Human Genome Initiative is started (Commission on Life Sciences, National Research Council. Mapping and Sequencing the Human Genome. National Academy Press: Washington, D.C., 1988).
The FASTA algorithm for sequence comparison is published by Pearson and Lipman.
A new program, an Internet computer virus designed by a student, infects 6,000 military computers in the US.
Reports by congressional OTA and NAS NRC committees recommend concerted genome research program.
HUGO founded by scientists to coordinate efforts internationally
First annual Cold Spring Harbor Laboratory meeting on human genome mapping and sequencing.
DOE and NIH sign MOU outlining plans for cooperation on genome research.
Telomere (chromosome end) sequence having implications for aging and cancer research is identified at LANL
- 1989 - The Genetics Computer Group (GCG) becomes a private company.
Oxford Molecular Group, Ltd. (OMG) is founded in the UK by Anthony Marchington, David Ricketts, James Hiddleston, Anthony Rees, and W. Graham Richards. Primary products: Anaconda, Asp, Cameleon and others (molecular modeling, drug design, protein design).
DNA sequence-tagged sites (STSs) are recommended to correlate diverse types of DNA clones.
DOE and NIH establish Joint ELSI Working Group
- 1990 - The BLAST program (Altschul et al.) is implemented.
Molecular Applications Group is founded in California by Michael Levitt and Chris Lee. Their primary products are Look and SegMod, which are used for molecular modeling and protein design.
InforMax is founded in Bethesda, MD. The company's products address
sequence analysis, database and data management, searching, publication
graphics, clone construction, mapping and primer design.
DOE and NIH present joint 5-year U.S. HGP plan to Congress. The 15-year project formally begins.
Projects begun to mark gene sites on chromosome maps as sites of mRNA expression.
Research and development begun for efficient production of more stable, large-insert BACs
- 1991 - The research institute CERN in Geneva announces the creation of the protocols that make up the World Wide Web.
The creation and use of expressed sequence tags (ESTs) is described.
Incyte Pharmaceuticals, a genomics company headquartered in Palo Alto, California, is formed.
Myriad Genetics, Inc. is founded in Utah. The company's goal is to lead in the discovery of major common human disease genes and their related pathways. The company has discovered and sequenced, with its academic collaborators, the following major genes: BRCA1, BRCA2, CHD1, MMAC1, MMSC1, MMSC2, CtIP, p16, p19 and MTS2.
Human chromosome mapping data repository, GDB, established
- 1992 - Low-resolution genetic linkage map of entire human genome published.
Guidelines for data release and resource sharing announced by DOE and NIH
- 1993 - Sanger Centre, Hinxton, UK.
CuraGen Corporation is formed in New Haven, CT.
Affymetrix begins independent operations in Santa Clara, California.
International IMAGE Consortium established to coordinate efficient mapping and sequencing of gene-representing cDNAs.
DOE-NIH ELSI Working Group's Task Force on Genetic and Insurance Information releases recommendations.
DOE and NIH revise 5-year goals [Science 262, 43-46 (Oct. 1, 1993)]
IOM releases U.S. HGP-funded report, "Assessing Genetic Risks."
LBNL implements novel transposon-mediated chromosome-sequencing system.
GRAIL sequence-interpretation service provides Internet access at ORNL
- 1994 - Netscape Communications Corporation is founded and releases Navigator, the commercial version of NCSA's Mosaic.
Gene Logic is formed in Maryland.
The PRINTS database of protein motifs is published by Attwood and Beck.
Oxford Molecular Group acquires IntelliGenetics.
EMBL European Bioinformatics Institute, Hinxton, UK.
Genetic-mapping 5-year goal achieved 1 year ahead of schedule.
Completion of second-generation DNA clone libraries representing each human chromosome by LLNL and LBNL.
- 1995 - The Haemophilus influenzae genome (1.8 Mb) is sequenced (Fleischmann et al., Science, 269: 496-512, 1995).
LANL and LLNL announce high-resolution physical maps of chromosome 16 and chromosome 19, respectively
The Mycoplasma genitalium genome is sequenced (Fraser et al., Science, 270: 397-403, 1995).
Moderate-resolution maps of chromosomes 3, 11, 12, and 22 are published.
Physical map with over 15,000 STS markers published.
First (nonviral) whole genome sequenced (for the bacterium Haemophilus influenzae).
Sequence of the smallest bacterium, Mycoplasma genitalium, completed; it provides a model of the minimum number of genes needed for independent existence.
- 1996 - The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is sequenced.
The PROSITE database is reported by Bairoch et al.
Methanococcus jannaschii genome sequenced; confirms existence of third major branch of life on earth.
DOE initiates 6 pilot projects on BAC end sequencing.
Saccharomyces cerevisiae (yeast) genome sequence completed by international consortium
Affymetrix produces the first commercial DNA chips.
Sequence of the human T-cell receptor region completed
- 1997 - The genome for E. coli (4.7 Mbp) is published.
Oxford Molecular Group acquires the Genetics Computer Group.
LION bioscience AG is founded as an integrated genomics company with a strong focus on bioinformatics. The company is built from IP out of the European Molecular Biology Laboratory (EMBL), the European Bioinformatics Institute (EBI), the German Cancer Research Center (DKFZ), and the University of Heidelberg.
Paradigm Genetics, Inc., a company focused on the application of genomic technologies to enhance worldwide food and fiber production, is founded in Research Triangle Park, NC.
deCode genetics publishes a paper describing the location of the FET1 gene, which is responsible for familial essential tremor, on chromosome 13 (Nature Genetics).
NIH NCHGR becomes National Human Genome Research Institute (NHGRI).
Second large-scale sequencing strategy meeting held in Bermuda
High-resolution physical maps of chromosomes X and 7 completed.
DOE-NIH Task Force on Genetic Testing releases final report and recommendations.
DOE forms the Joint Genome Institute for implementing high-throughput activities at DOE human genome centers, initially in sequencing and functional genomics.
- 1998 - The genomes for Caenorhabditis elegans and baker's yeast are published.
The Swiss Institute of Bioinformatics is established as a non-profit foundation.
Craig Venter forms Celera in Rockville, Maryland.
PE Informatics is formed as a Center of Excellence within PE Biosystems. This center brings together and leverages the complementary expertise of PE Nelson and Molecular Informatics to further complement the genetic instrumentation expertise of Applied Biosystems.
Inpharmatica, a new Genomics and Bioinformatics company, is established
by University College London, the Wolfson Institute for Biomedical
Research, five leading scientists from major British academic centres
and Unibio Limited.
GeneFormatics, a company dedicated to the analysis and prediction of protein structure and function, is formed in San Diego.
Molecular Simulations Inc. is acquired by Pharmacopeia.
- 1999 - deCode genetics maps the gene linked to pre-eclampsia as a locus on chromosome 2p13.
First human chromosome completely sequenced: on December 1, researchers in the Human Genome Project announce the complete sequencing of the DNA making up human chromosome 22.
Joint Genome Institute sequencing facility opens in Walnut Creek, CA.
Major drug firms create the public SNP Consortium.
HGP advances goal for obtaining a draft sequence of the entire human genome from 2001 to 2000
- 2000 - The genome for Pseudomonas aeruginosa (6.3 Mbp) is published.
The A. thaliana genome (100 Mb) is sequenced.
The D. melanogaster genome (180 Mb) is sequenced.
Pharmacopeia acquires Oxford Molecular Group.
HGP leaders and President Clinton announce the completion of a "working draft" DNA sequence of the human genome.
International research consortium publishes chromosome 21 genome, the
smallest human chromosome and the second to be completely sequenced.
DOE researchers announce completion of chromosomes 5, 16, and 19 draft sequence.
International collaborators publish genome of fruit fly Drosophila melanogaster
- 2001 - The human genome (3,000 Mbp) is published.
Human Chromosome 20 Finished - Chromosome 20 is the third chromosome
completely sequenced to the high quality specified by the Human Genome
Project
- 2002 - Structural Bioinformatics and GeneFormatics merge
An international sequencing consortium publishes the draft genome sequence of the common house mouse (2.5 Gb). Whitehead Institute researcher Kerstin Lindblad-Toh is the lead author on the paper; her institution led the project and contributed about half of the sequence. Washington University School of Medicine delivered about 30 percent of the sequence and created the mouse BAC-based physical map.
The Wellcome Trust Sanger Institute in the UK was the third major
partner. Other institutes in the International Mouse Genome Sequencing
Consortium included the University of California at Santa Cruz, the
Institute for Systems Biology, and the University of Geneva.
Mouse Genome Sequencing Consortium publishes its draft sequence of mouse genome in the December 5, 2002, issue of Nature
International consortium led by the DOE Joint Genome Institute publishes draft sequence of Fugu rubripes.
- 2003 - The Human Genome Project is completed in April 2003.
Human Chromosome 14 Finished - Chromosome 14 is the fourth chromosome to be completely sequenced
- 2004 - The draft genome sequence of the brown Norway laboratory rat, Rattus norvegicus, is completed by the Rat Genome Sequencing Project Consortium.