
Secondary DATABASES

College: College of Nursing     Department: Department of Basic Medical Sciences     Stage: 2
Instructor: عماد هادي حميد الطائي       11/07/2018 07:26:59
Secondary databases are curated, non-redundant databases that are derived from the primary (archival) databases.
For example, the NCBI RefSeq database is a secondary database that is a collection of curated, non-redundant, well-annotated sequences including genomic DNA, transcripts, and proteins.
The RefSeq database also provides additional information about these sequences, such as characterization, mutation, polymorphism analysis, expression studies, and comparative analyses.

An Example of a Non-Redundant, Curated Secondary Database of Proteins: Swiss-Prot
Swiss-Prot is now a part of the larger database system called the Universal Protein Resource Knowledgebase (UniProtKB), which was initiated in 2002 by the UniProt consortium. The UniProtKB consists of two parts: UniProtKB/Swiss-Prot (reviewed, manually annotated) and UniProtKB/TrEMBL (unreviewed, automatically annotated; TrEMBL = translated EMBL).
UniProtKB/Swiss-Prot contains manually annotated records and information obtained from the literature and curator-evaluated computational analysis, whereas UniProtKB/TrEMBL contains computationally analyzed records that still need full manual annotation.
The Swiss-Prot database is widely used for sequence and other information on proteins.
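
The reviewed/unreviewed split is visible in the FASTA files that UniProt distributes: header lines start with `sp` for UniProtKB/Swiss-Prot entries and `tr` for UniProtKB/TrEMBL entries. The following is a minimal sketch of how that prefix could be read; the function name `uniprot_section` is an illustrative choice, and the TrEMBL header in the example is a made-up placeholder, not a real record.

```python
def uniprot_section(header: str) -> str:
    """Return the UniProtKB section implied by a FASTA header line.

    Assumes the UniProt header layout '>db|accession|entry_name description',
    where db is 'sp' (Swiss-Prot) or 'tr' (TrEMBL).
    """
    prefix = header.lstrip(">").split("|", 1)[0]
    sections = {
        "sp": "UniProtKB/Swiss-Prot (reviewed)",
        "tr": "UniProtKB/TrEMBL (unreviewed)",
    }
    return sections.get(prefix, "unknown")


# Human hemoglobin subunit alpha, a reviewed Swiss-Prot entry.
reviewed = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"
# Hypothetical TrEMBL-style header, for illustration only.
unreviewed = ">tr|X0XXX0|X0XXX0_HUMAN Uncharacterized protein OS=Homo sapiens"

print(uniprot_section(reviewed))
print(uniprot_section(unreviewed))
```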







Secondary and specialized databases that are publicly available

Universal Protein Resource Knowledgebase (UniProtKB): The UniProt Knowledgebase (UniProtKB) is the central repository for the collection of sequence and functional information on proteins with accurate, consistent, and rich annotation.

Worldwide Protein Data Bank (wwPDB): Experimentally determined structures of proteins, nucleic acids, and complex assemblies.

Structural Classification of Proteins (SCOP) database: The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.

Class, Architecture, Topology, Homology (CATH) database: CATH is a manually curated classification of protein domain structures. Each protein is chopped into structural domains and assigned to homologous superfamilies (groups of domains that are related by evolution).

PROSITE database: This consists of a large collection of biologically meaningful signature patterns or profiles. These signatures are not easily revealed by standard sequence alignment.

PRINTS database: This is a compendium of protein fingerprints; a fingerprint is a group of conserved motifs used to characterize a protein family.

Protein Family (Pfam) database: Pfam is a comprehensive database of protein families; members of a family share significant similarity, thereby suggesting homology. Pfam allows sequence data to be searched against the database for related proteins based on domains.

InterPro database: InterPro integrates various predictive protein signatures from diverse source repositories, such as Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs.

Biological General Repository for Interaction Datasets (BioGRID): The BioGRID database is an online repository of interactions in which data are curated from both high-throughput data sets and individual focused studies, as derived from over 40,000 publications in the primary literature.

Molecular Interaction database (MINT): MINT is a public repository for protein-protein interactions reported in peer-reviewed journals. It focuses on experimentally verified protein-protein interactions.

IntAct: IntAct is a freely available, open source molecular interaction database populated by data either curated from the literature or from direct data depositions.

Structural Database of Allergenic Proteins (SDAP): SDAP is a web server that integrates a database of allergenic proteins with various computational tools that can assist structural biology studies related to allergens, including predicting the IgE-binding potential of food proteins.

Allermatch database: The Allermatch database allows the comparison of a protein sequence with sequences of allergenic proteins in the database, in order to predict whether the protein being evaluated can be allergenic.

Online Mendelian Inheritance in Man (OMIM) database: OMIM is a comprehensive compendium of human genes and genetic-disease-associated phenotypes.

ArrayExpress database: A public database of microarray gene-expression data at the EBI. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30,000 experiments.

Gene Expression Omnibus (GEO) database: The GEO is a public repository that archives and freely distributes MIAME-compliant microarray data, next-generation sequencing data, and other forms of high-throughput functional genomic data submitted by the scientific community.

ArrayTrack database: A public database of microarray gene-expression data at the US Food and Drug Administration.

Comparative Toxicogenomics Database (CTD): This is a public database of information built on curated data from the scientific literature about interactions between environmental chemicals and gene products and their relationships to diseases.

Chemical Effects in Biological Systems (CEBS) database: The CEBS database has been developed by the National Center for Toxicogenomics within the National Institute of Environmental Health Sciences (NIEHS). CEBS integrates data obtained using ’omics technologies (transcriptomics, proteomics, metabolomics) as well as from traditional toxicology studies.

DrugMatrix database: DrugMatrix is a toxicogenomic and molecular toxicology database and informatics system developed by the National Toxicology Program (NTP).

FlyBase database: FlyBase is the leading database and web portal for genetic and genomic information focusing on Drosophila melanogaster, but also including data on other Drosophila species and related drosophilids.

NCBI databases: Collection of various databases.
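
To make the idea of a PROSITE signature pattern from the list above more concrete, the sketch below converts a pattern written in PROSITE syntax into a regular expression and scans a sequence with it. It covers only the core syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence termini). The function names and the test sequence are illustrative assumptions; the pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site signature.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into Python regex syntax (core syntax only)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = (
            element.replace("{", "[^").replace("}", "]")   # forbidden residues
                   .replace("(", "{").replace(")", "}")    # repeat counts
                   .replace("x", ".")                      # any residue
        )
        if element.startswith("<"):      # pattern anchored at the N-terminus
            element = "^" + element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            element = element[:-1] + "$"
        parts.append(element)
    return "".join(parts)


def find_motif(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    # A lookahead lets the scan report overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence)]


n_glyc = "N-{P}-[ST]-{P}"
print(prosite_to_regex(n_glyc))          # N[^P][ST][^P]
print(find_motif(n_glyc, "NGSANGTA"))    # [0, 4]
```

PROSITE profiles (weight matrices) are a separate, more sensitive signature type and are not handled by this sketch.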














Sequence Identity
Sequence Similarity
Sequence Homology

Sequence Identity
Sequence identity means that the same residues are present at corresponding positions in the two sequences being compared. For proteins, it means the same amino acids; for nucleic acids, it means the same bases.
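
As a rough illustration of this definition, percent identity can be computed by counting matching residues over the aligned positions of two sequences. The sketch below assumes the sequences are already aligned to equal length with `-` marking gaps, and it divides by the number of positions where neither sequence has a gap. That denominator is only one of several conventions in use, and the function name is an illustrative choice.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned positions that carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":   # skip positions with a gap in either sequence
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Two aligned DNA sequences that differ at one of eight positions.
print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 87.5
```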

Sequence similarity
Sequence similarity means that similar residues are present at corresponding positions in the two sequences being compared. For nucleic acids, sequence similarity and sequence identity are the same. For proteins, however, sequence similarity involves amino acids with similar physicochemical and functional properties. For example, substituting lysine for arginine, or the reverse, is regarded as a similar substitution because both are positively charged hydrophilic amino acids. Likewise, substituting aspartic acid for glutamic acid, or the reverse, is a similar substitution because both are negatively charged hydrophilic amino acids. Substitutions of asparagine by aspartic acid and of glutamine by glutamic acid, or vice versa, are also regarded as similar. Isoleucine, leucine, and valine substitute for one another as similar residues because they have similar aliphatic hydrophobic side chains, and serine and threonine are likewise regarded as similar. Similar substitutions are also referred to as conservative substitutions.
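
The conservative substitutions listed above can be turned into a small similarity check. The sketch below uses only the residue groups named in the text (K/R, D/E, N/D, Q/E, I/L/V, S/T) and counts a position as similar when the residues are identical or belong to one of those groups. The names and the example peptides are illustrative assumptions; alignment programs typically score similarity with a substitution matrix such as BLOSUM62 rather than a fixed list of pairs.

```python
from itertools import combinations

# One-letter codes for the conservative groups named in the text.
CONSERVATIVE_GROUPS = ["KR", "DE", "ND", "QE", "ILV", "ST"]
SIMILAR_PAIRS = {
    frozenset(pair)
    for group in CONSERVATIVE_GROUPS
    for pair in combinations(group, 2)
}


def is_similar(a: str, b: str) -> bool:
    """True if the residues are identical or form a listed conservative pair."""
    return a == b or frozenset((a, b)) in SIMILAR_PAIRS


def percent_similarity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned positions with identical or similar residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(is_similar(a, b) for a, b in pairs) / len(pairs)


# Made-up aligned peptides: 1 identical position, 4 conservative, 1 dissimilar.
print(round(percent_similarity("MKLDSV", "MRIETA"), 1))  # 83.3
```

Note that the pairs are not transitive: asparagine pairs with aspartic acid and aspartic acid pairs with glutamic acid, but asparagine and glutamic acid are not listed together, so `is_similar("N", "E")` is false in this sketch.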

Sequence homology
Sequence homology is an evolutionary term that is frequently misused in the literature to denote sequence similarity or identity. Sequences are called homologous if they have a common evolutionary origin, that is, if they are derived from a common ancestral sequence. Homology is therefore qualitative: sequences are either homologous or not homologous, and homology cannot be quantified. Nevertheless, expressions like “high homology,” “significant homology,” and even a specified “% homology” are still very widely used. Such usage ignores the evolutionary underpinning of the term. The term goes back to the early evolutionary literature, where organs having similar structure and anatomical origin but performing different functions (hence morphologically different) were called homologous organs.




Restriction-Site Mapping of the Input Sequence
RNA Secondary-Structure Prediction
Microarray Analysis
Detection of Sequence Polymorphism and the SNP Database

