Friday, June 5, 2009

Links to some protein analytical tools

TMHMM
PeptideCutter
ScanProsite
PFSCAN
Miscellaneous tools on protein analysis
HOGENOM
ProtParam
ProtScale

Saturday, May 16, 2009

Understanding all about Protein (an introductory approach)

Introduction

Proteins are the major machines through which living organisms perform most of their metabolic activities. Proteins are built up from amino acids. Amino acids can be likened to blocks of different shapes that are used to construct a building of a specific design (i.e. the protein).

Since the blocks have different shapes, one block cannot simply be substituted for another while the protein is being built to its correct specification. Because such substitution is not allowed, the absence of any required amino acid from the amino acid pool during protein synthesis results in incomplete, terminated synthesis.

There are 20 amino acids found naturally in the proteins of plants and animals (see details on structure and other amino acid information). Plants can manufacture all the amino acids they need from their primary metabolites. Animals derive their amino acids partly from their diet and partly from direct synthesis in the body.

Still using the previous illustration, the building (i.e. the protein) is built by the engineering action of ribosomes, which are found in the cytoplasm of the cell rather than in the nucleus. These engineers strictly follow the plan (blueprint) stated on the mRNA.

The tRNA assists the ribosome by transferring amino acids from the amino acid pool to the site of protein synthesis. The blueprint for protein synthesis is carefully copied from the DNA during transcription, so that any protein an organism will ever need can be built.

PROTEIN STRUCTURE

Proteins usually have unique structures that enable them to perform their specific functions. There are four levels of protein structure.

  1. The primary structure (i.e. the amino acid sequence)
  2. The secondary structure (i.e. the alpha helices and beta sheets)
  3. The tertiary structure (i.e. the overall three-dimensional fold of a single polypeptide chain, including its domains)
  4. The quaternary structure (i.e. the assembly of two or more folded polypeptide chains)

The Primary Structure

The primary structure of a protein specifies the sequence of amino acids that make up the protein. It is the primary sequence that determines the overall secondary and tertiary structure of the protein, and hence its function.

Representation of the Primary Sequence

Conventionally, protein sequences are written from the N-terminal to the C-terminal. The amino acids that make up the protein sequence are joined to one another by peptide bonds, each formed by a reaction between the carboxylic acid group of one amino acid and the amino group of the next.

Amino acids in a protein sequence can be named in full (e.g. Alanine), abbreviated to three letters (Ala) or written as a single letter (A). The most common way of representing amino acid sequences in bioinformatic databases is the single-letter method.
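The three naming styles can be related to one another with a simple lookup. The sketch below is only an illustration: the function name `to_one_letter` and the hyphen-separated input format are assumptions made for this example, not a convention of any particular database.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence):
    """Convert e.g. 'Met-Ala-Gly' (N-terminal first) to 'MAG'."""
    residues = three_letter_sequence.split("-")
    # capitalize() lets 'ala', 'ALA' and 'Ala' all match the table keys.
    return "".join(THREE_TO_ONE[r.strip().capitalize()] for r in residues)


print(to_one_letter("Met-Ala-Gly-Lys"))  # MAGK
```

The output is read from the N-terminal (left) to the C-terminal (right), in the same order as the input.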

Review of the essentials of computational Biochemistry: importance and implications

The major biomolecules studied in computational biochemistry are DNA and protein. A lot of work has been done to unravel the mysteries of how genetic information is transferred from parents to offspring through DNA, and of the machinery through which this takes place, which is also the ultimate product of DNA expression (the protein).

Protein cannot be discussed effectively without mentioning the DNA from which it is derived.

DNA is an organic macromolecule made up of a deoxyribose sugar and phosphate backbone, with a nitrogenous base, either a purine or a pyrimidine, attached to each sugar as a side arm. It is usually found in the body as an intertwined duplex, which consists of two complementary strands of DNA.

The part of DNA that determines the kind of information it carries is the nitrogenous base arm.

The pairing of double-stranded DNA follows a particular pattern. There are four major nitrogenous bases in human DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). The two purine bases, A and G, pair complementarily with the two pyrimidine bases, thymine and cytosine respectively, through two and three hydrogen bonds respectively, to give double-stranded DNA.
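The pairing rule can be written down as a small lookup table. This is a minimal sketch; the names `complement` and `hydrogen_bonds` are made up for this example. It builds the complementary strand base by base (without reversing it) and counts the hydrogen bonds that hold the duplex together, taking two for each A-T pair and three for each G-C pair.

```python
# Watson-Crick partners and the hydrogen bonds each base contributes to a pair.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def hydrogen_bonds(strand):
    """Count the hydrogen bonds between a strand and its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


print(complement("ATGC"))      # TACG
print(hydrogen_bonds("ATGC"))  # 10
```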

DNA SEQUENCE

In the course of protein synthesis, the exact sequence and type of nitrogenous bases in each DNA strand are preserved during the replication and transcription processes.

Importance of gene and chromosome in protein synthesis

A genetically normal human has 46 chromosomes (23 pairs), inherited from both parents. Each chromosome is a single long double-stranded DNA molecule that is extensively coiled and supercoiled.

Each chromosome is divided into hundreds or thousands of sections of DNA sequence, many of which code for different unique proteins. Each of these sections of DNA sequence is referred to as a gene.

Each gene occupies a specific position on a chromosome in all organisms of the same species. Knowing the location of a particular gene is very important for research that involves gene extraction, transgenics or genetic engineering.

DNA sequence, mRNA, cDNA and coding DNA

A DNA sequence refers to the sequence of nitrogenous bases on a single DNA strand. It is conventionally written in the 5′ to 3′ direction (5′ and 3′ refer to the carbon positions on the sugar ring to which the phosphates are attached). This sequence is different from the mRNA, cDNA and coding DNA sequences.

A DNA sequence contains the raw, unedited sequence of the nitrogenous bases present in the chromosome. Before a gene is used to manufacture any protein, the information in the gene on the chromosome is transcribed to a messenger RNA (mRNA).

The mRNA belongs to a group of nucleic acids that contain a ribose sugar instead of the deoxyribose sugar found in DNA.

The mRNA sequence differs from the DNA sequence in that it contains uracil instead of the thymine found in DNA.
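At the level of the written sequence, this difference amounts to a one-letter substitution. The sketch below assumes the input is the coding (sense) strand written 5′ to 3′, so the mRNA reads the same apart from U replacing T; the function name `transcribe` is chosen for this example only.

```python
def transcribe(coding_strand):
    """Return the mRNA sequence for a coding-strand DNA sequence.

    Assumes the coding (sense) strand is given 5' to 3', so the mRNA
    has the same sequence with uracil (U) in place of thymine (T).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```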

The information on the chromosomal DNA segment of interest is transcribed to an mRNA before protein synthesis commences in order to preserve the integrity of the chromosomal DNA (coding DNA).

Introns and Exons

The newly synthesized mRNA strand (the pre-mRNA) contains both introns and exons. Introns are the segments of the sequence that do not code for any amino acid. Exons are the segments of the sequence that code for amino acids.

The newly transcribed mRNA is then edited: the introns are spliced out and the exons are joined together. Some other signaling sequences (a 5′ cap and a poly-A tail) are added before and after the edited mRNA.
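The editing step can be pictured as cutting the exon segments out of the transcript and joining them in order. In the sketch below, the pre-mRNA string and the exon coordinates are invented for illustration, and the `splice` helper is hypothetical; real exon boundaries come from sequence annotation, not from this code.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA.

    exons is a list of (start, end) positions, 0-based with the end
    excluded, listed in 5' to 3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon 1, one intron, exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "GAAUAA"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))  # AUGGCCGAAUAA
```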

cDNA

cDNA is a complementary DNA obtained by the action of reverse transcriptase on mRNA. It is usually obtained by reverse-transcribing mRNA, cloning the resulting DNA in bacterial cells, and subsequently purifying and sequencing it. The sequence of the cDNA is an exact complement of the mRNA from which it was synthesized. Unlike the mRNA, the cDNA has no additional sequence at its ends.

Every three nucleotides code for one amino acid. Each such triplet of coding nucleotides is referred to as a codon. The usual start codon is AUG, which codes for methionine, while the usual stop codons are UGA, UAA and UAG, which do not code for any amino acid.
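Reading codons can be sketched in a few lines. The example below builds the standard genetic code table, finds the first AUG, and then reads three nucleotides at a time until a stop codon is reached. The function name `translate` and the decision to start at the first AUG found are simplifying assumptions for this illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base U, C, A, G at each
# position; '*' marks the three stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna):
    """Translate an mRNA from its first AUG up to the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":  # stop codon: UGA, UAA or UAG
            break
        protein.append(amino)
    return "".join(protein)


print(translate("GGAUGGCCAAAUGA"))  # MAK
```

The result is given in the single-letter amino acid code, N-terminal first.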

Bioinformatic sites usually display nucleotide sequences (cDNA, coding sequence, etc.) in the same sense as the mRNA, although the content differs from one sequence type to another. Thymine is written in place of uracil in mRNA, cDNA and DNA sequences alike, so the start codon is represented as ATG and the stop codons as TGA, TAA or TAG. Some sites contain tools that can be used to reverse any particular sequence.
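Both conversions are simple string operations. The sketch below shows a reverse-complement function and a helper that rewrites an mRNA in the T-for-U form used by databases; the function names are assumptions made for this example and do not correspond to any specific site's tool.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, also written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def mrna_to_database_form(mrna):
    """Write an mRNA sequence with T in place of U, as databases display it."""
    return mrna.upper().replace("U", "T")


print(reverse_complement("ATGAAATAG"))     # CTATTTCAT
print(mrna_to_database_form("AUGAAAUAG"))  # ATGAAATAG
```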

Polymorphism

No two individuals have exactly the same DNA sequence. Variation exists. The variation between the genomes of different individuals is referred to as polymorphism. Some regions of the human genomic sequence are more variable than others. The more constant parts of the sequence are very similar across a large percentage of the population.

In some cases, a single nucleotide or nitrogenous base differs from that of the larger population; such a difference in one nucleotide is referred to as a Single Nucleotide Polymorphism (SNP). Sometimes these SNPs result in some form of genetic disease, while in other cases they are “silent”.

When a SNP falls within a coding region, the different nucleotide alters the codon, which may in turn change the amino acid encoded by that codon. A change in the amino acid can then alter the final three-dimensional structure and function of the protein.
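Whether a substitution is “silent” can be checked by comparing the amino acids encoded before and after the change. The sketch below assumes two in-frame coding sequences of equal length, written with T as databases show them; the sequences and the `classify_snp` helper are hypothetical and serve only to illustrate the idea.

```python
from itertools import product

# Standard genetic code on the DNA alphabet; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def classify_snp(reference, variant):
    """List codons that differ between two in-frame coding sequences.

    Returns tuples of (codon number, reference amino acid,
    variant amino acid, kind of change).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    changes = []
    for i in range(0, len(reference) - 2, 3):
        ref_codon, var_codon = reference[i:i + 3], variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
        if ref_aa == var_aa:
            kind = "silent"
        elif var_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        changes.append((i // 3 + 1, ref_aa, var_aa, kind))
    return changes


print(classify_snp("ATGGAGCTT", "ATGGTGCTT"))  # [(2, 'E', 'V', 'missense')]
print(classify_snp("ATGGAGCTT", "ATGGAACTT"))  # [(2, 'E', 'E', 'silent')]
```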

Genetic Disease

Genetic diseases usually result from the malfunction or non-functionality of one or more proteins in the body. They may be caused by either environmental or congenital factors. Their occurrence can be traced back to the gene from which the proteins were produced.

In some cases, polymorphic genes result in genetic disorders. Other cases of genetic disorder may result from environmental factors, such as damage to DNA following exposure to mutagens.

Since faulty genes are the main cause of genetic disorders, their treatment requires that the faulty gene be located and subjected to genetic therapeutic techniques such as gene replacement therapy. Such work requires extensive knowledge of computational biochemistry.


COMPUTATIONAL BIOCHEMISTRY FOR BEGINNERS

About Computational Biochemistry

Computational biochemistry is not limited to DNA and protein sequences, as is widely thought; it covers more than that. It encompasses the various computational tools (software and online applications) that are used to aid the study of biochemistry. Recent advances in computing have made a vast array of tools available to the modern biochemist. Computational biochemistry is itself a subset of a broader subject, bioinformatics, whose coverage is not limited to biochemistry but also includes other bioscientific fields.

Importance

The importance of computational biochemistry to a modern biochemist can be likened to the importance of the computer to the literate community. Virtually every aspect of biochemistry has been touched and influenced by computation.

In recent times computing has found immense application in many fields, including biochemistry, and its use in solving biochemical problems is becoming widespread. Applications include the acquisition of scientific literature; the analysis of protein and DNA sequences and structures; the understanding of protein-protein and protein-carbohydrate interactions and functions; the analysis of graphical results; and the statistical analysis of data.

Understanding computational biochemistry enables a biochemist to easily acquire important information and other online resources for both educational and research purposes.

Available tools and resources

Currently, many free resources are available online to an enlightened biochemist, and knowledge of computational biochemistry enables a biochemist to harness these resources effectively to achieve set goals. Such free resources include biomedical libraries, protein and DNA sequence databanks, interactive metabolic pathway tools and protein-protein interaction tools. See more listings of these free resources at the free online Biotool list.

Features and characteristics of Bioinformatic sites

A common feature of the online tools and resources of bioinformatic sites is their extensive cross-referencing. Most of these sites contain links to other online resources that offer more relevant information on the current query. In addition, many informational sites use special numbers to represent each entry indexed in their databases. Such a number is often referred to as an accession number or ID. While some sites prefer a more standardized identity system, such as the EC number of an enzyme or the gene name, others use their own unique identity system. These IDs are what is used to cross-reference the different bioinformatic sites with one another.

How to use Bioinformatic sites

How a bioinformatic site is used depends on the nature of its operation. For informational databases, a search is conducted by typing a generally accepted name for the query into the text box provided; one then scans through the list of results to pinpoint the specific entry sought. It is important to note the ID that the site uses to represent that entry, since this ID will be used for subsequent searches. Searching with the ID opens the desired entry directly.

Using bioinformatic sites that offer online applications usually requires that the user understand the syntax and operation of the site before he or she can use them effectively. Instead of accession numbers or IDs, these sites may issue a “result ID”. Examples of such sites include BLAST, Clustal and domain finders.

Problems and limitations

There are some limitations to the accessibility and use of these bioinformatic resources. The primary one is that, since new online applications and tools are springing up daily, it is practically impossible to be conversant with all of these sites. A more realistic approach is to master those sites and applications that provide the information and services you need. Being aware of the existence and capability of a broader array of sites can nonetheless be very helpful.

The other, less serious, limitation is that one has to be “online” to access these resources, unless one opts to purchase software that offers the same functions and information as the site of interest.

CONCLUSION

As discussed earlier, the list of free biotools available to a researcher is growing daily. We are becoming more aware of how ICT can affect the advancement of the biosciences. Not only has bioinformatics made bioscientific information easier to obtain, it also offers a network of “like-sites” from which similar or additional information can be obtained for a particular search term.

In all, computational biochemistry is still an emerging and rapidly growing field that will be the bedrock of all biochemical studies in the near future. Avail yourself of the opportunity to be “bioinformatic literate”.

Friday, February 27, 2009

A FUTURE WITH COMPUTATIONAL PROTEIN DESIGNING

A lot has been achieved with computer programming in the last two decades. Such feats include the automation of most industrial processes and successful unmanned space exploration missions. If this much can be achieved at the binary level of programming, it should be possible to achieve even more when programming is done with 20 different units, given adequate attention and intellectual reasoning. Programming with all 20 amino acids has an edge over computer programming, since protein is a potentially tappable, information-rich molecule. For instance, a polypeptide of 30 residues built from the 20 amino acids can take 20^30 (about 1.07 × 10^39) different sequences, as against 2^30 (about 1.07 × 10^9) different codes for the same 30 units written with the 0s and 1s of computer programming.
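The number of distinct sequences of length n over an alphabet of k symbols is k^n, so the comparison for 30 units can be checked directly. The helper name `sequence_count` below is made up for this quick check.

```python
def sequence_count(alphabet_size, length):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


peptides = sequence_count(20, 30)  # 30 residues drawn from 20 amino acids
binary = sequence_count(2, 30)     # 30 binary digits

print(f"{peptides:.3e}")   # 1.074e+39
print(binary)              # 1073741824
print(peptides // binary)  # 10**30 times as many peptide sequences
```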


Recent advances in computational biochemistry, coupled with the successful sequencing of the entire human genome and those of other animals, have made things a lot easier. The successful crystallization and elucidation of the three-dimensional structures of more than 50 000 proteins provide a sufficient basis for understanding and predicting the structure and function of proteins from known sequences. Also available are many sophisticated sequence analysis programs such as BLAST, Clustal, Minimotif Miner and motif scans, and simulation programs such as Chime and Folding@home. Some of these programs are freely available online. Information about DNA and protein sequences can be obtained freely from cross-referenced databases such as NCBI, ExPASy and UniProt.

Computational protein designing is the use of computers to generate polypeptide sequences that will function predictably. A lot can be achieved with the computational design of short polypeptide sequences. Since only a few amino acids account for the catalytic function of most enzymes, it is possible to design a protein with two or more active sites, each of which becomes activated under certain conditions. The sequence of the structural part would then be designed in such a way that the final folded protein has a 3D structure that enables it to perform its predetermined function.

The computational design of proteins has potential applications in many fields. In industry, it could be used to design various analyte sensors and to develop novel enzymes for industrial processes. It may also be useful in the pharmaceutical and medical fields for the development of novel protein drugs. I envisage a future where biochemists will design almost anything with the super informational macromolecule, protein, by placing appropriate amino acids in a particular sequence. In the design of an analyte estimator, for example, it may be possible to program a polypeptide chain to undergo monitorable conformational changes in the presence of an analyte of interest. Such conformational changes may be coupled to the proportional and reversible hydrolysis of a chromophoric group attached to the polypeptide sequence.

When designing the protein, the rules governing protein folding should be borne in mind, as well as detailed knowledge of the chemistry of the individual amino acids, their bonding distances and the interactions that exist between them. All of these will influence the stability and the three-dimensional structure of the final designed protein.

The polypeptide may be synthesized directly by chemical means. Alternatively, the corresponding DNA sequence may be chemically synthesized and inserted into a suitable vector, which can then be used to amplify the expression of the designed protein. These syntheses may pose an immediate difficulty, since the present technologies for the chemical synthesis of polypeptide and polynucleotide sequences are expensive and can only synthesize short sequences at a time.

Having been exposed to computational biochemistry, we need to explore this emerging field. It is becoming more pertinent not only because we (biochemists) are the ones best disposed to doing it, but more importantly because it gives us the opportunity to optimize the use of what we can freely access. We need not worry about the actual chemical synthesis for now, since that can be taken care of by labs in technologically advanced countries.

We may actually be incapacitated financially but not intellectually!

UF Umeoguaju

08067817864

ufumeoguaju@hotmail.co.uk

Posted by uf umeoguaju at 8:06 AM