Tag Archives: NIH

BLyS Sequence Analysis

I’ve been playing with some sequence analysis and phylogenetic tree construction programs recently because I would like to introduce these sorts of data analysis into my biology classes. As a sample protein, I decided to use BLyS / BAFF, a protein important in regulating B cell numbers. I’ve wondered about the origin of this kind of molecule ever since working on it in grad school, and this seemed like a decent way to get some ideas about where it might come from.

The first thing I did was go to the NIH’s National Library of Medicine website:

It’s easy to search for any protein / gene / whole genome you are interested in examining. Knowing that BLyS is vital in humans and mice, I chose to start with the human sequence. I retrieved it as the following:

>gi|20196464|dbj|BAB90856.1| BLyS [Homo sapiens]

The easiest tool for finding similar proteins in other animals is the Basic Local Alignment Search Tool for proteins, or BLASTp. Using just the default settings, I pasted the sequence into the search field and hit go. (Note: I actually just used the accession number, not the whole sequence.)
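For anyone who would rather pull the accession number out of that header line with a script, here is a minimal sketch in Python. It assumes the old-style gi|number|database|accession| layout of the header shown earlier (as far as I know, "dbj" marks a record that came in through DDBJ). The function name parse_ncbi_defline is my own invention and not part of any NCBI tool.

```python
def parse_ncbi_defline(defline):
    """Split an old-style NCBI FASTA header into its fields (illustrative helper)."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    # Everything up to the first space is the identifier block
    ids, _, description = defline[1:].partition(" ")
    parts = ids.strip("|").split("|")
    # Assumed layout: gi | <number> | <database> | <accession.version>
    if len(parts) != 4 or parts[0] != "gi":
        raise ValueError("not a gi|...|db|accession| style header")
    description = description.strip()
    organism = None
    # The organism name, when present, sits in square brackets at the end
    if description.endswith("]") and "[" in description:
        name, _, organism = description[:-1].rpartition("[")
        description = name.strip()
    return {
        "gi": parts[1],
        "database": parts[2],
        "accession": parts[3],
        "description": description,
        "organism": organism,
    }


header = ">gi|20196464|dbj|BAB90856.1| BLyS [Homo sapiens]"
fields = parse_ncbi_defline(header)
print(fields["accession"], "|", fields["description"], "|", fields["organism"])
```

The accession field is the piece I pasted into the BLASTp search box.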


This retrieved tons of proteins with similar sequences from the vast database of sequence information, from which I chose several model species. One thing I wanted to do was include several primates as a sort of internal calibration (assuming that they would all have very similar sequences compared to more distantly related species). I also wanted sequences from a few animals that are quite distantly related to humans (frog and ground tit fit that bill).

Once I had a list, I put them all into a single text file and then used that in a second program. For this step, I decided that the best ‘multiple alignment tool’ would be CLUSTALX. It’s been around for a while and can create data in a number of different forms. Besides, it’s free and versions are available for both Mac and PC.
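If you would rather build that single text file with a script than by copy and paste, something like the following Python sketch would do it. The sequences here are made-up placeholders, not real BLyS sequences, and to_fasta is just a name I picked. The output is ordinary multi-FASTA text: a header line starting with ">" for each sequence, followed by the residues wrapped at a fixed width.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as a single multi-FASTA string."""
    lines = []
    for name, seq in records:
        # Remove stray whitespace and line breaks, and normalize to upper case
        seq = "".join(seq.split()).upper()
        lines.append(">" + name)
        # Wrap the sequence so no line is longer than `width` residues
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Placeholder sequences, invented for illustration (NOT real BLyS sequences)
records = [
    ("HUMAN", "MKTEQSLVCA" * 7),             # 70 residues, wraps onto two lines
    ("FROG", "mktdqsivca mktdqsivca"),       # messy input gets cleaned up
]

fasta_text = to_fasta(records)
print(fasta_text)
```

Saving fasta_text to a plain text file gives something an alignment program can open. Short names like HUMAN or FROG also make the final tree easier to read.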

Again, for starters, I just accepted the default parameters and did a quick alignment:


Obviously, there’s something odd about the Canis familiaris (dog) sequence, but before I did anything about that, I just wanted to see what a phylogenetic tree looked like. This is another thing that Clustal does well: it will export a tree built from your sequence alignment in a number of formats, and I could then plug that data into one final program. This last one is a web-based program that I access through a French site (but you can probably find it in a number of places). The program is called DRAWGRAM. It accepts the tree data and outputs a graphical representation of the tree.

This is an important logical step… What I’m doing is asking for a family tree of sorts to be displayed that represents the relationship of the sequences I provided. We might want to assume that this also tells us how related the organisms that have these proteins are – and that’s not wrong, but it’s also not thorough as we’re only using ONE protein to make that assumption.
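To make that point concrete, here is a rough Python sketch of the kind of number a sequence-based tree starts from: the fraction of aligned positions at which two sequences differ (sometimes called a p-distance). This is not Clustal’s actual calculation, which is more involved, and the aligned sequences below are invented toy data. The thing to notice is that only the sequences go in; nothing else about the organisms is considered.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


# Toy alignment, invented for illustration (NOT real BLyS data)
aligned = {
    "HUMAN": "MKLTECVSIL",
    "CHIMP": "MKLTECVSIL",
    "DOG":   "MKLSECVAIL",
    "FROG":  "MRLS-CIAVL",
}

for first, second in combinations(aligned, 2):
    d = p_distance(aligned[first], aligned[second])
    print(f"{first:>5} vs {second:<5} {d:.2f}")
```

In this toy data the two primates are identical and the frog is the farthest from everything else, which is the pattern I was hoping the primates would provide as an internal calibration.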

Here’s my first tree:


Note how isolated Canis is on this representation.

Finally, I went back and truncated the Canis sequence to the place where I suspect the protein actually starts. The sequence I got from the NCBI had a string of amino acids at the front of the protein that I think are probably not there, but just got added by some computer algorithm without proper human oversight.
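I did the trimming by hand, but the same idea can be sketched in a few lines of Python. The sequence below is a made-up placeholder, not the real Canis record, and both function names are hypothetical. The sketch lists every methionine (M) as a candidate start and then cuts the sequence at the one you choose, which in practice would be the one that lines up with the start of the other species’ sequences in the alignment.

```python
def candidate_starts(seq):
    """Return the 1-based positions of every methionine (M) in the sequence."""
    return [i + 1 for i, residue in enumerate(seq) if residue == "M"]


def trim_to_start(seq, position):
    """Drop everything before the 1-based `position`, which must be an M."""
    if not 1 <= position <= len(seq):
        raise ValueError("position is outside the sequence")
    if seq[position - 1] != "M":
        raise ValueError("the chosen start position is not a methionine")
    return seq[position - 1:]


# Placeholder sequence, invented for illustration (NOT the real dog sequence):
# eight extra residues in front of the suspected true start.
dog_raw = "GSPRAAQLMKTEQSLVC"

print(candidate_starts(dog_raw))
dog_trimmed = trim_to_start(dog_raw, 9)
print(dog_trimmed)
```

Choosing which M is the real start is still a judgment call; the script only does the cutting.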

Once I did that, Canis (by the way, I renamed the sequence ‘DOG’ so I was sure it was the new one) fell in line, with a sequence more similar to that seen in cats (Felis):

That’s it for now, although I expect that I will dig a little deeper with more animals to see if I can come closer to an ‘original BLyS’.


  1. Dereeper A., Audic S., Claverie J.M., Blanc G. BLAST-EXPLORER helps you building datasets for phylogenetic analysis. BMC Evol Biol. 2010 Jan 12;10:8. (PubMed)
  2. Dereeper A.*, Guignon V.*, Blanc G., Audic S., Buffet S., Chevenet F., Dufayard J.F., Guindon S., Lefort V., Lescot M., Claverie J.M., Gascuel O. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W465-9. Epub 2008 Apr 19. (PubMed) *: joint first authors
  3. Felsenstein J. PHYLIP – Phylogeny Inference Package (Version 3.2). 1989, Cladistics 5: 164-166
  4. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G. (2007) Clustal W and Clustal X version 2.0. Bioinformatics, 23:2947-2948.
  5. Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research, 25:4876-4882.

Posted by on March 7, 2014 in Uncategorized



The Human Genome… genes on chromosomes

I was spending some time on Stack Exchange’s Biology section the other day when I saw an interesting question that someone had about how genes are arranged on chromosomes.

In answering his question, I picked up a couple of screenshots and links that I thought I should share here.

The query included the following (paraphrased):

How are genes arranged on the chromosome? Are they all in a single direction, and how does the cell ‘know’ which direction they are in?

The best way to approach this question is to take advantage of the amazing amount of resources compiled at the NIH’s National Library of Medicine…

One fun place to start is the Genome Page, which looks like this:



Note the 22 autosomes plus the X and Y chromosomes on the left-hand side of the page. Each chromosome is clickable and will take you to a chromosome page that looks like this:


Map view of H. sapiens Chromosome 14

Genes are listed on the right side of this map with locations of each indicated through a set of nested maps on the left. Each gene is clickable, providing links to the research done supporting these map placements and functions of the gene/protein. You can also easily use this information to jump to the homologous gene found in any of a number of fully sequenced organisms.

Below the map of the chromosome is a legend that gives additional information and shows how much detail each of the maps you are looking at provides.


The amount of data is overwhelming, but you can adjust how much detail is shown in order to get the ‘lay of the land’ for a specific chromosome without getting too lost. If you have a gene you want to find, you can also pinpoint it this way and see what other genes are located nearby (and therefore ‘linked’ to your gene).


huMMR gene, chromosome 10

I searched for the Human Macrophage Mannose Receptor (a protein I made antibodies against when I worked for Medarex). This gene is located on chromosome 10, as indicated by the red dots. 212 references provide sequence information about this gene and protein.

If you keep going down the rabbit hole, you can see each of the DNA sequences that were used to identify and locate this gene on the chromosome. (I omitted an illustration of this page because it is hard to get anything from it if it is shrunk down or presented piecemeal. However, you can go to this page by following this link.)

Finally, you are given the links to the complete coding sequence (cds), which has the actual sequence of the gene and protein as well as notes about how it is put together. In my mind, these are the bread and butter of this site, and probably the oldest reference pages, having provided gene hunters with data for several decades now.


Ahh, data I can use!!



A slice of sequence info

It’s easy to see this as way too much information to be useful (hence the problem of ‘Big Data’ in Biology), but it’s also extremely cool, and I have to admit that I’ve gotten just as lost in tracing the data on genes using this site as I did walking from topic to topic in the Encyclopedia when I was a kid.

So… to answer the questions posed above, you can use this site to see that genes lie in different directions along the chromosome. The cell doesn’t get ‘confused’ because it doesn’t try to arrange data the way we do in volumes of books meant to be read in order. Each gene is regulated, transcribed and translated according to its own local rules, as if ‘unaware’ of all that’s going on around it.
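Here is a small Python sketch of what ‘different directions’ means in practice. The chromosome below is a toy string I made up, with one tiny gene written on the forward strand and another on the reverse strand. A gene on the reverse strand is read from the complementary strand in the opposite direction, so its coding sequence is the reverse complement of what appears in the reference sequence. The function names and coordinates are my own and purely illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def gene_as_read(chromosome, start, end, strand):
    """Coding sequence of a gene; start/end are 0-based, end-exclusive,
    and always given on the forward (reference) strand."""
    segment = chromosome[start:end]
    if strand == "+":
        return segment
    if strand == "-":
        return reverse_complement(segment)
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome, invented for illustration: one gene on each strand
forward_gene = "ATGCCCTAG"                              # reads ATG ... TAG left to right
reverse_gene_on_reference = reverse_complement("ATGAAATAA")
chromosome = "AA" + forward_gene + "GG" + reverse_gene_on_reference + "CC"

print(chromosome)
print(gene_as_read(chromosome, 2, 11, "+"))    # gene on the forward strand
print(gene_as_read(chromosome, 13, 22, "-"))   # gene on the reverse strand
```

Both genes begin with ATG and end with a stop codon once each is read along its own strand, even though on the reference sequence the second one looks backwards. Each gene carries its own local signals, so neither depends on the direction of its neighbor.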



Posted by on September 8, 2013 in Uncategorized

