Re: How many genes do we differ on?

From: Robert J. Bradbury (bradbury@aeiveos.com)
Date: Wed Mar 14 2001 - 20:09:16 MST


Lord, I leave the list for a few months and the response
quality falls into the basement.... :-;

Now, while Hal and Curt attempted to respond to Robin's
question, I don't really think they did.

Robin wrote:
> "The last common ancestor of mice and men probably lived 100m
> years ago. Yet according to Dr Venter, the firm's scientists have
> found only 300 genes that people have and mice do not."
>
> Anyone seen the answer to this question: On how many
> genes do two typical humans differ?

To start with, the question isn't phrased well (for technical
reasons I will get to). Given the statements that seem to be
motivating it, I think Robin may be asking:
  "Do humans have different numbers of genes?"
If that is the case, then my answer would be very, very rarely.
All mammals have essentially the same "gene set". As pointed
out by Venter, only a few hundred are gained or lost between
humans and mice. Most of what has been going on in the evolution
of higher animals is a lot of chromosome breaks where the genes
get put back together differently. There is also some evolution
driven by retrotransposons moving genes or gene regions around.

The technical problem with Robin's query is that it raises
the question of what a "gene" is. The simple 1-gene = 1-protein
model is pretty much dead and buried for higher organisms,
particularly mammals. We may only have 30,000 genes but we
probably produce 140,000+ mRNAs due to alternative splicing
and that may result in hundreds of thousands of proteins
due to post-translational modifications.

The answers regarding polymorphisms do provide interesting
data that you can use to study natural rates of mutation and
evolution and degrees of relatedness between individuals,
populations and species. However, they don't tell you
very much about the protein differences that lead to
phenotype differences in humans. Single nucleotide
polymorphisms (SNPs), where a single base is changed, that
fall outside of a gene (and its associated regulatory
regions) probably have no effect. Those within genes that
change a base (A/C/G/T) at a position where the DNA code is
redundant are not going to change the protein produced (and
so are probably irrelevant as well). Even if you change
the amino acid at that point in the protein, you would have
to change it to one with radically different characteristics
(+ charged to - charged, hydrophobic to hydrophilic, etc.)
to have a mutation that seriously affects the protein produced.
So I don't think at this stage (without a *much* better
understanding of every protein that a single DNA "gene"
sequence produces) you can use SNP data to answer Robin's
question. The protein mobility data does a better job
because it is dealing with the physical entities that
produce the phenotype of a body. However, it too
may be complicated because a change in a single
gene involved in protein glycosylation or other
post-translational modifications could cause changes
in the mobility of *many* proteins. So you can only
use this data to answer Robin's question if you make
sure the post-translational machinery is identical in
the individuals being tested.
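
As a rough illustration of the SNP argument above, here is a
minimal Python sketch that sorts a single-base change within a
codon into "synonymous" (redundant code, same amino acid),
"conservative" (amino acid changes but keeps its broad
character) or "radical" (charge or hydrophobicity class
changes). The codon table is the standard genetic code. The
four amino-acid groups and the function name classify_snp are
simplifying assumptions made for this example, not an
established scoring scheme, and real effects depend heavily on
where in the protein the change falls.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: a coarse grouping by side-chain character, for illustration only.
GROUPS = {
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQCY",
    "hydrophobic": "AVLIMFWPG",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_snp(codon, position, new_base):
    """Classify a single-base substitution at `position` (0-2) of `codon`."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"            # redundant code: same amino acid
    if "*" in (before, after):
        return "stop gained or lost"   # protein truncated or extended
    if CLASS_OF[before] == CLASS_OF[after]:
        return "conservative"          # different amino acid, similar character
    return "radical"                   # charge or hydrophobicity class changes


if __name__ == "__main__":
    print(classify_snp("GAA", 2, "G"))  # Glu -> Glu
    print(classify_snp("CTT", 0, "A"))  # Leu -> Ile
    print(classify_snp("GAA", 0, "A"))  # Glu -> Lys
```

Under these assumptions only the last example would be expected
to seriously alter the protein; the first leaves it unchanged.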

It could also be true that there exist individuals
with different DNA in a gene that functions in the
post-translational pathway, leading to different
post-translational modifications in their proteins
(so molecularly the individuals appear quite "distinct").
However, if all of the other genes in these individuals
were "identical", then the two individuals might be
virtually equivalent at the observed phenotype level.
Such a difference might only show up if the two individuals
were challenged with a microorganism or toxin that would
interact with the protein modifications of one individual
and not the other. Such a micro-organism or toxin might
not even exist in the world.

There are ~10,000 known human genetic diseases. For most of these
we know that mutations in a single gene are the cause of the
disease. Individuals with cystic fibrosis, for example, may be
phenotypically similar to each other (they have mutations
that give them the same disease) but show very different
severities of the disease. This is because hundreds of different
mutations have been mapped to the CF gene, each causing a greater
or lesser loss of function of that protein. It isn't clear
to me whether Robin would consider these "different".

Though I don't have a reference, I believe I've seen data
that ~1/3 of the genes are developmental lethals, i.e. any
"significant" mutations in them and you don't get a functional
organism. So presumably 1/3 of the genes in humans are
essentially equivalent between individuals.

Given all of the above, I think one can only say that humans
may differ on up to around 20,000 genes (the ~30,000 total
less the ~1/3 that are developmental lethals). To answer the
question in more detail is going to require more exact
definitions of the terms "genes", "typical" and "differ".
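
The ~20,000 upper bound appears to follow from the figures
quoted earlier: roughly 30,000 genes, of which about 1/3 are
taken to be developmental lethals and so effectively fixed.
A minimal sketch of that arithmetic, assuming this is how the
estimate is reached (the function name is hypothetical):

```python
def max_differing_genes(total_genes, lethal_num, lethal_den):
    """Upper bound on genes that can vary: total minus the lethal fraction."""
    lethal = (total_genes * lethal_num) // lethal_den
    return total_genes - lethal


# ~30,000 genes, ~1/3 assumed to be developmental lethals.
print(max_differing_genes(30_000, 1, 3))
```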

Robert


