Re: How many genes do we differ on?

Date: Fri Feb 16 2001 - 20:02:48 MST writes:
> I don't have exact numbers to hand, but Futuyma cites large mammals
> as being heterozygous (having different genes) at 3.7% of allozymes. That
> means 3.7% of the time your two alleles produce proteins of significantly
> different mobility. Since most variants are rare, two people would differ
> at about 7% of loci; with 30,000 loci now estimated for the human population
> that's a floor of 2,100 genes. It will be higher as most changes aren't
> electrophoretically detectable.

From one of the Nature papers [1] I find:

   The allele frequencies of a set of SNPs have been evaluated in
   independent populations using pooled resequencing. Samples of TSC (n =
   502) and overlap SNPs (n = 774) were studied in population samples
   of European, African American and Chinese descent, revealing 82%
   to be polymorphic in at least one ethnic group at frequencies above
   the detection threshold of pooled resequencing (10%). The remaining
   18% presumably represent SNPs with a frequency less than 10% in
   the populations surveyed and false positives. Furthermore, 77% of
   SNPs had a minor allele frequency of more than 20% in at least one
   population, and 27% had an allele frequency higher than 20% in all
   three ethnic groups. TSC and overlap SNPs had similar distributions
   across the populations, showing that they are comparable in quality
   and frequency. The high proportion of SNPs with significant population
   frequency is expected after SNP discovery in two or a few chromosomes,
   given standard assumptions about human population history.

As I read this, 77% of SNPs had the less common variant in at least 20%
of the individuals in at least one of the three ethnic groups: European,
African American, and Chinese. This implies that in the group that
experiences the most variation at such a SNP, the chance of two randomly
chosen individuals differing is at least 32% (2*.8*.2). 27% of SNPs had
this level of variation in all three groups.
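
The 2*.8*.2 figure can be checked with a short Python sketch. It assumes
a site with only two variants and treats each individual as contributing
a single randomly drawn copy, so the chance of a difference is 2*p*q. The
function name is hypothetical, not something from the paper.

```python
def chance_of_difference(minor_allele_freq):
    """Chance that two randomly drawn copies of a two-variant site differ.

    ASSUMPTION: exactly two variants and independent random draws,
    so the answer is 2 * p * q.
    """
    p = minor_allele_freq
    q = 1.0 - p
    return 2.0 * p * q


# 20% is the minor allele frequency threshold quoted from the paper.
print(round(chance_of_difference(0.20), 2))  # prints 0.32
# The value peaks at a 50/50 split, so it can never exceed one half.
print(round(chance_of_difference(0.50), 2))  # prints 0.5
```

Because 2*p*q rises steadily as the minor allele frequency goes from 0
to 50%, any SNP at 20% or more gives at least the 32% shown.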

I can't quite work out the math from all this, but broadly speaking
it looks to me like 10-20% of SNPs are likely to actually vary in two
randomly chosen individuals from the population of the world.

The paper further estimates about 2 SNPs per gene, for a total of about
60000 SNPs in expressed DNA. Combining this with my previous estimate
of 10-20%, we get 6000-12000, or on the order of 10000, genetic
differences in two randomly chosen individuals.

Another article [2] puts it this way:

   However, the real importance of SNPs is that there are so many of
   them. One estimate is that comparing two human DNA sequences results
   in a SNP every 1,000-2,000 nucleotides. That may not sound like much
   until you realize that there are 3.2 billion nucleotides in the human
   genome, which translates into 1.6 million-3.2 million SNPs. And that's
   just from comparing two sequences - the total number of SNPs in humans
   is obviously much more.

So here we have the 3 million SNP figure again, this time in the context
of two (presumably) randomly chosen individuals. However, most of these
will be in non-coding regions. I can't find a clear statement of what
percentage of the genome represents coding regions, but I find "a few
percent" mentioned here and there. Taking a few percent of 3 million
would give about 100,000 SNPs in coding regions between two individuals.
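
A rough Python sketch of that arithmetic follows. The genome size and
SNP spacing come from the quoted article; the 3% coding fraction is only
an assumed stand-in for "a few percent", and the function name is
hypothetical.

```python
GENOME_NUCLEOTIDES = 3.2e9      # from the quoted article
SPACING_RANGE = (1000, 2000)    # one SNP every 1,000-2,000 nucleotides
CODING_FRACTION = 0.03          # ASSUMPTION: stands in for "a few percent"


def snps_between_two_sequences(genome_size, spacing):
    """Expected SNP count when two sequences differ once per `spacing` bases."""
    return genome_size / spacing


high = snps_between_two_sequences(GENOME_NUCLEOTIDES, SPACING_RANGE[0])
low = snps_between_two_sequences(GENOME_NUCLEOTIDES, SPACING_RANGE[1])

print(f"genome-wide: {low:,.0f} to {high:,.0f}")
# prints genome-wide: 1,600,000 to 3,200,000
print(f"coding:      {low * CODING_FRACTION:,.0f} to {high * CODING_FRACTION:,.0f}")
# prints coding:      48,000 to 96,000
```

With a 3% coding fraction the upper end lands near the 100,000 figure; a
different reading of "a few percent" would shift it proportionally.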

These two figures are not very consistent, and neither corresponds too
well with Curt's estimate of 2100. It's possible that the second article
was wrong to say that the 3 million SNP value came from comparing two
individuals, and that it instead reflects the collective data from a
larger set. If we then multiply the 100,000 by the 10-20% estimate from
the first paper, we get 10000-20000, and the two figures would be
pretty close at about 10000.
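
To lay the estimates side by side, here is a small Python sketch. All
inputs are the round numbers used in this message; the helper name is
hypothetical, and the 10-20% fraction is only the rough guess made
earlier, not a figure taken from either article.

```python
SNPS_IN_EXPRESSED_DNA = 60_000       # about 2 SNPs per gene (first paper)
CODING_SNPS_SECOND_ARTICLE = 100_000 # a few percent of about 3 million
VARYING_FRACTION = (0.10, 0.20)      # ASSUMPTION: rough guess from the first paper
ALLOZYME_FLOOR = 2_100               # the quoted allozyme-based floor


def scale(count, fractions):
    """Multiply a SNP count by each fraction, rounding to whole SNPs."""
    return tuple(round(count * f) for f in fractions)


first_estimate = scale(SNPS_IN_EXPRESSED_DNA, VARYING_FRACTION)
second_estimate = scale(CODING_SNPS_SECOND_ARTICLE, VARYING_FRACTION)

print("first paper:   ", first_estimate)   # prints first paper:    (6000, 12000)
print("second article:", second_estimate)  # prints second article: (10000, 20000)
print("allozyme floor:", ALLOZYME_FLOOR)   # prints allozyme floor: 2100
```

The two SNP-based ranges overlap between 10000 and 12000, and both sit
above the allozyme floor, as a floor should.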


