Re: GENOMICS: Protein Function [was: Re: *sox18* in the bag]

From: Dan Fabulich (daniel.fabulich@yale.edu)
Date: Wed Mar 29 2000 - 02:25:53 MST


'What is your name?' 'Robert Bradbury.' 'Do you deny having written the
following?':

> 2) Computer modeling of the protein shape, based on first principles,
> known shapes of protein "motifs" (that have been copied throughout
> the genomes), and expanded computer power (e.g. IBM's Blue Gene)
> will converge to give highly accurate models for those proteins
> that for which crystals are difficult, and even those for which
> they are easy.

This is the area of the project which seems the most promising. However,
notice that just as soon as we "finish" the human genome project, we need
to start it all over again to look for errors. (And, believe me, we HAVE
made mistakes in the recording of all this information, many of which may
have significant consequences.)

> 3) We will rapidly determine genes involved in specific pathways (through
> evolutionary conservation across organisms), the regulatory sequences
> for genes (which must be conserved across organisms as well) and the
> pathways in which they function (through differential expression studies).
> Knowing what genes stay together and play together, combined with
> computer analysis of the 3-D substructures that identify conserved
> regions, catalytic sites, phosphorylated amino acids, etc. will
> allow rapid assignment of functions.

Here's where I have to differ. The cell is a messy and complicated beast.
It's not just full of other proteins, but other products of enzymatic
reactions. And metals. And carbohydrates.

Taking a quick glance at the carbohydrates near the membrane of a cell,
you might notice that they have a quite complicated structure. Right now,
as far as we know, these very complicated carbohydrates, by and large,
don't do anything, except in some cases sit there and make the neighboring
region just a little bit more hydrophilic. (Emphasis on the "by and
large," since some of them are quite interesting indeed.)

My biomedical engineering professor last semester (who's getting on in his
years) had some cutting remarks on this point. See, there was a time,
back in the earlier part of this century, when people were just starting
to take a close look at the nucleic acids floating about in the cell. They
had an interesting and complicated structure, but, as far as anyone could
tell at the time, they didn't appear to do anything. The cell bio
textbooks of his day claimed that DNA was probably just there to make the
cell nucleus more acidic.

This suggests to my professor that we don't know quite as much about
molecular biology as we might wish we did, and that much more research
will be needed before we'll even be able to send in the powerful computers
like Blue Gene to figure out what's going on there in very much detail.

> So, I think over the next ten years people will be quite surprised
> at how fast things will move.

For what it's worth, the biomedical community is extremely conservative in
its estimates, by and large; IMO, this is because no one who has worked in
biology for a long time has ever seen really rapid growth in the field,
certainly not at the rate computer science has grown, though the biomedical
community HAS heard promises of rapid growth in the past. (I bet nobody
remembers the day when the synthetic chemists were making the sort of
promises that nanotech people are making today. On some level, nanotech
is just a fulfillment of that earlier vision, but O how long it has been
in coming!)

The people who have looked at DNA from a computer science perspective see
this as the sort of problem that will fall under the same sort of rapid
growth as the rest of computer science, since they see microbiology as a
sort of wet Turing machine, whose most basic rules have more or less been
figured out, and which now need only a better meta-description, in terms
of enzymes and products instead of in terms of DNA and proteins. CS
people look at this project like trying to reverse engineer a compiler.

The analogy isn't strictly wrong, but a whole lot of details have been
hidden when you look at the picture in a way that can be reasonably
described in CS terms. Like carbohydrates. Like all the other gunk
floating around in the cytoplasm. Like organometallic chemistry.

Indeed, I predict that the real work will be in coming up with a useful
description of the chemistry involved, ideally one that we can understand,
but if not, maybe just one that some future Blue computer can think about.
Something that can think about the chemistry of very complicated molecules
in the sort of terms that synthetic chemists think of our molecules today,
only much much better.

We might not see this model before Singularity. In fact, as I write this,
I suspect that we *probably* won't see this before Singularity.

Oh, well. Back to work.

-Dan

      -unless you love someone-
    -nothing else makes any sense-
           e.e. cummings


