I was wondering if you could point me to a resource that would explain
the 'Human Memeome Project'.
I'll second that. And to further Fredrick C. Multon's comment:
I also think excluding commercial and adult sites would be
counterproductive.

I don't think so. Since sex/reproduction is the prime motivator for
the vast majority of humanity, leaving it out is stupid, puritanical and
disingenuous.

It *doesn't* exist; I was proposing that we create the "Human Memeome Project". Also, if you don't exclude adult sites, your data mining effort will conclude that the most popular of all human memes is "free nude teen pics". The next 100 down the list will be similar. There is plenty of adult information on personal web sites, and I believe that to be more representative of a human individual than the output of people looking to make a quick buck. Commercial sites represent someone's effort to influence others to spend, not someone's effort to publish their personal memes on the web.

You are basically wanting to catalogue an index for a super-search

No. Let's say you wanted to collect a snapshot of a specific class of memes (e.g., idioms, buzzwords, catch phrases, figures of speech). Since most of these phrases are between three and six words long, you could take every group of three to six consecutive words on the web site currently being scanned and search the greater web for multiple matches. Certainly, you would need a super-search engine as a tool to accomplish this. But the output would be a list of catch phrases that are common in the meme pool.
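
To make the idea concrete, here is a rough sketch in Python of the phrase scan described above. It is only an illustration, not a worked-out design: the names word_ngrams and common_phrases are made up for this example, the "greater web" is stood in for by a small list of page texts already in memory, and a phrase counts as common when it shows up on at least min_pages different pages (my assumption about what "multiple matches" should mean).

```python
import re
from collections import Counter


def word_ngrams(text, min_n=3, max_n=6):
    """Yield every run of min_n..max_n consecutive words in text."""
    # Crude tokenizer (an assumption): lowercase letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def common_phrases(pages, min_pages=2):
    """Return (phrase, page_count) pairs for phrases seen on >= min_pages pages."""
    page_counts = Counter()
    for text in pages:
        # set() so a phrase repeated within one page counts once for that page.
        page_counts.update(set(word_ngrams(text)))
    return [(p, c) for p, c in page_counts.most_common() if c >= min_pages]


if __name__ == "__main__":
    sample_pages = [
        "Think outside the box today",
        "We must think outside the box",
        "Nothing here",
    ]
    for phrase, count in common_phrases(sample_pages):
        print(count, phrase)
```

Counting pages rather than raw occurrences keeps one site that repeats a slogan a hundred times from dominating the list. A real run would still need the crawler and the super-search engine mentioned above, and shorter fragments of a longer catch phrase will show up alongside it unless they are filtered out afterwards.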

