Thursday, August 12, 2010


SPL = Structured Product Labeling. It is an XML-based format used by drug regulatory agencies and pharmaceutical companies to code drug labeling into a database. Information about a prescription or nonprescription drug, approved by the regulatory agency, used to be written and printed on a piece of paper and stuffed into the cardboard box that contained the bottle. Now, however, this information is broken down into various sections/elements (drug name, pharmacological class, dosage form, strength, manufacturer, indication, side effects, contraindications, etc.) and coded in XML. Thanks to the XML structure, this information goes into a searchable database.
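To make the "searchable" part a little more concrete, here is a toy sketch in Python. It is not real SPL: the element names (`pharmClass`, `dosageForm`, `sideEffect`), the helper `find_drugs`, and the three drugs are all made up for illustration, and actual SPL documents follow the far more elaborate HL7 SPL standard. The only point is that once a label is split into tagged elements, a program can pull out every drug matching any combination of fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, greatly simplified label records (NOT the real SPL schema).
LABELS = """
<labels>
  <label>
    <name>Drug A</name>
    <pharmClass>antihistamine</pharmClass>
    <dosageForm>tablet</dosageForm>
    <strength>10 mg</strength>
    <sideEffect>drowsiness</sideEffect>
    <sideEffect>dry mouth</sideEffect>
  </label>
  <label>
    <name>Drug B</name>
    <pharmClass>analgesic</pharmClass>
    <dosageForm>liquid</dosageForm>
    <strength>160 mg/5 mL</strength>
    <sideEffect>nausea</sideEffect>
  </label>
  <label>
    <name>Drug C</name>
    <pharmClass>antihistamine</pharmClass>
    <dosageForm>liquid</dosageForm>
    <strength>5 mg/5 mL</strength>
    <sideEffect>drowsiness</sideEffect>
  </label>
</labels>
"""


def find_drugs(xml_text, **criteria):
    """Return the names of drugs whose elements match every criterion.

    Each keyword is an element tag and each value is the text to match.
    A label may repeat a tag (e.g. several sideEffect elements); it counts
    as a match if any one of those elements has the wanted text.
    """
    root = ET.fromstring(xml_text.strip())
    matches = []
    for label in root.findall("label"):
        if all(
            any(el.text == value for el in label.findall(tag))
            for tag, value in criteria.items()
        ):
            matches.append(label.findtext("name"))
    return matches


if __name__ == "__main__":
    # Which drugs list drowsiness as a side effect?
    print(find_drugs(LABELS, sideEffect="drowsiness"))
    # Which antihistamines come as a liquid?
    print(find_drugs(LABELS, pharmClass="antihistamine", dosageForm="liquid"))
```

The first query prints `['Drug A', 'Drug C']` and the second prints `['Drug C']`. On a paper insert, answering either question would mean reading every leaflet by hand.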

OK, wake up. I know this is boring stuff. It is boring to me too. I'm not a database analyst or administrator. Today I attended a class (not of my own choosing) about how legally regulated drug information (i.e., product labeling) is increasingly broken down into granular pieces and stored in these electronic beehives. The databases allow people to conduct a variety of analyses, to mix and match, shake and bake. These databases can also chit and chat, and mate and mingle with other XML-based databases, such as patient information, medical records, and side effect reports. You can find a needle in a haystack, or see patterns in ocean waves.

Thus, a lecture on bioinformatics, which I had expected to be a snoozer, turned out to be fascinating. I got a glimpse of how they think, these people who meticulously break down things that appear to be whole into ever smaller degrees of granularity. A drug is not just a drug, but can be defined by its chemical structure, its biological effect on one's body, the disease it treats, the side effects it causes, the type of patients who should take it, the type who should not, the dosage form (liquid or solid, pills or powder), other drugs with similar indications (class), and other criteria. One drug, many descriptors, many identities.

Midway through the lecture I started to sweat a little, as I suddenly grasped the vision of these indexers, these people who dream up all these labels and granules and endless connections between seemingly unrelated concepts. It was as if I were seeing the world, for the first time, through their eyes, which are entirely different from my own. The view became warped. I felt dizzy.

Because, because, because ... at that moment I realized that anything can be viewed through the lens of granularity and classification, including people. Each person can be identified by a variety of characteristics he or she possesses. See? He or she: that is already one characteristic, sex. Of course, the census has been doing this for a long time, cataloging people by sex, race, ethnicity, age, biometrics, health status, marital status, income, language, profession, etc., etc. But these are crude measures. One can keep digging, deeper and deeper. A person can be classified by his parents' characteristics (socioeconomic status, education level, smoking history, medical/mental history, parenting style, religious affiliation ...), by his own biological characteristics, dietary and lifestyle habits, location of residence, the kind of TV he watches, the amount of savings or debt he has, the characteristics of his spouse and friends, the Web sites he visits, etc., etc.

Then these data can be organized and analyzed. At some point, when enough breadth and depth are reached, patterns will emerge. At some point, the patterns will become so clear that they are predictive. Behaviors, decisions, preferences, whims, tendencies, and trajectories will all become knowable. Given a child's internal biological and external social categorizations, his life course can be prophesied with rising accuracy: who he will become, what he will do for a living, whether he will marry and have children, when he will die. Given the history and current condition of a nation/people/tribe, their behavior can be predicted: whom they will choose to run their country, what policies they will demand, whether they will go to war, how rich or poor they will become, whether they will go bonkers. Achieving the latter may require even less data and calculation than pinning down the fate of an individual.

Just as climate trends and weather patterns can be forecast, someone will be able to reliably predict how certain humans will react to certain conditions/stimuli. The hold-up, I think, lies not only in collecting and analyzing a large amount of data over time under various conditions, but also in formulating the classifications/specifications/categories correctly. In predicting behavior, what are the meaningful characteristics? Age, sex, upbringing, intelligence, personality, peer groups, socioeconomic class, location of residence, parents' history, genes? We don't know very much about that yet. Maybe aspects we assume to be critical are negligible. Maybe the key factors are few and unexpected. Yet I don't think this difficulty is insurmountable. The advertising/marketing sector has already achieved a great deal of success.

The same type of analysis is already being conducted in weather forecasting, traffic management, Web search (Google), marketing (commercial and political), the stock market, and many other areas of life. In fact, this type of analysis is already being applied to human behavior. It is only a matter of time before we reach a meaningful level of predictive power, for the masses and for individuals.

