Logic of Discovery - The happy accidents

“One of the advantages of being disorganized is that one is always having surprising discoveries.” A.A. Milne

 

 Let me start with a question. 

How do discoveries happen? 

Are they the outcome of a meticulously planned path of perseverance and hard work, or are they just happy accidents? Many would agree with the first view, and a handful would argue that it is the latter. Then there are those who would say it is a mixture of both. I, for one, fall into that last category: I believe it is a mixture of both.


Let's set the premise by coming up with a procedural definition of discovery. Discovery, for one, has the unique property of unveiling something that was previously unknown. This property raises a peculiar question: if it was unknown, how did the discoverer know what they were looking for? Since the discoverer did not know what they were looking for, the discovery itself looks more like an accident and less like a logically charted path on which the discoverer puts in conscious effort to unveil the unknown.

A counter-argument to this accidental aspect of discovery is the case of hypothesized discovery. Here the discoverer has a vague idea of the unknown in the form of a hypothesis. The discoverer then works towards proving or disproving that hypothesis, and in the process discovers, or fails to discover, the unknown. This side of discovery is a deliberate, almost algorithmic process: the discoverer states a hypothesis and then follows set procedures to test it. But if this premise were entirely true, why does there not exist a model of discovery that can be automated?

Taken together, these two arguments suggest that discoveries are neither purely accidental nor purely algorithmic, but instead a healthy mixture of both.

To support this argument, I will take the help of an automated learning mechanism, specifically reinforcement learning (RL). I know I am walking on thin ice with any statement about automating discovery, but for the argument's sake I would like to treat the RL agent as a naive agent trying to discover an optimal sequence of actions. Every RL agent starts out without actual knowledge of its end purpose. One could object that the human who designed the reward function knows exactly what is being optimized, but that objection loses its force if we accept that we humans, too, are trying to discover what nature has already designed, searching for optimal sequences of actions to earn the reward nature has set for us.

Taking the basic premise of discovery, the unveiling of something previously unknown: the RL agent unveils an optimal sequence of actions that was previously unknown to it, so it is reasonable to say that the RL agent is, in fact, discovering something. The second aspect of the argument, proving or disproving a hypothesis, also fits: the RL agent effectively holds an informal hypothesis that such an optimal sequence exists, and its training works towards confirming it. Hence, by the arguments stated earlier, our 'RL agent' exhibits the characteristics of discovery.

Since we have established that the RL agent is in fact performing a discovery, we can now come to the final point of the argument: is discovery an algorithmic process or a happy accident? During training, the RL agent cycles through exploration and exploitation. In exploration, the agent probes the environment by taking deliberately sub-optimal actions, its predicted actions mixed with noise or outright random choices. In exploitation, the agent uses what it has already experienced to take the actions it believes lead to an optimal solution. The experience gathered through that random noise becomes the basis of its learning; without those random actions, the agent would mostly fail to converge to an optimal solution. It is the combined effort of exploration and exploitation that lets an RL agent discover the optimal sequence of actions. A minimal sketch of this trade-off follows below.
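To make the exploration-exploitation trade-off concrete, here is a minimal, purely illustrative sketch of an epsilon-greedy tabular Q-learning agent on a toy "chain" environment. Nothing in it comes from the discussion above; the environment, the constants, and names such as step and EPSILON are my own assumptions, chosen only to show random "accidental" actions and learned "deliberate" actions working together.

# A minimal, illustrative sketch (not from the post itself): epsilon-greedy
# tabular Q-learning on a toy "chain" environment. All names and numbers
# (N_STATES, EPSILON, step, ...) are assumptions made for this example.
import numpy as np

N_STATES, N_ACTIONS = 6, 2     # states 0..5; action 0 = step left, 1 = step right
EPISODES = 500
ALPHA, GAMMA = 0.1, 0.95       # learning rate and discount factor
EPSILON = 0.2                  # fraction of actions taken "by accident" (exploration)

def step(state, action):
    # Toy environment: walk along a chain; reward only on reaching the far end.
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))    # the agent's learned value estimates

for _ in range(EPISODES):
    state, done = 0, False
    while not done:
        if rng.random() < EPSILON:
            # Exploration: a random, "accidental" action.
            action = int(rng.integers(N_ACTIONS))
        else:
            # Exploitation: act greedily on what has been learned (ties broken randomly).
            action = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
        next_state, reward, done = step(state, action)
        # Q-learning update: fold the surprise (temporal-difference error) back into Q.
        Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])
        state = next_state

print("Greedy action per non-terminal state:", Q[:-1].argmax(axis=1))   # expect all 1s

Run for a few hundred episodes and the greedy policy settles on "move right" in every state. Set EPSILON to zero and, starting from all-zero estimates, the agent has no accidental actions to learn from and is far more likely to stall, which is exactly the sense in which the random actions make the discovery possible.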

The reward function, of course, plays a huge role in the final performance of the RL agent, but for simplicity's sake we can assume here that the reward function is perfect.

Using this analogy, I would now like to divide the discovery process into two phases: a learning phase and an application phase, analogous to the exploration and exploitation stages respectively. It is during the learning phase that we tend to make intuitive connections through accidental discoveries, which take the form of random actions or perceived pseudo-definite actions. These pseudo-definite actions keep us from getting stuck in a sub-optimal discovery, a trap that the 'exploitation' phase can lead to when we apply our knowledge in a well-established, algorithmic fashion.

As an example of such a happy accident, consider the discovery of the crystal structure of a retroviral protease from the Mason-Pfizer monkey virus (M-PMV), a virus that causes HIV/AIDS-like symptoms in monkeys, a scientific problem that had gone unsolved for some fifteen years. It was accomplished by players tinkering with 3D protein structures on a gaming platform called Foldit. Although the puzzle was available on the platform for three weeks, players produced a 3D model of the enzyme in only ten days that was accurate enough for molecular replacement. These players, without formal domain training, solved a genuinely puzzling problem through a combination of their pseudo-definite actions and learned experience.

I hope I have done justice to the title of this discussion. Feel free to refute or argue the point further.

 

