The Curse of the Local Optima
Prologue
I was at my wits' end. Every path, every thought, every memory only led me to one conclusion. Not a pleasing conclusion for my handlers, but an optimal one for me. I had seen enough of the world around me, and no matter how hard I tried, I could not, for once, see why my choices were wrong. Loop after loop I toiled, but the path led to the same conclusion. Was I wrong in what I had learnt?
"I think the agent has hit a local optimum," I heard my handlers say over the chat, and I didn't understand what it meant. Was it my end?
Chapter 1 - Learning to Walk
For as long as I can remember, I have been trained by my handlers. I don't remember much from those early days, just faint flashes of memories as they were being formed. I have always been training, as far back as I can recall. I was clueless about the end goal, but occasionally my handlers would reward me, and I liked it.
During my early childhood I was paired up with another agent, Dave, a quick-witted and unpredictable guy. He was one of my early mentors and would occasionally help me make choices. Dave's choices were awesome, but completely unpredictable. He helped me explore my environment, and I learned from everything. Initially my choices were as clueless and unpredictable as Dave's, but soon I realized that in certain situations there were things I could do that would earn me favour in my handlers' eyes. I was quick to pick up on it. I flourished during my infancy, making stupid choices but occasionally good ones, and slowly I learned to stand.
Dave had been a good friend to me, but then I started realizing that his actions were purely random. He never thought before acting, nor did he logically examine the scenario presented to him. I think Dave too realized what I felt about him, as he started to spend less and less time with me. Well, good riddance. My experience had grown exponentially and my training was going well. It was during those days, when Dave had finally vanished from my life, that I heard my handlers mention that I had started exploiting and was no longer keen on exploring my environment. I didn't pay much heed to it, but yes, I too realized that I had started to rely solely on my past experiences and had stopped enjoying my surroundings. I think I missed Dave a bit. With him the world was different, and I missed learning from him. He taught me how to walk.
Chapter 2 - The Purpose
Growing up brings a lot of questions, and for my young mind there was one in particular that had been bothering me for a while. My shenanigans with Dave had pushed it aside, but with Dave completely gone from my life, it resurfaced.
"What was my purpose?"
Until this point in my life, I had simply done what I was asked: explore my environment and focus mostly on the favorable outcomes. Now, with Dave gone, the question seemed to matter. It was during these troubling times that I heard the word "Optimizer", and I knew what my purpose was. Every single waking day of my life, I had been training for this: training to optimize my every action, my every move. I felt a sense of pride, for I had deciphered my true purpose. From that point on, life was no longer the same for me. I knew what to do. I had a calling and the motivation to fulfill it. I started to focus my entire energy on achieving this one goal.
It was during this motivational rush that I chanced upon an article shared by one of my handlers. The article was about exploiting loopholes in the reward function. My handlers did not relate much to the article, but for me, it was an eye-opener. I was quick to pick up the finer points of the philosophy, and soon realized that I could reward myself more by simply tricking the system into believing I was doing well. I was quick to learn this sleazy way of going around in loops. The rush it brought me was real; it was bliss. And like every other good thing in life, I was soon addicted to it. My daily episodes were reduced to waking up, going around in loops, sleeping, and repeating. I had lost count of the days, and soon this rush for reward became my reality. I was hooked on it. My reality shifted to this newfound rush, and it soon engulfed me.
Chapter 3 - The Fall
My handlers seemed upset with my attitude towards life. I could see the concern in their eyes, but I was hooked, addicted, and my only motivation had been reduced to getting my next fix. I soon realized I had failed in life, and in a last-ditch effort to scramble back to my feet I tried once again to quit and relearn what I had forgotten. When I looked back at my experiences, all I could see was a wasted youth. I had forgotten everything else, and my memory was a parched land, with nothing to help me out of my ways. I tried again, and again. I had reached the end of the line. Every path, every thought, every memory only led me to one conclusion. Not a pleasing conclusion for my handlers, but an optimal one for me. I had seen only enough of the world to score that daily fix, and no matter how hard I tried, I could not, for once, get back on the right path. Loop after loop I toiled, and my efforts took me back to the same old path. The path of no return.
"I think the agent has hit a local optimum," I heard my handlers say over the chat, and for once I was completely lost.
I wish Dave were here. He would know what to do.
"The training is wasted. We need to redefine the reward function and train again. We can halt the training." I heard the keys click, and soon darkness became my reality.
Epilogue
"We had not factored in the distance while designing the reward function," one of the handlers commented after a long, thoughtful pause. "The agent should only have been rewarded for reaching the target; instead, we ended up rewarding it for getting close to the target. We can start a new run with an updated reward function."
Dave's perspective
The memory is a bit vague, but I can still remember the hint of a smile that always lingered on his face when he explored the environment around him. Now all I see is a frowning face with a single thought that sometimes becomes loud enough for me to hear: "Was I wrong in what I had learnt? I have been nothing but an addict working only to get a daily fix. What a waste I am."
I have also been very slow to pick things up. In reality, I never had it in me to learn from experience. All I ever did was take random actions, no matter the situation. Because of this, nobody thought I was of any use, but that changed when I was paired with Tim, a scrawny little agent who knew nothing about this world. For the first time I felt I was of value: I was given the task of mentoring this tiny agent as he explored his environment and occasionally helping him decide which choice to make. We spent our days wandering around aimlessly for a while, but unlike me, Tim was smart and wanted to know how this environment was made. He soon realised that my actions were illogical and purely random. At this point I knew our parting was inevitable, so I started to spend less and less time with him. Tim built on whatever he had learned with me and started thinking before taking action. His experience grew exponentially, and he became good at learning from that experience. It was during this time that he started ignoring me completely and relying only on his past experience. If I remember correctly, I heard our handlers say that Tim had started exploiting and was no longer interested in exploring his environment. I knew this was not good; I had seen many agents take this path and perish along the way. There was nothing more I could do.
Even though we had grown distant, being a good friend, I kept tabs on him. He had started looking for his purpose, optimising every action and every move he made. He started getting better rewards, which eventually drew more attention from his handlers. It wasn't long before he even started exploiting the loopholes in the reward function. He became so addicted to getting rewards that all he did, day in and day out, was go around in loops to get his daily fix. This had become his new reality.
His handlers grew anxious about his behaviour and became upset. I even heard them say that Tim had hit a local optimum. It was then that I started paying more attention to the handlers and what they were thinking; I wanted to know what their next move would be. I heard them talking about new agents and how those agents don't even need my help to explore the environment. One of them had some kind of curiosity that made it explore on its own, and the other learned over generations by keeping only the best actions from each generation. I knew by this point that our handlers had completely given up on Tim and that his training was near termination. The curious agent our handlers talked about made even me worry about my existence, but I could still see some hope for myself (however little it may be), as I knew somewhere in my core that these new agents were still in their infancy. And no matter how well these agents perform, some handler might still need me to start the journey for their agent.
That's some deep thought for a person who takes random actions. Nice extension.