reinforcement learning

Reinforcement learning systems do not necessarily have intuition about which set of actions will bring them closer to the goal state.

Humans can intuit which set of actions will bring them closer to the goal state.

Therefore, in learned tasks, humans take the shortcut granted by system 1. It works somewhat like a database with an intelligent cache: recall is rapid and low-compute because we index memories with a fuzzy-search-like mechanism.
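To make the cache analogy a bit more concrete, here is a rough sketch of fuzzy recall using Python's standard `difflib`. The `learned` dictionary, its entries, and the `recall` function are all made up for illustration; this is just one way a "close enough" lookup could work, not a claim about how memory actually does it.

```python
from difflib import get_close_matches

# Hypothetical "long-term memory": situations seen before, mapped to
# a response that worked. The entries are invented for illustration.
learned = {
    "catch falling phone": "close hand as the camera comes around",
    "ride a bicycle": "steer into the lean",
}

def recall(situation, cutoff=0.6):
    """Return the stored response for the most similar known situation,
    or None when nothing is similar enough (a cache miss)."""
    matches = get_close_matches(situation, list(learned), n=1, cutoff=cutoff)
    return learned[matches[0]] if matches else None

# An inexact description still hits the cache.
print(recall("catch a falling phone"))
```

A miss (a return value of `None`) is the case where the slower, system 2-style reasoning would have to take over.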

Now, is that because we process the entire for loop of possibilities and already know that 99.999% of the thousands of possible approaches will not work? I think it is more likely that system 1 is invoked to help system 2 narrow the options down to those with the greatest ‘heuristic scores’ (AI terminology is probably weird for neuroscience, idk).
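As a toy sketch of that idea: a cheap heuristic shortlists a handful of candidates, and only the shortlist gets the expensive evaluation. Everything here (`cheap_heuristic`, `expensive_evaluate`, the numbers in `memory`, the target of 700) is a hypothetical stand-in, not a model of any real RL system or of the brain.

```python
import heapq

def expensive_evaluate(action):
    # Stand-in for slow, system 2-style evaluation (hypothetical):
    # the true best action is 700.
    return -(action - 700) ** 2

def cheap_heuristic(action, memory):
    # Stand-in for system 1: score an action by how close it is to
    # something that worked before.
    return -min(abs(action - m) for m in memory)

def pick_action(actions, memory, k=10):
    # Keep only the k candidates the heuristic likes most...
    shortlist = heapq.nlargest(k, actions, key=lambda a: cheap_heuristic(a, memory))
    # ...and spend the expensive evaluation on those alone.
    best = max(shortlist, key=expensive_evaluate)
    return best, len(shortlist)

actions = range(10_000)   # thousands of possible approaches
memory = [690, 120]       # remembered actions that worked in the past
best, n_evaluated = pick_action(actions, memory)
print(best, n_evaluated)  # 692 10
```

Only 10 of the 10,000 candidates are evaluated the expensive way. The result (692) is good but not the true best (700), because the shortlist only covers what memory points at.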

Therefore, the vast amount of data acquired over years, stored away in long-term memory, and recalled by system 1 is what makes us able to learn faster. Without it, learning new things would likely take longer.

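Conclusions: a long-term memory (LTM) system is required to improve any heuristic-driven system, so that it can select a small subset of scenarios with a high probability of a maximum heuristic score without having to test them all with expensive system 2-like compute. The downside is the likelihood of getting stuck in local minima, but humans are prone to this error too (the bat and ball problem: a bat and a ball cost $1.10 together, the bat costs $1.00 more than the ball, and the intuitive answer of $0.10 for the ball is wrong; it is $0.05).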

Inspiration for this thought:

I was thinking about why RL systems like OpenAI’s hide-and-seek sim take so long to mature, yet I can learn how to do something in minutes and with very few examples. The skill I attempted to learn was throwing my phone and catching it by the rear camera every time. I went from 2/10 successes to 8/10 successes after maybe 40–50 flips and less than 3–4 minutes. What caused my improvement was my inherent intuition about gravity, spin rate, start position, and my own dexterity, plus the visual sensory input I had about how close the phone’s camera was to my hand. I tried again with my eyes closed, and in less than 2–3 minutes, or 20–30 flips, reached an 8/10 success rate. My understanding is that I was still driven by the same underlying factors, but relied more on timing.
