Yon don't ask a neurosurgeon how to build an house, just like you don't ask a plane pilot how to drill a tunnel. Expertise is localized. And the most important thing is that humans learn.
> And the most important thing is that humans learn.
Implementation detail that will be solved as the price of AI training decreases. Right now only inference is feasible at scale. Transformers are excellent here since they show great promise at 'one shot' learning meaning they can be 'trained' for the same cost as inference. Hence the sudden boom in AI . We finally have a taste of what could be should we be able to not only inference but also train models at scale.
Humans learn from seeing I don't think we are the stage of training models with videos / images dataset. We only reached the plateau with text dataset to train with.