Nice question: how do you learn something global, such as a corpus-level statistic, in a NN approach?

Interesting detail from the answer by Mostafa Dehghani: include an extra value in the embedding to help it learn an (i?)df-like feature.
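
To make that idea concrete, here is a rough sketch of how I read it, not necessarily what the answer itself does. Each term gets its usual vector plus one extra scalar, and that scalar is used as an importance weight when pooling the terms of a query or document. In a real model both parts would be trained, so the scalar is free to end up behaving like idf. The vocabulary, the hand-picked values, and the names `EMBEDDINGS`, `softmax` and `encode` are all assumptions made up for illustration.

```python
import math

# Hypothetical toy vocabulary: each term maps to (vector, extra_weight).
# In a real model both parts would be trainable parameters; the values here
# are hand-picked for illustration only.
EMBEDDINGS = {
    "the":    ([0.1, 0.1], -2.0),  # frequent term, low weight
    "neural": ([0.9, 0.2],  1.5),
    "ranker": ([0.3, 0.8],  2.0),  # rare term, high weight
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(tokens):
    """Pool term vectors with weights derived from the extra scalar."""
    vectors = [EMBEDDINGS[t][0] for t in tokens]
    weights = softmax([EMBEDDINGS[t][1] for t in tokens])
    dim = len(vectors[0])
    pooled = [
        sum(w * v[i] for w, v in zip(weights, vectors))
        for i in range(dim)
    ]
    return pooled, weights


pooled, weights = encode(["the", "neural", "ranker"])
```

With these made-up values, "the" contributes almost nothing to the pooled vector while "ranker" dominates, which is the kind of behaviour an idf weight would give you.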

And @suzan mentions a recent paper on BERT rediscovering the NLP pipeline.


