BERT Model
This post is a walkthrough of the BERT model developed by Google. I documented some of BERT's details while doing machine learning research at Berkeley AI Research.
The complete visual guide, which covers the weight matrices and the attention module, can be found here: Click Here
Some intuition-building slides are below.