May 8, 2022
One of the best articles I've read in a while. Please continue posting more articles like this!
I have a question regarding Z from the first step. This is a matrix and not a vector, right?
Also, in the figure titled "Dimensions for Attention Block", is Z passed through each attention head in full, or is it somehow split into 8 different parts, with each part passed through its own attention head?