
One of the best articles I've read in a while. Please continue posting more articles like this!

I have a question regarding Z from the first step. This is a matrix and not a vector, right?

Also, in the figure titled "Dimensions for Attention Block", is Z passed through each attention head, or is it somehow split into 8 different parts, with each part passed through its own attention head?

Nour Islam Mokhtari

Written by Nour Islam Mokhtari

Machine Learning and Computer Vision Engineer | Newsletter for ML practitioners at: newsletter.aifee.co
