
What is a Transformer Model?

Introduced by Google in 2017, transformer models are a type of neural network that uses an attention mechanism to learn dependencies between input and output. They are used mainly in natural language processing (NLP) and computer vision (CV).

They use a self-attention mechanism that processes the entire input at once, weighing different parts of the input differently. This lets the model focus on the most relevant parts of the input, enabling accurate and fast learning.
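The core of this mechanism can be sketched as scaled dot-product self-attention. Below is a minimal NumPy sketch, not a full transformer layer: the projection matrices `Wq`, `Wk`, `Wv` and the toy dimensions are illustrative assumptions, and real implementations add multiple heads, masking, and learned parameters.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model).

    Wq, Wk, Wv are illustrative projection matrices; in a trained model
    they are learned parameters.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Pairwise relevance of every token to every other token
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row: attention weights for one token sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of all value vectors
    return weights @ V, weights

# Toy example: a sequence of 4 tokens with model dimension 8
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one output vector per input token
```

Because every token attends to every other token in a single matrix operation, the whole sequence is processed at once rather than step by step as in an RNN.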

Examples include Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT).

They offer significant improvements over earlier models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which rely on convolutions or recurrence to learn these dependencies.
