What does it mean for a model to be large? The size of a model – the trained neural network – is measured by the number of parameters it has. These are the values in the network that get adjusted again and again during training and are then used to make the model's predictions. Roughly speaking, the more parameters a model has, the more information it can soak up from its training data, and the better its predictions about fresh data will be.
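To make the idea of "parameters" concrete, here is a minimal sketch that counts the weights and biases of a toy fully connected network. The layer sizes are made up for illustration and have nothing to do with the models discussed in the article.

```python
def count_parameters(layer_sizes):
    """Count the parameters (weights + biases) of a fully connected network.

    layer_sizes lists the neuron counts per layer, e.g. [inputs, hidden, outputs].
    A layer mapping n_in neurons to n_out neurons has n_in * n_out weights
    plus n_out biases -- each one a value tweaked during training.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# A tiny network: 784 inputs, 128 hidden units, 10 outputs.
print(count_parameters([784, 128, 10]))  # 784*128 + 128 + 128*10 + 10 = 101770
```

Models like GPT-3 are counted the same way, just with vastly more (and differently shaped) layers: the total runs into the hundreds of billions rather than the hundred thousand of this toy example.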
GPT-3 has 175 billion parameters – 10 times more than its predecessor, GPT-2. But GPT-3 is dwarfed by the class of 2021. Jurassic-1, a commercial large language model launched by US startup AI21 Labs in September, edged out GPT-3 with 178 billion parameters. Gopher, a new model released by DeepMind in December, has 280 billion parameters. Megatron-Turing NLG has 530 billion. Google's Switch-Transformer and GLaM models have one and 1.2 trillion parameters, respectively.
The trend is not just a US phenomenon. This year the Chinese tech giant Huawei built a 200-billion-parameter language model called PanGu. Inspur, another Chinese firm, built Yuan 1.0, a model with 245 billion parameters. Baidu and Peng Cheng Laboratory, a research institute in Shenzhen, announced PCL-BAIDU Wenxin, a model with 280 billion parameters that Baidu is already using in a variety of applications, including internet search, news feeds, and smart speakers. And the Beijing Academy of AI announced Wu Dao 2.0, which has 1.75 trillion parameters.
Meanwhile, South Korean internet search firm Naver announced a model called HyperCLOVA, with 204 billion parameters.
Every one of these is a notable feat of engineering. For a start, training a model with more than 100 billion parameters is a complex plumbing problem: hundreds of individual GPUs – the hardware of choice for training neural networks – must be connected and synchronized, and the training data must be split into chunks and distributed among them in the right order at the right time.
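The chunk-splitting part of that plumbing can be sketched in a few lines. This is an illustrative stand-in only: the worker count, chunk size, and round-robin assignment are assumptions for the sake of the example, not how any particular training system works.

```python
def shard_batches(examples, num_workers, chunk_size):
    """Split a stream of training examples into chunks and assign each
    chunk to a worker (e.g. a GPU) round-robin, preserving order."""
    for start in range(0, len(examples), chunk_size):
        chunk = examples[start:start + chunk_size]
        worker_id = (start // chunk_size) % num_workers
        yield worker_id, chunk

# Ten toy examples, two workers, chunks of three:
for worker, chunk in shard_batches(list(range(10)), num_workers=2, chunk_size=3):
    print(worker, chunk)
```

Real distributed training adds the hard parts this sketch omits: keeping hundreds of workers in lockstep, synchronizing their parameter updates, and recovering when hardware fails mid-run.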
Large language models have become prestige projects that showcase a company's technical prowess. Yet few of these new models move the research forward beyond repeating the demonstration that scaling up gets good results.
There are a handful of innovations, though. Once trained, Google's Switch-Transformer and GLaM use only a fraction of their parameters to make a prediction, which saves computing power. PCL-Baidu Wenxin combines a GPT-3-style model with a knowledge graph, a technique used in old-school symbolic AI to store facts. And alongside Gopher, DeepMind released RETRO, a language model with only 7 billion parameters that competes with models 25 times its size by cross-referencing a database of documents as it generates text. That makes RETRO cheaper to train than its giant rivals.
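The "fraction of their parameters" trick used by Switch-Transformer and GLaM is a form of sparse mixture-of-experts routing: a small router picks one expert sub-network per input, so the rest of the parameters sit idle for that prediction. The sketch below is a toy illustration of top-1 routing under invented numbers – the router weights and "experts" are stand-ins, not the real architecture.

```python
def route_top1(x, router_weights):
    """Return the index of the expert whose router score is highest.

    Each row of router_weights scores the input x by a dot product;
    only the winning expert's parameters are then used.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    return scores.index(max(scores))

# Two toy "experts": each is just a different scaling of the input.
experts = [lambda x: [2 * v for v in x], lambda x: [10 * v for v in x]]
router_weights = [[1.0, 0.0], [0.0, 1.0]]  # toy router, picked for illustration

x = [0.2, 0.9]
chosen = route_top1(x, router_weights)
print(chosen, experts[chosen](x))
```

The payoff is the one named in the article: a model can hold a trillion parameters in total while each individual prediction touches only one expert's worth of them.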