Read: "The Prophets of the Thinking Machines: Artificial General Intelligence and Transhumanism: History, Theory and Pioneers; Past, Present and Future, currently >4600-4800?(11.1.2026) (4000 3300? ; 3050-3100? pages (7.8.2025) 1240 1600 >2400-2500? (2900+?) pages [6.2025] This work will continue with "Creating Thinking Machines". Visit SIGI-2025 with some works and volumes already published and check and join and help with the open projects, which are not stealth: the AGI infrastructure called "Vsy" or "Jack of All Trades" etc. Welcome to Artificial Mind, part of The Sacred Computer. I am always looking for friends, partners and collaborators to work with, interesting projects and new fields and things to study, explore and create. Join my exploration* or invite me to yours! (...) *Versatile (Limitless) Explorer and (Self-)Improver - one of my alternative terms for AGI/Universal Thinking Machines. What does Twenkid mean?

Wednesday, May 20, 2026


Twenkid - The Child of AGI - is Challenging the Grandfather of AI Yann LeCun

Yann LeCun cites a post which acknowledges that his ideas were correct, etc.

https://www.facebook.com/yann.lecun/posts/pfbid0oEkmbuzwdvNC6JWRDoRzaDAyLNtzPZzvvman5ob89Z1v5AYvuQdTrQEjFcGJ3958l?__cft__[0]=AZaoNFuahXF3_DoSg2wL8ID4WPPsHBRWvjU51k3Sa742m0aQX2r7nl0VWyDXlyXx5eBBbxc-OWgq18zIcllnPjdIm_HNuV1Mga9hvE-Nga9s6_vMV0SjQVFusWFQhxpNCbrhQX0HjKTjh5WwzRfqeSFqWjE6X0vy_kidQqxMcHABvF2r6roXC6OCa1EcctDCHZNSND1P9hmQT8Cl2AcxeawX&__tn__=%2CO%2CP-R

A part of the concluding punch lines:

* Ideas may be "a dime a dozen", but people decades older than me, who *literally* repeated and ripped off my suggestions and observations decades later, were rewarded with billions to *waste*, and I am not even mentioned. They did it even in my own country, where one Bulgarian-Canadian became the "architect" of an institute in Sofia with statements that were a *20-years-late rip-off* of the above-cited essay and were sold as "innovative" and ground-breaking :))), "for the first time in Eastern Europe", etc.

* As for "dime a dozen" --> yes, or even "five a dozen" -->
The current "supercomputer" of my lab is called "PETAK I", where "Pet" means "5", from: 1. "Pentium" (historically the CPU and brand on which TUM was created); 2. the CPUs of all nodes: Core i5 (all old ones, 11-14-year-old models, LOL); 3. the five nodes of the cluster (the initial full configuration); 4. a parody CPU name from a science fiction work from 2004 from that theory ("Pentium 5"); and 5. in Bulgarian it also means "5 cents"... LMAO


"""Tunisia.AI

 
Admin
Group expert in Artificial Intelligence and Machine Learning
 April 20 at 21:34
Yann LeCun may have been right about something important: next-token and next-pixel prediction are probably not the most efficient path to real world understanding.
For years, the industry has been scaling generative models under the assumption that bigger models, more data, and more compute would eventually produce deeper intelligence. LeCun has been arguing the opposite: predicting every word or every pixel forces models to spend huge amounts of compute on surface details instead of learning the underlying structure of reality.
That’s the core idea behind JEPA (Joint-Embedding Predictive Architecture): instead of reconstructing the world pixel by pixel, learn a compact latent representation and predict what happens next inside that space.
The problem is that these models have historically been unstable. They suffer from “representation collapse,” where the latent space becomes too simple to carry useful information unless you add complex training tricks, auxiliary losses, or frozen components.
A new paper, LeWorldModel (LeWM), shows a much cleaner approach. It trains end-to-end from raw pixels using only two losses: a next-embedding prediction loss and a Gaussian regularizer on the latent space. This drastically simplifies the training setup compared to prior approaches.
The efficiency gains are striking. The model has around 15 million parameters, trains on a single GPU in a few hours, and can plan up to 48× faster than larger foundation-model-based world models, while staying competitive on several 2D and 3D control tasks. Its latent space also appears to capture meaningful physical structure and can detect physically implausible events in controlled environments.
This doesn’t mean generative AI is a dead end. LLMs remain extremely powerful. But it does reinforce a key technical point: for world modeling and physical reasoning, predictive latent-space approaches may be far more compute-efficient than brute-force generation.
The real shift might be this: not models that generate everything, but models that understand enough of the world to predict what actually matters."""
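
The two-term objective described in the quoted post (a next-embedding prediction loss plus a Gaussian regularizer on the latent space) can be sketched roughly as follows. This is not the LeWorldModel code: the function names, the toy batches and the weight are hypothetical, and the moment-matching penalty below is only a simple stand-in for the paper's regularizer, whose exact form is not given in the post. The sketch just illustrates why the second term discourages "representation collapse".

```python
def mse(predicted, target):
    """Mean squared error between two equally shaped batches of vectors."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def gaussian_moment_penalty(batch):
    """Stand-in regularizer (an assumption, not the paper's formula).

    It is zero when, across the batch, every latent dimension has mean 0
    and variance 1 and different dimensions are uncorrelated, i.e. when the
    first two moments match those of a standard Gaussian.
    """
    n = len(batch)
    d = len(batch[0])
    means = [sum(vec[j] for vec in batch) / n for j in range(d)]
    penalty = sum(m * m for m in means)
    for j in range(d):
        for k in range(d):
            cov = sum((vec[j] - means[j]) * (vec[k] - means[k]) for vec in batch) / n
            wanted = 1.0 if j == k else 0.0
            penalty += (cov - wanted) ** 2
    return penalty / d


def total_loss(predicted_next, actual_next, weight=1.0):
    """Prediction loss in latent space plus the weighted regularizer."""
    return mse(predicted_next, actual_next) + weight * gaussian_moment_penalty(actual_next)


if __name__ == "__main__":
    # A collapsed encoder maps every frame to the same point. Prediction is
    # then trivially perfect, so only the regularizer makes the loss non-zero.
    collapsed = [[0.0, 0.0] for _ in range(4)]
    spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    print("collapsed:", total_loss(collapsed, collapsed))
    print("spread:   ", total_loss(spread, spread))
```

In this toy setup the collapsed latents get a perfect prediction score but are still penalized, while the spread-out latents are not, which is the intuition behind replacing the "complex training tricks" with a single regularizing term.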

Todor Arnaudov

This idea, together with prediction and next-token prediction (but in a multi-scale, multi-precision hierarchy of resolutions of causality-control and perception), was published and explained nearly 25 years ago in Theory of Universe and Mind and presented during the world's first university courses in AGI in 2010 and 2011. Y. Bengio also rediscovered it in 2017-2018 (Consciousness prior), and his example is an almost literal repetition of an introductory definition from a treatise published about 14 years earlier. The author was a teenager, LOL.

Yann LeCun:

@Todor Arnaudov as I pointed out on another platform, ideas are a dime a dozen. The hard part, for something like this, is to implement it and to make it work.

The whole idea of hierarchical representations and learning by prediction is very old.

But learning hierarchies of representations didn't really work until convolutional nets were shown to do it in the late 1980s and more forcefully in the early 2010s (this took a while).

===

Todor Arnaudov:


Hi, first, thanks for your answer, as I didn't expect this honor. I don't disagree that there were earlier "prophets"; I recently published a hyperbook with a related name (nearly 5000 pages in total), where one of the intros in one of the sections with collections of related, prior and later work is a citation from the Holy Bible:

"There is nothing new under the Sun"

Some of the prior work doesn't get enough credit and is unknown; even the "fellow AI historian" Schmidhuber doesn't mention it, e.g. the Soviet lab of Bongard and his colleagues etc. (E.g. once I caught Chollet literally restating insights from the 1967 book "Проблема узнавания" ("The Problem of Recognition") - perhaps he didn't know; he also rediscovers definitions of general intelligence of mine, published in 2001 (he couldn't have known about them) - see the link at the end and the reviews of the LLMs).

The Bible is called "The Prophets of the Thinking Machines: Artificial General Intelligence & Transhumanism: History, Theory and Pioneers; Past, Present and Future", SIGI-2025 - and yes, almost nobody will bother to even open it. :))

BTW, e.g. IMO your PhD student Marc’Aurelio Ranzato deserves more credit for his pioneering work in DL and his insights (which perhaps [are] ~ also yours) -- his work is credited in my historical collections here: https://twenkid.com/agi/Lazar_The_Prophets_of_the_Thinking_Machines_20-8-2025.pdf ~p.21.

I do agree that I should have pushed to implementations immediately (not your type of NNs, though), and perhaps my claims would be accepted after I implement them all by myself. (Or if I or somebody else had done so 20 years ago, with no collaborators or any funding and no mechanical Turks to label a gazillion data items and computing iterations, compared to 20 years later with all the collected resources in all senses of the word. I.e. IMO the difficulty of the implementation is supposed to decrease and be "discounted" with time, like in RL; an idea from 25 or 50 years ago may end up more "valuable" than an implementation in the present - see generative AI and the final citation below.)

* I know about your dismissive opinion of "ideas", e.g. your comments on Schmidhuber's recent challenge, that you too could find ideas in your unpublished notes or something, etc., and I've listened to your answers to him since 2022, "The path towards autonomous AI..." - I remember you defended yourself by referring to Optimal Control etc.

However, many works are proposals, theoretical etc. but still get recognized, while other, prior ones don't and are even "humiliated". Also, in my reading of the paper, the core novelty there matched the mentioned TUM (and it was too general; it was not an implementation either); in general it looked like another cognitive architecture, of the kind which were popular in cognitive science and the AGI community decades earlier; perhaps I have to reread it.

* I understand that if you dismiss even the German, who has a status comparable to yours or, say, has more grounds to be believed that he does, then you (and almost anyone) wouldn't recognize the claimed "priority" or even just the "contribution" of some obscure "self-proclaimed" "crank" or the mentioned theory, no matter the evidence (maybe you wouldn't even bother to check any evidence or count it as "theory" or anything).

BTW, your recent work about the brain/humans as "not general ..." also matches and is closely related to my prior work/accounts, beginning in the early 2000s, however with a different interpretation of the observations. The limitations don't deny the concept of general intelligence and the possibility of general principles and modules (prediction-compression etc.). I may address the correspondences in a paper.


*  Stack Theory is yet another Fork of Theory of Universe and Mind, SIGI-2025

https://www.researchgate.net/publication/398934575_Stack_Theory_is_yet_another_Fork_of_Theory_of_Universe_and_Mind_-_Appendix_Volume_to_The_Prophets_of_the_Thinking_Machines_Artificial_General_Intelligence_and_Transhumanism_History_Theory_and_Pioneers 


* The first modern AI strategy was published by an 18-year-old in 2003 and repeated and implemented by the whole world 15-20 years later: Bulgarian Prophecies: How would I invest one million for the greatest benefit for the development of my country? https://twenkid.com/agi/Purvata_Strategiya_UIR_AGI_2003_Arnaudov_SIGI-2025_31-3-2025.pdf (Bongard, 1967 vs Chollet, 2024, p.169-170)


* BTW, cheers from Kyuchuk Paris - that's the district in the city of Plovdiv, where TUM was created. 🙂

* This is the world's first modern "AI strategy", 2003, repeated and implemented by "the whole world" 15-20 years later: https://twenkid.com/agi/proekt.htm 

* Ideas may be "a dime a dozen", but people decades older than me, who *literally* repeated and ripped off my suggestions and observations decades later, were rewarded with billions to *waste*, and I am not even mentioned. They did it even in my own country, where one Bulgarian-Canadian became the "architect" of an institute in Sofia with statements that were a *20-years-late rip-off* of the above-cited essay and were sold as "innovative" and ground-breaking :))), "for the first time in Eastern Europe", etc.

* As for "dime a dozen" --> yes, or even "five a dozen" -->
The current "supercomputer" of my lab is called "PETAK I", where "Pet" means "5", from: 1. "Pentium" (historically the CPU and brand on which TUM was created); 2. the CPUs of all nodes: Core i5 (all old ones, 11-14-year-old models, LOL); 3. the five nodes of the cluster (the initial full configuration); 4. a parody CPU name from a science fiction work from 2004 from that theory ("Pentium 5"); and 5. in Bulgarian it also means "5 cents"... LMAO

Also, as I predicted in 2013 (counterintuitively to all "experts" up to just a few years ago; I wrote this article precisely *because* clueless "experts" predicted the opposite; they were later cited thousands of times for their *WRONG* world-model and wrong predictions):

"Creative Intelligence will be First Surpassed and Blown Away by the Thinking Machines, not the "low-skill" workers whose jobs require agile and quick physical motion and interactions with human-sized and human-shaped environment"

https://artificial-mind.blogspot.com/2013/10/creative-intelligence-will-be-first.html

" (...) For the intellectual jobs - it's much easier to pick a computer, run the appropriate software or connect it to the service, and get it thinking - you already have decent cameras, microphones and many sensors even in smartphones. (...) The bottom line is that the "white collars" are more endangered in current-time economy. Perhaps that kind of economy could hardly survive the AGI revolution. I guess it may turn upside down for a while - the low-skill workers could get higher pay, because intellectual activities will be done in 1 ms for free... 😉  We, the smart guys (the smart asses, see "Super Smartasses" the graphical series ) wouldn't be needed by anyone... Not that we are needed now. :))"


 * The prediction of the generative AI (however it could have been created by the late 2000s-early 2010s - it came *too late*, not too quick as Hinton and Bengio "complain"; not with gradient-descent of course):  

 -- Creativity is Imitation at the Level of Algorithms - An outline sketch of a possible path of development of the Artificial Intelligence "Emil" 

https://www.researchgate.net/publication/395129890_Creativity_is_Imitation_at_the_Level_of_Algorithms_-_An_outline_sketch_of_a_possible_path_of_development_of_the_Artificial_Intelligence_Emil

 * Petak I: https://github.com/Twenkid/SIGI-2025/blob/main/petaki.md


Sunday, May 17, 2026


List of the Biggest Early GPT LLMs for Non-English Languages Circa 2020-2021-2022 - Update with Chinese, Russian and Spanish models

Updates to the table about the biggest GPT-models circa 2021 in the book "The First Modern Strategy ... " 

The GPT2-MEDIUM-BG seems to be among the biggest 6-7 models, trained on a free single Tesla T4 in Colab. :)) 

p.26

"This work adds further notes to the cited excerpts from the classical TUM (Theory of Universe and Mind) from [17] on the measures for a seed of mind and the stages of development. On the large language models: information about them and how they work, some of the newest publications, and a comparison of data on early GPT models for various languages (Arabic, French and many other European languages, Japanese and Chinese), in which the Bulgarian GPT2-MEDIUM-BG turns out to be one of the six or seven biggest models of this type in the world up to 2021 for languages other than English. Around the same time or slightly earlier, bigger models existed only for Chinese, Arabic, Russian, Romanian and French; the Japanese one[1], developed at the same time as the Bulgarian one, is of similar size. Brief notes on the bigger project for an infrastructure for General AI and all kinds of projects, including ones related to generative models: Vsedarzhets (Vsy)." (...)


[1] Possibly others as well: on 18.5.2026 I added Chinese, Russian and a new Spanish model from 2022. See [236]



p.238 [236]

236. Early generative large language models of the GPT type for languages other than English: Bulgarian, French, Arabic, Spanish, Portuguese, German, Chinese; Greek, Serbian, Romanian, Japanese, Chinese, Russian - 2020-2021. The dates of some are taken from the dates of the model weight files, the date of a scientific paper, etc. Until the end of 2021 only the Chinese, French, Arabic, Russian, Romanian, Japanese and Bulgarian models have more than about a hundred million parameters. The Romanian one is strong, trained on a 17 GB corpus. Only the Bulgarian one was probably developed by a single person with budget and support = 0, and the author represents the native computational linguistics in this discipline as a self-proclaimed "haidutin" (rebel outlaw), because the institutions and the more "elite" fighters waited until 2023-2024 [66]. Compare with the analogous case of DZBE around 2001-2003 and the inaction of the Institute for Bulgarian Language (IBE) of the Bulgarian Academy of Sciences (BAS) and of the other philologists from the universities regarding the phenomena which DZBE opposed and against which it tried to "call up" "rebel bands" [16][40], while the "venerable" linguists (as Pavlin Stoychev put it, "PC World Bulgaria", 5.2003 [239]) watched indifferently and explained that these were "natural processes". Compare with the notes on "The Virtuous Company and the good-for-nothings" and [40], 2003, on whether the talents did not have the choice not to study at the "most prestigious universities" and to develop the local ones instead, etc. XLM-R by Facebook, 11.2019, is bigger, but Bulgarian is one of the 100 languages on which it was trained, and the model is for classification and question answering, not for generation.
Tables: ordered by time of creation and by size.
Extended on 18.5.2026 with the Chinese, Russian and Spanish large models.

Early large language models of the "GPT" type for various languages, ordered by time

Model / language | Parameters | Date
GPT | 117 M | 6.2018
GPT2 | 1.554 B | 14.2.2019 (XL) (published 11.2019)
Italian | 117 M | 4.2020
Portuguese | 124 M? | 5.2020
Greek | 124 M? | 9.2020
German | 124 M? | 11.2020 - 8.2021
Chinese | 124 M? | 11.2020 - 5.2021
Spanish | 124 M? | 12.2020
Chinese | 2.6 B | 12.2020
Arabic | 1.46 B | 3.2021
Russian | 760 M | 5.2021
French | 1 B | 5.2021
Romanian | 774 M | 7.2021
Serbian | 124 M? | 7.2021
Bulgarian | 355 M | 6.2021 - 8.2021, Tosh
Japanese | 336 M | 16.8.2021
Japanese | 1 B | 20.1.2022
Spanish | 774 M | 1.4.2022
BAS (Bulgarian Academy of Sciences) | 124 M | 27.6.2023
INSAIT | 7.3 B | 2.2024

 

Early large language models of the "GPT" type, ordered by size

Model / language | Parameters | Date
Chinese | 2.6 B | 12.2020
Arabic | 1.46 B | 3.2021
French | 1 B | 5.2021
Russian | 760 M | 5.2021
Romanian | 774 M | 7.2021
Bulgarian | 355 M | 6.2021 - 8.2021, Tosh
Japanese | 336 M | 16.8.2021
Japanese | 1 B | 20.1.2022
Spanish | 124 M? | 12.2020
Portuguese | 124 M? | 5.2020
German | 124 M? | 11.2020 - 8.2021
Italian | 117 M | 4.2020
Chinese | 124 M? | 11.2020 - 5.2021
Greek | 124 M? | 9.2020
Serbian | 124 M? | 7.2021
BAS (Bulgarian Academy of Sciences) | 124 M | 27.6.2023
INSAIT | 7.3 B | 2.2024
GPT | 117 M | 6.2018
GPT2 | 1.554 B | 14.2.2019 (XL) (published 11.2019)

1. Todor Arnaudov, GPT2-MEDIUM-BG, The Sacred Computer, DZBE, ~6.2021 - 8.2021, 345M, Bulgarian, trained from scratch on a Tesla T4 in Colab [31][46]

[31. T.Arnaudov, Preparation of a dataset and training of GPT2-MEDIUM in Bulgarian, 6.2021: Train GPT2-MEDIUM Google Colab Tips & Tricks Any Language From Scratch
https://github.com/Twenkid/GPT2-Bulgarian-Training-Tips-and-Tools
https://github.com/Twenkid/GPT2-Bulgarian-Training-Tips-and-Tools/blob/main/bggpt_sacred_computer.ipynb
* T.Arnaudov, GPT2 Unlimited-Length Generation with Hidden Prompt Injections - Code Review, 2021 (1.2023) https://youtu.be/V1eO2OpsXBE
* T.Arnaudov, GPT2-Medium Training from Scratch on Colab for Any Language - Tips & Tricks by Twenkid, 2021: https://youtu.be/F-Xt-cK4L-g Code and detailed instructions for preparing a dataset and training GPT2 models for free in Google Colaboratory from scratch (in English). A popular video on the topic with over 4 thousand views, 68 likes, about 30 subscribers.]

[46. T.Arnaudov, 2021 (2025), gpt2-medium-bg, https://huggingface.co/twenkid/gpt2-medium-bg
* T.Arnaudov, https://github.com/Twenkid/GPT2-Bulgarian-Training-Tips-and-Tools
* T.Arnaudov, Train GPT2-Medium in Google Colab – Tips & Tricks – Any language from scratch, 2021
https://youtu.be/F-Xt-cK4L-g
* T.Arnaudov, Unlimited length with GPT2 … Update 6-5-2023 …
* T.Arnaudov, Hidden Prompt Injections: Unlimited Length GPT2 Generation, 1.2023 (work from 2021)
https://www.youtube.com/watch?v=V1eO2OpsXBE
]

2. Antoine Simoulin, Benoit Crabbé. Un modèle Transformer Génératif Pré-entrainé pour le ______ français. Traitement Automatique des Langues Naturelles, 6.2021, Lille, France. pp.246-255. hal-03265900, https://hal.science/hal-03265900 - French GPTfr-124M and GPTfr-1B with the GPT3 architecture. 5.2021
3. https://huggingface.co/dbddv01/gpt2-french-small - another French small model, SMALL 137M, also trained in Colab like the Bulgarian one, but with the paid Colab Pro service.
4. Wissam Antoun, Fady Baly, Hazem Hajj, ARAGPT2: Pre-Trained Transformer for Arabic Language Generation, 7.3.2021 https://arxiv.org/pdf/2012.15520
Arabic, 4 variants: 135M, 370M, 792M, 1.46B
5. https://huggingface.co/datificate/gpt2-small-spanish - Spanish: SMALL 124M? 12.2020 (fine-tuned from the English model, uses techniques from the Portuguese one)
6. https://huggingface.co/pierreguillou/gpt2-small-portuguese/tree/main - Portuguese, SMALL, 124M?, 5.2020
7. https://github.com/stefan-it/german-gpt2 - German small model: SMALL: 11.2020 - 8.2021 (the second version is retrained with better results, using dbmdz)
8. https://huggingface.co/dbmdz/german-gpt2/tree/main - 8.2021 - 10.2021 SMALL 124M?
9. GePpeTto Carves Italian into a Language Model, Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, Marco Guerini, 29.4.2020 - Italian SMALL 117M
10. Chinese GPT2 SMALL-like models: https://huggingface.co/uer/gpt2-chinese-cluecorpussmall SMALL 11.20 - 5.21 https://huggingface.co/ckiplab/gpt2-base-chinese
11. https://huggingface.co/nikokons/gpt2-greek - Greek, small, SMALL, 9.2020
12. https://huggingface.co/macedonizer/sr-gpt2 - Serbian, small, SMALL, 25.7.2021
13. https://huggingface.co/readerbench/RoGPT2-medium - Romanian, 124M, 354M, 774M LARGE, 7.2021 https://huggingface.co/readerbench/RoGPT2-large/tree/main A very large corpus for that time, 17 GB, and thorough performance tests in the paper:
RoGPT2: Romanian GPT2 for Text Generation, M.Niculescu, S.Ruseti, M.Dascalu, 11.2021, University Politehnica of Bucharest, 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)
https://www.researchgate.net/publication/357227566_RoGPT2_Romanian_GPT2_for_Text_Generation
14. Japanese model of the GPT2-MEDIUM type with 336 M parameters, 24 layers, 1024-dimensional vectors. https://huggingface.co/rinna/japanese-gpt2-medium * https://github.com/rinnakk/japanese-pretrained-models The Japanese model is very well trained (development perplexity, ppl 18, trained for 45 days on 8xV100 32 GB on the Japanese Wikipedia etc.), file from 16.8.2021. https://huggingface.co/rinna/japanese-gpt-1b - a 1-billion-parameter model, 20.1.2022, 24 layers, 2048-dimensional vectors.


15. Unsupervised Cross-lingual Representation Learning at Scale, Alexis Conneau, Kartikay Khandelwal, …, Veselin Stoyanov, 11.2019/4.2020 - XLM-R, a multilingual language model, also trained on a Bulgarian corpus. Veselin Stoyanov took part in the development.

[Updates in the edition from 17.5.2026: Chinese, Russian, Spanish]
16. CPM: A Large-scale Generative Chinese Pre-trained Language Model, Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun, 1.12.2020, https://arxiv.org/abs/2012.00413

17. Methods for Detoxification of Texts for the Russian Language, Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko (Skolkovo Institute of Science and Technology, Moscow, Russia; Mobile TeleSystems (MTS), Moscow, Russia), 19.5.2021 https://arxiv.org/pdf/2105.09052 https://github.com/ai-forever/ru-gpts

18. Spanish Language Models, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, Casimiro Pio Carrino, Aitor Gonzalez-Agirre, Carme Armentano-Oller, Carlos Rodriguez-Penagos, Marta Villegas; 15.7.2021 - 5.4.2022 (v1 to v5); the GPT models appear in v3 from 1.4.2022
https://arxiv.org/abs/2107.07253v3

* GPT2 was announced by OpenAI in 2.2019, but was not released for use because of concerns about possible misuse: generation of "fake news", etc. In 8.2019 they released the 774M model, and in 11.2019 the one twice as big. OpenGPT2, presented in 8.2019, was trained on the "OpenWebText" corpus; the cost for cloud services was reportedly about 50 thousand dollars. https://en.wikipedia.org/wiki/GPT-2
* The sizes are approximate and may be inaccurate for models that did not use exactly the same architecture; some have a different vocabulary size (32000 tokens), etc. Not only the number of parameters matters, but also the quality of the data, the training method, etc. At that stage and scale all the models are experimental and serve scientific and educational purposes.
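
As an illustration of why the sizes in the tables are approximate, here is a minimal sketch that counts the parameters of a GPT-2-style decoder from its configuration. It assumes the standard GPT-2 layout (tied input/output embeddings, learned position embeddings, 4x MLP expansion, biases everywhere); the function name is hypothetical, and models that deviate from this layout will give different totals.

```python
def gpt2_param_count(vocab_size, n_ctx, n_layer, d_model):
    """Count parameters of a GPT-2-style decoder (assumed standard layout)."""
    # Token embedding table plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Fused query/key/value projection and the attention output projection.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward block: expand to 4 * d_model, then project back.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a bias vector.
    layer_norms = 2 * (2 * d_model)
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d_model
    return embeddings + n_layer * per_block + final_norm


if __name__ == "__main__":
    configs = {
        "SMALL  (12 layers,  768 dims, vocab 50257)": (50257, 1024, 12, 768),
        "MEDIUM (24 layers, 1024 dims, vocab 50257)": (50257, 1024, 24, 1024),
        "LARGE  (36 layers, 1280 dims, vocab 50257)": (50257, 1024, 36, 1280),
        "MEDIUM (24 layers, 1024 dims, vocab 32000)": (32000, 1024, 24, 1024),
    }
    for name, cfg in configs.items():
        print(f"{name}: {gpt2_param_count(*cfg) / 1e6:.1f} M")
```

With the original 50257-token vocabulary this gives about 124 M, 355 M and 774 M for the three sizes, and the same MEDIUM architecture with a 32000-token vocabulary comes out at about 336 M. The embedding table alone accounts for that difference of roughly 19 M, which is why two models of the same "MEDIUM" type can be listed with different sizes.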




[Submitted on 1 Dec 2020]

CPM: A Large-scale Generative Chinese Pre-trained Language Model


during training: batch size = 3,072;  3M tokens .. (vs 1 M for GPT3 training)
strong few shot learning ...

* Methods for Detoxification of Texts for the Russian Language, Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko (Skolkovo Institute of Science and Technology, Moscow, Russia; Mobile TeleSystems (MTS), Moscow, Russia), 19.5.2021

https://arxiv.org/pdf/2105.09052

https://github.com/ai-forever/ru-gpts


* Others: later than GPT2-MEDIUM-BG (2021)


Spanish - MarIA GPT-2 -

https://arxiv.org/pdf/2107.07253v1 - only a BERT-like model in July 2021

GPT2 models, up to Large 774M, appear in v3, 1.4.2022:

https://arxiv.org/abs/2107.07253v3 

...

German -  https://www.kkirchheim.de/blog/german-gpt/ -  "Training a German LLM from scratch"

"Existing German models available on Hugging Face have 137M parameters and a context length of 1024 tokens, which is quite limited compared to recently released models, such as those in the LLAMA family."

