Saturday, June 13, 2026
A few volumes from The Prophets of the Thinking Machines: History, Theory and Pioneers; Past, Present and Future
Thursday, May 28, 2026
How much Bulgarian is the so-called Bulgarian institute INSAIT? Part I
The so-called "Bulgarian" (international Swiss-Bulgarian-world) institute for AI and computer science INSAIT, which "strategically" was in many ways a cynical rip-off of my project published in 2003 :))), with arrogant and ridiculous claims of being "innovative", "visionary" or original, LOL (though "hyperscaled" in its financial demands and expenses), is promoting its success at CVPR 2026 and its role for "Bulgaria": 17 publications + 3 other participations, 20 in total. They are among the greatest in the world. [See more in the books "The First modern AI strategy .." (Първата съвременна стратегия за развитие чрез изкуствен интелект ...) and "AI Institutes ... " (Институти и стратегии за изкуствен интелект на световно ниво ...) from the hyperbook "The Prophets of the Thinking Machines..." from SIGI-2025.]
Right, they are among the leading institutes in their field and in this game; however, are they BULGARIAN, as they so aggressively emphasize and claim on every occasion? (Maybe that is why they do.)
Asking the free ChatGPT (no login) with an excerpt from the list of publications: ...
==================================================
QUESTION: How many papers? How many of them include Bulgarian or Slavic names?
[see the full text below the answers]
ANSWER: There are 18 papers listed.
Papers that appear to include Bulgarian or other Slavic names (based on author names alone) are approximately:
- ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors
  – includes Dani Velikova (likely Bulgarian/Slavic surname)
- SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
  – includes Nikolay Nikolov and Aleksandar Yanev (clearly Bulgarian names)
- FireScope: Wildfire Risk Raster Prediction With a Chain-of-Thought Oracle
  – includes Mario Markov (Bulgarian/Slavic surname)
So the count is:
- 18 total papers
- About 3 papers with clearly Bulgarian/Slavic author names
If you want, I can also break it down into:
- specifically Bulgarian names,
- broader South Slavic / East Slavic / West Slavic names,
- or estimate the nationality distribution of all authors.
This can only be a rough estimate from names and affiliations conventions, not an actual nationality census. Many researchers work internationally, have multicultural backgrounds, or use romanized names that overlap across regions.
That said, from the author lists, the distribution appears heavily concentrated in Chinese researchers, with a secondary cluster from European institutions (especially Swiss/Western European labs connected to ETH Zurich / KU Leuven style collaborations around Luc Van Gool and Danda Pani Paudel).
Approximate estimate across all listed authors:
| Estimated background/nationality cluster | Approx. share | Notes |
|---|---|---|
| Chinese | ~70–75% | Dominant throughout nearly every paper |
| South Asian / Nepali / Indian | ~5–8% | Mainly from names like Danda Pani Paudel, Mohammad Mahdi |
| Western/Central European | ~8–10% | Swiss, Germanic, Dutch, Belgian, etc. |
| Balkan / Slavic (incl. Bulgarian) | ~4–6% | Nikolov, Yanev, Markov, Velikova |
| Spanish / Catalan / Latin European | ~3–5% | Vazquez-Corral, Serrano-Lozano, Rocafort |
| Korean | ~1–2% | Kailun Yang is Chinese despite surname “Yang”; not many obvious Korean names |
| Japanese | ~0–1% | Essentially none obvious |
| Middle Eastern / Turkish | ~1–2% | Ahmet Berke Gokmen |
| Other / unclear | remainder | Mixed international researchers |
A more concrete estimate by author count:
- Total unique authors listed: roughly 90–100
- Estimated:
  - Chinese-origin names: ~70
  - European-origin names: ~12–15
  - South Asian-origin names: ~5–7
  - Bulgarian/Slavic names: ~4–6
  - Spanish-origin names: ~3–4
  - Turkish-origin names: ~1–2
The most internationally connected recurring senior collaborators are:
- Luc Van Gool
- Danda Pani Paudel
- Kailun Yang
- Jinjin Gu
The overall pattern is very typical of modern CVPR:
- strong Chinese participation,
- Swiss/European lab hubs,
- globally mixed PhD/postdoc collaborations,
- and increasing cross-institution research teams
==========================
The content of the first prompt:
* Yuanfan Zheng, Kunyu Peng, Xu Zheng, Kailun Yang
Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Xiaolong Qian, Qi Jiang, Lei Sun, Zongxi Yu, Kailun Yang, Peixuan Wu, Jiacheng Zhou, Yao Gao, Yaoguang Ma, Ming-Hsuan Yang, Kaiwei Wang
Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Shaolin Su, Josep M. Rocafort, David Serrano-Lozano, Lei Sun, Danna Xue, Javier Vazquez-Corral
Bridging the Perception Gap in Image Super-Resolution Evaluation
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Jiaqi Tan, Xu Zheng, Yang Liu
RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu
Intelligent Photo Retouching with Language Model-Based Artist Agents
In: The Findings track of IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026 Findings)
Paper
Xiaoye Wang, Chen Tang, Xiangyu Yue, Wei-Hong Li
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Zhiyuan You, Ke Wang, He Zhang, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong, Zhoutong Zhang
PhotoFramer: Multi-modal Image Composition Instruction
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Xiang Yin, Jinfan Hu, Zhiyuan You, Kainan Yan, Yu Tang, Chao Dong, Jinjin Gu
How Far Have We Gone in Generative Image Restoration? A Study on Its Capability, Limitations and Evaluation Practices
In: The Findings track of IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026 Findings)
Paper
Zihao Dongfang, Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Danda Pani Paudel, Luc Van Gool, Kailun Yang, Xuming Hu
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
In: The Findings track of IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026 Findings)
Paper
Bingwen Zhu, Bingwen_Zhu, Yuqian Fu, Qiaole Dong, Guolei Sun, Tianwen Qian, Yuzheng Wu, Danda Pani Paudel, Yanwei Fu, Xiangyang Xue
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Liming Kuang, Dani Velikova, Mahdi Saleh, Jan-Nico Zaech, Danda Pani Paudel, Benjamin Busam
ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Ahmet Berke Gokmen, Ajad Chhatkuli, Luc Van Gool, Danda Pani Paudel
Inferring Compositional 4D Scenes without Ever Seeing One
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Website
Code
Nikolay Nikolov, Giuliano Albanese, Sombit Dey, Aleksandar Yanev, Luc Van Gool, Jan-Nico Zaech, Danda Pani Paudel
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Website
Jiancheng Pan, Runze Wang, Tianwen Qian, Mohammad Mahdi, Yanwei Fu, Xiangyang Xue, Xiaomeng Huang, Luc Van Gool, Danda Pani Paudel, Yuqian Fu
V^{2}-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Yuheng Zhang, Mengfei Duan, Kunyu Peng, Yuhang Wang, Ruiping Liu, Fei Teng, Kai Luo, Zhiyong Li, Kailun Yang
ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Xiaolong Qian, Qi Jiang, Yao Gao, Lei Sun, Zhonghua Yi, Kailun Yang, Luc Van Gool, Kaiwei Wang
Towards Universal Computational Aberration Correction in Photographic Cameras: A Comprehensive Benchmark Analysis
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Mario Markov, Stefan Maria Ailuro, Luc Van Gool, Konrad Schindler, Danda Pani Paudel
FireScope: Wildfire Risk Raster Prediction With a Chain-of-Thought Oracle
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Yue Li, Qi Ma, Runyi Yang, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Theo Gevers, Luc Van Gool, Danda Pani Paudel, Martin R. Oswald
Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
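The paper and author counts quoted in the answer can be cross-checked mechanically. The snippet below is a minimal sketch, assuming the list keeps the layout shown above (an author line, a title line, an "In:" venue line, then optional "Paper"/"Website"/"Code" link labels). The function name `parse_publications` and the two-entry `SAMPLE` are illustrative only, and no attempt is made to guess nationality from names.

```python
# Link labels left over from the web page; they carry no bibliographic data.
LINK_LABELS = {"Paper", "Website", "Code"}


def parse_publications(text):
    """Split a pasted publication list into author/title/venue records.

    Assumes each entry is: authors line, title line, 'In: ...' line.
    """
    lines = [ln.strip().lstrip("* ").strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and ln not in LINK_LABELS]
    papers = []
    for i, ln in enumerate(lines):
        if ln.startswith("In:") and i >= 2:
            names = [a.strip().replace("_", " ") for a in lines[i - 2].split(",")]
            # dict.fromkeys removes repeats such as "Bingwen Zhu, Bingwen_Zhu"
            authors = list(dict.fromkeys(a for a in names if a))
            papers.append(
                {"authors": authors, "title": lines[i - 1], "venue": ln[3:].strip()}
            )
    return papers


SAMPLE = """\
* Yuanfan Zheng, Kunyu Peng, Xu Zheng, Kailun Yang
Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
Bingwen Zhu, Bingwen_Zhu, Yuqian Fu, Qiaole Dong, Guolei Sun, Tianwen Qian, Yuzheng Wu, Danda Pani Paudel, Yanwei Fu, Xiangyang Xue
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Paper
"""

papers = parse_publications(SAMPLE)
unique_authors = {name for paper in papers for name in paper["authors"]}
print(len(papers), "papers,", len(unique_authors), "unique authors")
```

Running the same function over the full list would give the exact number of entries and of distinct author strings, which can then be compared with the rough "18 papers" and "90–100 authors" figures in the answer.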
Wednesday, May 20, 2026
Twenkid - The Child of AGI - is Challenging the Grandfather of AI Yann LeCun
Yann LeCun cites a post which acknowledges that his ideas were correct, etc.
A part of the concluding punch lines:
* It is "dime a dozen", but people decades older than me who *literally* repeated and ripped off my suggestions [strategy, plans, principles, theories, directions, conclusions, thoughts ...] and observations decades later were rewarded with billions to *waste*, and I am not even mentioned. They did it even in my own country, where one Bulgarian-Canadian became an "architect" of an institute in Sofia, with statements which were a *20 years late rip-off* of the above-cited essay and which were sold as "innovative" and ground-breaking :))), "for the first time in Eastern Europe" etc.
* As for "Dime a dozen" --> yes, or even "Five a dozen" -->
The current "supercomputer" of my lab is called "PETAK I", where "Pet" means "5", from: 1. "Pentium" (historically the CPU and brand on which TUM was created); 2. The CPUs of all nodes: Core i5 (all old ones, 11-14-year-old models, LOL); 3. Five nodes of the cluster (the initial full configuration); 4. A parody CPU name from a science fiction work from 2004 from that theory ("Pentium 5"; "Петият Петак"); and 5. In Bulgarian it also means "5 cents"... LMAO
"""Tunisia.AI
Yann LeCun may have been right about something important: next-token and next-pixel prediction are probably not the most efficient path to real world understanding.
For years, the industry has been scaling generative models under the assumption that bigger models, more data, and more compute would eventually produce deeper intelligence. LeCun has been arguing the opposite: predicting every word or every pixel forces models to spend huge amounts of compute on surface details instead of learning the underlying structure of reality.
That’s the core idea behind JEPA (Joint-Embedding Predictive Architecture): instead of reconstructing the world pixel by pixel, learn a compact latent representation and predict what happens next inside that space.
The problem is that these models have historically been unstable. They suffer from “representation collapse,” where the latent space becomes too simple to carry useful information unless you add complex training tricks, auxiliary losses, or frozen components.
A new paper, LeWorldModel (LeWM), shows a much cleaner approach. It trains end-to-end from raw pixels using only two losses: a next-embedding prediction loss and a Gaussian regularizer on the latent space. This drastically simplifies the training setup compared to prior approaches.
The efficiency gains are striking. The model has around 15 million parameters, trains on a single GPU in a few hours, and can plan up to 48× faster than larger foundation-model-based world models, while staying competitive on several 2D and 3D control tasks. Its latent space also appears to capture meaningful physical structure and can detect physically implausible events in controlled environments.
This doesn’t mean generative AI is a dead end. LLMs remain extremely powerful.
But it does reinforce a key technical point: for world modeling and physical reasoning, predictive latent-space approaches may be far more compute-efficient than brute-force generation.
The real shift might be this: not models that generate everything, but models that understand enough of the world to predict what actually matters."""
https://www.facebook.com/yann.lecun/posts/pfbid0oEkmbuzwdvNC6JWRDoRzaDAyLNtzPZzvvman5ob89Z1v5AYvuQdTrQEjFcGJ3958l?__cft__[0]=AZaoNFuahXF3_DoSg2wL8ID4WPPsHBRWvjU51k3Sa742m0aQX2r7nl0VWyDXlyXx5eBBbxc-OWgq18zIcllnPjdIm_HNuV1Mga9hvE-Nga9s6_vMV0SjQVFusWFQhxpNCbrhQX0HjKTjh5WwzRfqeSFqWjE6X0vy_kidQqxMcHABvF2r6roXC6OCa1EcctDCHZNSND1P9hmQT8Cl2AcxeawX&__tn__=%2CO%2CP-R
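To make the "two losses" in the quoted post concrete, here is a minimal pure-Python sketch of such an objective: a next-embedding prediction loss plus a regularizer that pushes a batch of embeddings toward zero mean and unit variance. It is an illustration only. The moment-matching regularizer, the function names and the `reg_weight` parameter are assumptions of this sketch, not the formulation used in the LeWM paper.

```python
def prediction_loss(predicted, target):
    """Mean squared error between predicted and actual next embeddings."""
    total = 0.0
    count = 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def gaussian_regularizer(embeddings):
    """Penalize per-dimension batch mean != 0 and batch variance != 1.

    A simple moment-matching stand-in (assumption), not the paper's regularizer.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    penalty = 0.0
    for d in range(dim):
        column = [vec[d] for vec in embeddings]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / dim


def total_loss(predicted_next, actual_next, embeddings, reg_weight=1.0):
    """Sum of the two terms; reg_weight is a hypothetical tuning knob."""
    return prediction_loss(predicted_next, actual_next) + reg_weight * gaussian_regularizer(embeddings)


# A collapsed latent space (all embeddings identical) predicts itself perfectly,
# so only the regularizer term objects to it.
collapsed = [[0.0, 0.0], [0.0, 0.0]]
print(total_loss(collapsed, collapsed, collapsed))
```

The example at the end shows why a second term is needed at all: without it, mapping every input to the same vector drives the prediction loss to zero, which is the "representation collapse" the post mentions.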
Todor Arnaudov
From the books "The First modern AI Strategy ..." and "Stack Theory is Yet another fork of Theory of Universe and Mind", published at SIGI-2025
This idea, together with prediction and next-token prediction (but in a multi-scale, multi-precision hierarchy of resolutions of causality-control and perception), was published and explained nearly 25 years ago in Theory of Universe and Mind and presented during the world's first university courses in AGI in 2010 and 2011. Y. Bengio also rediscovered it in 2017-2018 (Consciousness prior), and his example is an almost literal repetition of an introductory definition from a treatise published about 14 years earlier. The author was a teenager, LOL.
Yann LeCun:
@Todor Arnaudov as I pointed out on another platform, ideas are a dime a dozen. The hard part, for something like this, is to implement it and to make it work.
The whole idea of hierarchical representations and learning by prediction is very old.
But learning hierarchies of representations didn't really work until convolutional nets were shown to do it in the late 1980s and more forcefully in the early 2010s (this took a while).
===
Todor Arnaudov:
Hi, first, thanks for your answer, as I didn't expect this honor. I don't disagree that there were earlier "prophets". I recently published a hyperbook with a related name (nearly 5000 pages in total), where one of the intros in one of the sections with collections of related, prior and later work is a citation from the Holy Bible:
"There is nothing new under the Sun"
Some of the prior work doesn't get enough credit and is unknown; even the "fellow AI historian" Schmidhuber doesn't mention it, e.g. the Soviet lab of Bongard and his colleagues, etc. (E.g. I once caught Chollet literally restating insights from the 1967 book "Проблема узнавания" ("The Problem of Recognition"); perhaps he didn't know. He also rediscovers definitions of general intelligence of mine, published in 2001 (he couldn't have known about them); see the link at the end and the reviews by the LLMs.)
The Bible is called "The Prophets of the Thinking Machines: Artificial General Intelligence & Transhumanism: History, Theory and Pioneers; Past, Present and Future", SIGI-2025 - and yes, almost nobody will bother to even open it. :))
BTW, e.g. IMO your PhD student Marc’Aurelio Ranzato deserves more credit for his pioneering work in DL and his insights (which perhaps [are] ~ also yours) -- his work is credited in my historical collections here: https://twenkid.com/agi/Lazar_The_Prophets_of_the_Thinking_Machines_20-8-2025.pdf ~p.21.
I do agree that I should have pushed to implementations immediately (not your type of NNs, though), and perhaps my claims would be accepted after I implement them all by myself. (Or if I or somebody else had done so 20 years ago, with no collaborators or any funding and no mechanical Turks to label a gazillion data items and run computing iterations, as compared to 20 years later with all the collected resources in all senses of the word. I.e., IMO the difficulty of the implementation is supposed to decrease and be "discounted" with time, like in RL; an idea from 25 or 50 years ago may end up more "valuable" than an implementation in the present. See generative AI and the final citation below.)
* I know about your dismissive opinion about "ideas", e.g. your comments on Schmidhuber's recent challenge, that you too could find ideas in your unpublished notes, etc., and I've listened to your answers to him since 2022, "The path towards autonomous AI...". I remember you defended yourself by referring to Optimal Control, etc.
However, many works are proposals, theoretical, etc., but still get recognized, while other, prior ones don't and are even "humiliated". Also, the core novelty there, in my reading of the paper, matched the mentioned TUM (and was too general; it was not an implementation either). In general it looked like another cognitive architecture, of the kind popular in cognitive science and the AGI community decades earlier; perhaps I have to reread it.
* I understand that if you dismiss even the German, who has a status comparable to yours (or, say, has more grounds to be believed that he has), then you (and almost anyone) wouldn't recognize the claimed "priority" or even just the "contribution" of some obscure "self-proclaimed" "crank" or the mentioned theory, no matter the evidence (maybe you wouldn't even bother to check any evidence or count it as a "theory" or anything).
BTW, your recent work about the brain/humans as "not general ..." also matches and is closely related to my prior work/accounts, beginning in the early 2000s, however with a different interpretation of the observations. The limitations don't deny the concept of general intelligence and the possibility of general principles and modules (prediction-compression etc.). I may address the correspondences in a paper.
* Stack Theory is yet another Fork of Theory of Universe and Mind, SIGI-2025
* The first modern AI strategy was published by an 18-year-old in 2003 and repeated and implemented by the whole world 15-20 years later: Bulgarian Prophecies: How would I invest one million for the greatest benefit for the development of my country? https://twenkid.com/agi/Purvata_Strategiya_UIR_AGI_2003_Arnaudov_SIGI-2025_31-3-2025.pdf (Bongard, 1967 vs Chollet, 2024, p.169-170)
* BTW, cheers from Kyuchuk Paris - that's the district in the city of Plovdiv, where TUM was created. 🙂
* This is the world's first modern "AI strategy", 2003, repeated and implemented by "the whole world" 15-20 years later: https://twenkid.com/agi/proekt.htm
* It is "dime a dozen", but people decades older than me who *literally* repeated and ripped off my suggestions and observations decades later were rewarded with billions to *waste*, and I am not even mentioned. They did it even in my own country, where one Bulgarian-Canadian became an "architect" of an institute in Sofia, with statements which were a *20 years late rip-off* of the above-cited essay and which were sold as "innovative" and ground-breaking :))), "for the first time in Eastern Europe" etc.
* As for "Dime a dozen" --> yes, or even "Five a dozen" -->
The current "supercomputer" of my lab is called "PETAK I", where "Pet" means "5", from: 1. "Pentium" (historically the CPU and brand on which TUM was created); 2. The CPUs of all nodes: Core i5 (all old ones, 11-14-year-old models, LOL); 3. Five nodes of the cluster (the initial full configuration); 4. A parody CPU name from a science fiction work from 2004 from that theory ("Pentium 5"); and 5. In Bulgarian it also means "5 cents"... LMAO
Also, as I predicted in 2013 (counterintuitively to all "experts" up to just a few years ago; I wrote this article precisely *because* clueless "experts" predicted the opposite, and they were later cited thousands of times for their *WRONG* world-model and wrong predictions):
"Creative Intelligence will be First Surpassed and Blown Away by the Thinking Machines, not the "low-skill" workers whose jobs require agile and quick physical motion and interactions with human-sized and human-shaped environment"
https://artificial-mind.blogspot.com/2013/10/creative-intelligence-will-be-first.html
" (...) For the intellectual jobs - it's much easier to pick a computer, run the appropriate software or connect it to the service,
and get it thinking - you already have decent cameras, microphones and many sensors even in smartphones. (...) The bottom line is that the "white collars" are more endangered in current-time economy. Perhaps that kind of economy could hardly survive the AGI revolution. I guess it may turn upside down for a while - the low-skill workers could get higher pay, because intellectual activities will be done in 1 ms for free... 😉 We, the smart guys (the smart asses, see "Super Smartasses" the graphical series ) wouldn't be needed by anyone... Not that we are needed now. :))"
* The prediction of the generative AI (however it could have been created by the late 2000s-early 2010s - it came *too late*, not too quick as Hinton and Bengio "complain"; not with gradient-descent of course):
-- Creativity is Imitation at the Level of Algorithms - An outline sketch of a possible path of development of the Artificial Intelligence "Emil"
* Petak I: https://github.com/Twenkid/SIGI-2025/blob/main/petaki.md
Sunday, May 17, 2026
List of the Biggest Early GPT LLMs for Non-English Languages Circa 2020-2021-2022 - Update with Chinese, Russian and Spanish models
Updates to the table about the biggest GPT-models circa 2021 in the book "The First Modern Strategy ... "
GPT2-MEDIUM-BG, trained on a single free Tesla T4 in Colab, seems to be among the 6-7 biggest models. :))
p.26
This work adds supplementary notes to cited excerpts from the classical TUM (Theory of Universe and Mind) from [17] on the measures of a seed of mind and the stages of development. From the large language models: information about them and how they work, some of the newest publications, and a comparison of data about early GPT models for various languages (Arabic, French and many European ones, Japanese and Chinese), in which the Bulgarian GPT2-MEDIUM-BG turns out to be one of the six or seven largest models of this type in the world up to 2021 for languages other than English. Larger ones from around the same time or slightly earlier exist only for Chinese, Arabic, Russian, Romanian and French; a model of similar size exists for Japanese[1], developed at the same time as the Bulgarian one. Brief notes on the larger project for an infrastructure for General AI and all kinds of projects, including ones related to generative models: "Вседържец". (...)
[1] Possibly others as well: on 18.5.2026 I added Chinese, Russian and a new Spanish model from 2022. See [236].
p.238 [236]
236. Early generative large language models of the GPT type for languages other than English: Bulgarian, French, Arabic, Spanish, Portuguese, German, Chinese; Greek, Serbian, Romanian, Japanese, Russian; 2020-2021. The dates of some are taken from the dates of the model weight files, the date of a research paper, etc. By the end of 2021 only the Chinese, French, Arabic, Russian, Romanian, Japanese and Bulgarian models had more than about a hundred million parameters. The Romanian one is strong, trained on a 17 GB corpus. Only the Bulgarian one was probably developed by a single person with budget and support = 0, and the author represents the native computational linguistics in this discipline as a self-proclaimed "haidutin" (a rebel outlaw), because the institutions and the more "elite" fighters waited until 2023-2024 [66]. Compare with the analogous case of ДЗБЕ around 2001-2003 and the inaction of the Institute for Bulgarian Language (ИБЕ) of the Bulgarian Academy of Sciences (БАН) and of the other philologists from the universities regarding the phenomena that ДЗБЕ opposed and against which it tried to "summon" "rebel bands" [16][40], while the "eminent" linguists (as Pavlin Stoychev put it, "PC World Bulgaria", 5.2003 [239]) looked on indifferently and explained that these were "natural processes". Compare with the notes on "The Virtuous Company and the good-for-nothings" and [40], 2003, on whether the talents did not have the choice of not studying at the "most prestigious universities" and developing the local ones instead, etc. XLM-R by Facebook, 11.2019, is larger, but Bulgarian is only one of the 100 languages it was trained on, and the model is for classification and question answering, not for generation.
Tables: ordered by time of creation and by size.
Updated on 18.5.2026 with the Chinese, Russian and Spanish large models.
Early large "GPT" language models for various languages, by time

| Model / language | Parameters | Date |
|---|---|---|
| GPT | 117 M | 6.2018 |
| GPT2 | 1.554 B | 14.2.2019 (XL) (published 11.2019) |
| Italian | 117 M | 4.2020 |
| Portuguese | 124 M? | 5.2020 |
| Greek | 124 M? | 9.2020 |
| German | 124 M? | 11.2020 – 8.2021 |
| Chinese | 124 M? | 11.2020 – 5.2021 |
| Spanish | 124 M? | 12.2020 |
| Chinese | 2.6 B | 12.2020 |
| Arabic | 1.46 B | 3.2021 |
| Russian | 760 M | 5.2021 |
| French | 1 B | 5.2021 |
| Romanian | 774 M | 7.2021 |
| Serbian | 124 M? | 7.2021 |
| Bulgarian | 355 M | 6.2021 – 8.2021, Tosh |
| Japanese | 336 M | 16.8.2021 |
| Japanese | 1 B | 20.1.2022 |
| Spanish | 774 M | 1.4.2022 |
| БАН (Bulgarian Academy of Sciences) | 124 M | 27.6.2023 |
| INSAIT | 7.3 B | 2.2024 |

Early large "GPT"-type language models, by size

| Model / language | Parameters | Date |
|---|---|---|
| Chinese | 2.6 B | 12.2020 |
| Arabic | 1.46 B | 3.2021 |
| French | 1 B | 5.2021 |
| Russian | 760 M | 5.2021 |
| Romanian | 774 M | 7.2021 |
| Bulgarian | 355 M | 6.2021 – 8.2021, Tosh |
| Japanese | 336 M | 16.8.2021 |
| Japanese | 1 B | 20.1.2022 |
| Spanish | 124 M? | 12.2020 |
| Portuguese | 124 M? | 5.2020 |
| German | 124 M? | 11.2020 – 8.2021 |
| Italian | 117 M | 4.2020 |
| Chinese | 124 M? | 11.2020 – 5.2021 |
| Greek | 124 M? | 9.2020 |
| Serbian | 124 M? | 7.2021 |
| БАН (Bulgarian Academy of Sciences) | 124 M | 27.6.2023 |
| INSAIT | 7.3 B | 2.2024 |
| GPT | 117 M | 6.2018 |
| GPT2 | 1.554 B | 14.2.2019 (XL) (published 11.2019) |

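The two orderings (by time and by size) can be reproduced from a single list of rows. The snippet below is a minimal sketch over a few rows copied from the tables, with language names given in English. The helper `parse_size` and the choice to use the first month of a date range as the sort key are assumptions of this sketch.

```python
import re

# (language, size as written in the table, (year, month)).
# For date ranges such as "6.2021 – 8.2021" the start month is used (assumption).
ROWS = [
    ("Italian", "117 M", (2020, 4)),
    ("Chinese", "2.6 B", (2020, 12)),
    ("Arabic", "1.46 B", (2021, 3)),
    ("Russian", "760 M", (2021, 5)),
    ("Romanian", "774 M", (2021, 7)),
    ("Bulgarian", "355 M", (2021, 6)),
    ("Japanese", "336 M", (2021, 8)),
]

UNITS = {"M": 1e6, "B": 1e9}


def parse_size(text):
    """Turn '1.46 B' or '124 M?' into a parameter count."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([MB])\??\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


by_time = sorted(ROWS, key=lambda row: row[2])
by_size = sorted(ROWS, key=lambda row: parse_size(row[1]), reverse=True)

for name, size, (year, month) in by_size:
    print(f"{name:10s} {size:>7s} {month}.{year}")
```

A strict sort by size places the Romanian 774 M model above the Russian 760 M one, and the trailing "?" on uncertain sizes is ignored for ordering purposes.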
1. Todor Arnaudov, GPT2-MEDIUM-BG, Свещеният сметач (The Sacred Computer), ДЗБЕ, ~6.2021 – 8.2021, 345M, Bulgarian, trained from scratch on a Tesla T4 in Colab [31][46]
[31. T.Arnaudov, Preparing a dataset and training GPT2-MEDIUM in Bulgarian, 6.2021: Train GPT2-MEDIUM Google Colab Tips & Tricks Any Language From Scratch
https://github.com/Twenkid/GPT2-Bulgarian-Training-Tips-and-Tools
https://github.com/Twenkid/GPT2-Bulgarian-Training-Tips-and-Tools/blob/main/bggpt_sacred_computer.ipynb
* T.Arnaudov, GPT2 Unlimited-Length Generation with Hidden Prompt Injections - Code Review, 2021 (1.2023): https://youtu.be/V1eO2OpsXBE
* T.Arnaudov, GPT2-Medium Training from Scratch on Colab for Any Language - Tips & Tricks by Twenkid, 2021: https://youtu.be/F-Xt-cK4L-g – Code and detailed instructions for preparing a dataset and training GPT2 models from scratch for free in Google Colaboratory (in English). A popular video on the topic with over 4 thousand views, 68 likes and about 30 subscribers.]
[46. T.Arnaudov, 2021 (2025), gpt2-medium-bg, https://huggingface.co/twenkid/gpt2-medium-bg
* T.Arnaudov, https://github.com/Twenkid/GPT2-Bulgarian-Training-Tips-and-Tools
* T.Arnaudov, Train GPT2-Medium in Google Colab – Tips & Tricks – Any language from scratch, 2021 https://youtu.be/F-Xt-cK4L-g
* T.Arnaudov, Unlimited length with GPT2 … Update 6-5-2023 …
* T.Arnaudov, Hidden Prompt Injections: Unlimited Length GPT2 Generation, 1.2023 (work from 2021) https://www.youtube.com/watch?v=V1eO2OpsXBE
]
2. Antoine Simoulin, Benoit Crabbé. Un modèle Transformer Génératif Pré-entrainé pour le ______ français. Traitement Automatique des Langues Naturelles, 6.2021, Lille, France. pp.246-255. https://hal.science/hal-03265900 – French GPTfr-124M and GPTfr-1B with the GPT3 architecture, 5.2021
3. https://huggingface.co/dbddv01/gpt2-french-small - another small French model, SMALL 137M, also trained in Colab like the Bulgarian one, but with the paid Colab Pro service.
4. Wissam Antoun, Fady Baly, Hazem Hajj, ARAGPT2: Pre-Trained Transformer for Arabic Language Generation, 7.3.2021 – https://arxiv.org/pdf/2012.15520
Arabic, 4 variants: 135M, 370M, 792M, 1.46B
5. https://huggingface.co/datificate/gpt2-small-spanish Spanish: SMALL 124M? 12.2020 (fine-tuned from the English model, uses techniques from the Portuguese one)
6. https://huggingface.co/pierreguillou/gpt2-small-portuguese/tree/main - Portuguese, SMALL, 124M?, 5.2020
7. https://github.com/stefan-it/german-gpt2 German, small: SMALL: 11.2020 - 8.2021 (the second version, retrained with better results, using dbmdz)
8. https://huggingface.co/dbmdz/german-gpt2/tree/main - 8.2021 - 10.2021 SMALL 124M?
9. GePpeTto Carves Italian into a Language Model, Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, Marco Guerini, 29.4.2020 – Italian SMALL 117M
10. Chinese GPT2 SMALL-like models: https://huggingface.co/uer/gpt2-chinese-cluecorpussmall SMALL 11.‘20 – 5.‘21 https://huggingface.co/ckiplab/gpt2-base-chinese
11. https://huggingface.co/nikokons/gpt2-greek - Greek, small, SMALL, 9.2020
12. https://huggingface.co/macedonizer/sr-gpt2 - Serbian, small, SMALL, 25.7.2021
13. https://huggingface.co/readerbench/RoGPT2-medium Romanian, 124M, 354M, 774M LARGE, 7.2021 https://huggingface.co/readerbench/RoGPT2-large/tree/main A very large corpus for the time, 17 GB, and thorough performance tests in the paper:
RoGPT2: Romanian GPT2 for Text Generation, M.Niculescu, S.Ruseti, M.Dascalu, 11.2021, University Politehnica of Bucharest, 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)
https://www.researchgate.net/publication/357227566_RoGPT2_Romanian_GPT2_for_Text_Generation
14. A Japanese model of the GPT2-MEDIUM type with 336 M parameters, 24 layers, 1024-dimensional vectors. https://huggingface.co/rinna/japanese-gpt2-medium * https://github.com/rinnakk/japanese-pretrained-models
The Japanese model is very well trained (development perplexity, ppl 18; trained for 45 days on 8xV100 32 GB on the Japanese Wikipedia etc.), file dated 16.8.2021. https://huggingface.co/rinna/japanese-gpt-1b - a 1-billion-parameter model, 20.1.2022, 24 layers, 2048-dimensional vectors.
15. Unsupervised Cross-lingual Representation Learning at Scale, Alexis Conneau, Kartikay Khandelwal, …, Veselin Stoyanov, 11.2019/4.2020 - XLM-R, a multilingual language model trained on a Bulgarian corpus as well. Veselin Stoyanov took part in the development.
[Updates in the edition from 17.5.2026: Chinese, Russian, Spanish]
16. CPM: A Large-scale Generative Chinese Pre-trained Language Model, Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun, 1.12.2020, https://arxiv.org/abs/2012.00413
17. Methods for Detoxification of Texts for the Russian Language, Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Alexander Panchenko (Skolkovo Institute of Science and Technology, Moscow, Russia), Olga Kozlova, Nikita Semenov (Mobile TeleSystems (MTS), Moscow, Russia), {daryna.dementieva, daniil.moskovskiy, v.logacheva, d.dale, a.panchenko}@skoltech.ru, {oskozlo9,nikita.semenov}@mts.ru, 19.5.2021 https://arxiv.org/pdf/2105.09052 https://github.com/ai-forever/ru-gpts
18. Spanish Language Models, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, Casimiro Pio Carrino, Aitor Gonzalez-Agirre, Carme Armentano-Oller, Carlos Rodriguez-Penagos, Marta Villegas; 15.7.2021 – 5.4.2022 (v1 to v5); the GPT models appear in v3 from 1.4.2022
https://arxiv.org/abs/2107.07253v3
* The sizes are approximate and may be inaccurate for models that did not use exactly the same architecture; some have a different number of tokens (32000), etc. What matters is not only the number of parameters but also the quality of the data, the training method, etc. At that stage and at those scales all models were experimental, with scientific and educational purposes.
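How the vocabulary size shifts the parameter count can be checked with a little arithmetic. The sketch below assumes the standard GPT-2 layout: tied input/output embeddings, learned position embeddings, and per block one attention layer, one 4x-wide MLP and two layer norms, all with biases. The function name is illustrative, and models that deviate from this layout will not match it.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_positions=1024):
    """Parameter count of a GPT-2-style decoder (assumed standard layout)."""
    token_emb = vocab_size * d_model   # shared with the output projection
    pos_emb = n_positions * d_model    # learned position embeddings
    # Per block: attention QKV (3d^2 + 3d), attention output (d^2 + d),
    # MLP up (4d^2 + 4d), MLP down (4d^2 + d), two layer norms (4d).
    per_block = 12 * d_model**2 + 13 * d_model
    final_ln = 2 * d_model             # final layer norm (scale and bias)
    return token_emb + pos_emb + n_layer * per_block + final_ln


# GPT-2 SMALL and MEDIUM shapes with the original 50257-token vocabulary.
print(gpt2_param_count(12, 768, 50257))    # 124439808
print(gpt2_param_count(24, 1024, 50257))   # 354823168
# The same MEDIUM shape with a 32000-token vocabulary (assumed here for a
# 24-layer, 1024-dimensional model such as the Japanese one in the list).
print(gpt2_param_count(24, 1024, 32000))   # 336128000
```

Under these assumptions the MEDIUM shape gives about 355 M parameters with the 50257-token vocabulary and about 336 M with 32000 tokens, which is consistent with the two figures in the tables. The difference comes entirely from the embedding matrix.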
* [Submitted on 1 Dec 2020]
CPM: A Large-scale Generative Chinese Pre-trained Language Model
https://arxiv.org/abs/2012.00413
during training: batch size = 3,072; 3M tokens .. (vs 1 M for GPT3 training)
strong few shot learning ...
* Others: later than GPT2-MEDIUM-BG (2021)
Spanish - MarIA GPT-2 -
https://arxiv.org/pdf/2107.07253v1 - only BERT-like models in the July 2021 version
GPT2 models, up to Large (774M), appear in v3, 1.4.2022:
https://arxiv.org/abs/2107.07253v3
...
German - https://www.kkirchheim.de/blog/german-gpt/ - "Training a German LLM from scratch"
Existing German models available on Hugging Face have 137M parameters and a context length of 1024 tokens, which is quite limited compared to recently released models, such as those in the LLAMA family.
