AI Is Not Going to Be Where It Is Now

Tathagata
2 min read · Just now


People are taking lightly what the world is observing right now. Amid all the fuss around Nvidia, Meta and OpenAI, the artificial intelligence universe is taking a new direction, something I have been saying for a year now.

๐—ก๐˜ƒ๐—ถ๐—ฑ๐—ถ๐—ฎ just released RTX 5090 which is a giant beast. However this yearโ€™s least performance chip (RTX5070) was as powerful as last yearโ€™s best chip (RTX4090) and at ๐Ÿญ/๐Ÿฏ๐—ฟ๐—ฑ ๐—ผ๐—ณ ๐—ถ๐˜๐˜€ ๐—ฝ๐—ฟ๐—ถ๐—ฐ๐—ฒ!! ๐—ง๐—ต๐—ถ๐˜€ ๐—ถ๐˜€ ๐—ฐ๐—ฟ๐—ฎ๐˜‡๐˜†! Raw computing power and resources are getting easily available now compared to what it was 3 years back and itโ€™s not gonna stop anytime soon.

So, using better computing power to get a higher-performing model is no longer a novelty. Anyone can throw in a more complex model, add more parameters and train harder to build a bigger model, but that also drives up the cost. This is why we have to shift our focus from general-purpose models to task-specific, low-cost LMs. Except for conversational AI, we don't need that much flexibility. It's time to distribute work among SLMs. Instead of 100s of billions of parameters, let's get back to 7-800M, or maybe a few billion. The way things are moving, expensive and powerful GPUs will keep coming, but that alone is not what the AI boom is about. True AI needs to come back to feature engineering, correlation analysis and so on. A lot of recent papers show SLMs achieving near-LLM performance.
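
To make "task-specific and low-cost" concrete, here is a minimal sketch using the Hugging Face transformers library with a small open model (TinyLlama, roughly 1.1B parameters); the model choice and the prompt are placeholders for whatever narrow task you actually care about:

```python
# Minimal sketch: a ~1.1B-parameter model handling one narrow task,
# instead of routing everything through a 100B+ general-purpose model.
# Model name and prompt are illustrative placeholders.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # small open chat model
)

prompt = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'The battery barely lasts half a day.'\n"
    "Answer:"
)
out = generator(prompt, max_new_tokens=5, do_sample=False)
print(out[0]["generated_text"])
```

A model of this size runs comfortably on a single consumer GPU, which is exactly the cost profile the paragraph above is arguing for.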

๐— ๐—ฒ๐˜๐—ฎ just launched its Llama 3.3 model of 70B parameters which achieves near-equal results to Llama 3.1 which is of 405B parameters. This is an almost ๐Ÿด๐Ÿฏ% ๐—ฝ๐—ฎ๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜๐—ฒ๐—ฟ ๐—ฟ๐—ฒ๐—ฑ๐˜‚๐—ฐ๐˜๐—ถ๐—ผ๐—ป to achieve the similar scale result! Just a building block towards what the future holds. The world is not anymore at the spot where compute used to be the limiting factor. Hence, focusing on ๐—ณ๐˜‚๐—ป๐—ฑ๐—ฎ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐—น๐˜€ ๐—ผ๐—ณ ๐—”๐—œ, solving problems with research and engineering skills is much more important again.

Sources:

https://www.datacamp.com/blog/llama-3-3-70b
