The Disruptive Revolution of AI: AI Engineering and the Impact of Foundation Models on Businesses
In the 2010s, the success of AI models was built on supervised learning, which relies heavily on labeled data. For instance, a model is shown thousands of images labeled as cars and learns to recognize cars in new images. AlexNet (Krizhevsky et al., 2012), one of the models that started the deep learning revolution, was trained with supervised learning. It learned to classify over a million images in the ImageNet dataset into one of 1,000 categories such as “car,” “balloon,” or “monkey.” However, a significant drawback of supervised learning is the cost and time involved in data labeling. Labeling one million images could cost around $50,000 (about 5 cents per image), and this cost grows in proportion to the number of labels you need. If you want a second pair of eyes on each image to ensure accuracy, each image has to be labeled by two different people, which doubles the cost. Since the world contains far more than 1,000 objects, expanding models to work with more objects requires adding more category labels. Scaling to 1 million categories, with a similar number of images per category, would raise the labeling cost alone to $50 million.
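The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration under two assumptions that the numbers above imply but do not state outright: a price of 5 cents per label ($50,000 for one million images) and roughly 1,000 images per category (one million images across 1,000 categories). The function and parameter names are made up for this example.

```python
def labeling_cost_usd(num_categories, images_per_category=1_000,
                      cents_per_label=5, annotators_per_image=1):
    """Estimate the labeling cost in US dollars.

    Assumptions (not real price quotes): 5 cents per label and
    1,000 images per category, both inferred from the figures above.
    """
    total_labels = num_categories * images_per_category * annotators_per_image
    # Work in whole cents to avoid floating-point rounding, then convert.
    return total_labels * cents_per_label / 100


imagenet_scale = labeling_cost_usd(1_000)                          # 1,000 categories
double_checked = labeling_cost_usd(1_000, annotators_per_image=2)  # two people per image
million_categories = labeling_cost_usd(1_000_000)                  # 1 million categories

print(f"ImageNet scale:       ${imagenet_scale:,.0f}")
print(f"With a second check:  ${double_checked:,.0f}")
print(f"1 million categories: ${million_categories:,.0f}")
```

Under these assumptions the cost is linear in the number of labels: a thousand times more categories means a thousand times the labeling bill.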
What counts as a “large” language model has changed over time. When OpenAI’s first GPT model was released in June 2018, it had 117 million parameters and was considered large. By February 2019, OpenAI had introduced GPT-2 with 1.5 billion parameters, making 117 million parameters seem small. Today, a model with 100 billion parameters is considered large, and these sizes will likely continue to grow. What is seen as large today may be considered small or even outdated tomorrow, as Sam Altman suggests.
While language models have impressive capabilities, they are limited to text. As humans, we perceive the world not just through language but also through sight, hearing, touch, and more. Therefore, it is crucial for language models to process data beyond text. GPT-4V, for instance, can understand both images and text, while Gemini can comprehend videos, images, and text. Some models can even understand more complex data like 3D objects and protein structures.
This is where foundation models come into play. Foundation models mark a significant departure from the way AI research was traditionally organized. Historically, AI research was divided by data modality. NLP (natural language processing) dealt only with text, while computer vision dealt only with images. Text-based models were used for tasks like translation and spam detection, image-based models for object detection and image classification, and audio-based models for speech-to-text (STT) and text-to-speech (TTS) tasks.
Without foundation models, you need to train a separate model from scratch for each specific task. Building your own models gives you more control, but smaller models might not perform as well as large ones, and training large models from scratch requires significantly more time and data than adapting an existing powerful model. Overall, foundation models make AI application development cheaper and reduce the time to market.
You might think building your own model is cheaper than paying model providers. However, if model providers cut their prices in half three months later, building your own model might become the more expensive option. For instance, at Turk.net, the cost of our AI-based chat support halved when LLM services (model as a service) reduced their prices. I believe these cost reductions will continue, much like the pricing trends in cloud services.
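To make the build-versus-buy comparison concrete, here is a rough sketch of how a single price cut can flip the outcome. All of the numbers are hypothetical and chosen only for illustration; they are not Turk.net's actual costs, and the function names are made up for this example.

```python
def api_cost(monthly_fee, months, cut_after_month=None, cut_factor=0.5):
    """Total spend on a hosted model, with an optional one-time price cut.

    cut_after_month=3 means the first three months are billed at the
    full fee and every later month at monthly_fee * cut_factor.
    """
    total = 0.0
    for month in range(months):
        fee = monthly_fee
        if cut_after_month is not None and month >= cut_after_month:
            fee = monthly_fee * cut_factor
        total += fee
    return total


def build_cost(upfront, monthly_running, months):
    """Total spend on an in-house model: one-off development plus running costs."""
    return upfront + monthly_running * months


# Hypothetical figures over a two-year horizon.
MONTHS = 24
own_model = build_cost(upfront=60_000, monthly_running=4_000, months=MONTHS)
api_no_cut = api_cost(monthly_fee=10_000, months=MONTHS)
api_with_cut = api_cost(monthly_fee=10_000, months=MONTHS, cut_after_month=3)

print(f"Own model:           ${own_model:,.0f}")
print(f"API, no price cut:   ${api_no_cut:,.0f}")
print(f"API, halved at m3:   ${api_with_cut:,.0f}")
```

With these made-up numbers, the in-house model looks cheaper at the original price, but the hosted service becomes cheaper once its price is halved after three months.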
You can train large models on small datasets, but this wastes computational resources. With your own dataset, you can train smaller models that deliver similar or better results.
AI engineering might seem intimidating and distant. However, today’s LLMs are offered as services and can be integrated quickly, so it is worth understanding what AI engineering entails. AI engineering refers to the process of developing applications on top of foundation models. Traditional ML engineering typically starts with developing an ML model, while AI engineering starts from an existing one: it takes a powerful foundation model and builds the desired application on top of it. The availability of such models creates ideal conditions for the rapid growth of AI engineering.
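As a minimal sketch of what building on top of an existing model can look like, the example below wraps a model call in a small support-assistant function. It does not use any real provider's API: `complete` stands for whatever function sends a prompt to your hosted model and returns its text, and `fake_complete`, the prompt wording, and the other names are hypothetical stand-ins so that the example runs on its own.

```python
from typing import Callable

# Hypothetical prompt for a support assistant; the wording is illustrative.
PROMPT_TEMPLATE = (
    "You are a customer support assistant for an internet service provider.\n"
    "Answer the customer's question briefly and politely.\n\n"
    "Customer question: {question}\n"
    "Answer:"
)


def build_prompt(question: str) -> str:
    """Insert the customer's question into the prompt template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def answer_question(question: str, complete: Callable[[str], str]) -> str:
    """Answer a support question using any prompt-to-text function."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return complete(build_prompt(question)).strip()


def fake_complete(prompt: str) -> str:
    """Stand-in for a hosted model call; replace with your provider's client."""
    return " Please restart your modem and try again. "


reply = answer_question("My connection keeps dropping.", fake_complete)
print(reply)
```

The application code never trains anything. Swapping providers, or moving to a model of your own, only means passing a different `complete` function.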
On the other hand, security is a significant concern. The fact that LLMs can be improved with data from their users introduces real uncertainty about what happens to that data and a risk of leaks. Additionally, you might end up as part of a training dataset because a company you share data with changes its data policy. For instance, Zoom faced backlash in August 2023 after quietly changing its terms of service to allow the use of all user data, including confidential meetings, for training AI models. Research also suggests that AI models can memorize training examples, which could then be exposed to users either intentionally or accidentally. For example, Hugging Face’s StarCoder model memorizes 8% of its training set, meaning private data in that set could be extracted by malicious actors or surfaced to ordinary users.
It is often said that many startups will fail whenever OpenAI releases a new feature. This could happen on a significant scale because, for example, you no longer need to buy a PDF generator to create a PDF; you can simply prompt “create a PDF.” This reinforces centralization around a few providers and will have substantial effects on many companies.
The release of uncensored language models is also alarming. For instance, Grok, the recently introduced AI model from Twitter (X), operates without censorship and answers even malicious questions from users. Asking it “how to hack” today returns step-by-step instructions. Without censorship or ethical guardrails, what you can ask is limited only by your imagination. At a time when LLMs are only just being integrated into our lives, the rapid availability of an uncensored LLM highlights one of the more frightening aspects of this technology. The owner of X, Elon Musk, has already compromised on this issue to differentiate Grok from other LLMs and promote its adoption, which is concerning for the future.
Traditionally, building AI applications required knowledge of various ML algorithms and neural network architectures. With the availability of foundation models, ML knowledge is no longer a prerequisite for developing AI applications. On the other hand, training your own ML models can produce more targeted models that better meet your needs and deliver more predictable results. By employing these models in suitable areas, you can create a valuable differentiator that sets you apart from other companies, so it is worth continuing to build this capability.
Strangely enough, an important requirement for developing an AI-based product today is front-end knowledge. Sought-after skills for AI engineers include front-end development, prompt engineering, and attention to detail.
I have tried to describe the transformation of AI from my perspective. As the technology evolves and needs change, AI engineering and foundation models present new opportunities and challenges for businesses. Data privacy and security must be taken into account throughout. The potential dangers and ethical responsibilities of uncensored language models also deserve attention, and security teams should vet these models before their businesses adopt them.
To summarize this article in a single sentence: using AI is not complex or intimidating, but easy to start with and quick to apply.
At Turk.net, we have integrated AI in many areas using both our own models and “model as a service” offerings, and we continue to do so. I will try to cover these in detail in a separate article.
Feel free to comment and reach out to me. You can find my LinkedIn and Twitter accounts through these links.
References
In this article, I included excerpts from the recent book on AI Engineering that I read.
This article has been translated into English by ChatGPT.