Leveraging Large Language Models (LLMs) in Business: Protecting Custom LLMs and Proprietary Data in the AI Era

Learn about the revolutionary roles of custom LLMs and proprietary data sets in business and the need for stringent security measures to mitigate risks associated with them.

In a world where data is king, the future of every industry hinges on two pivotal assets:

  1. Proprietary business intelligence

  2. The growing power of generative AI

The dawn of this new era is not just imminent; it’s already reshaping the competitive landscape, turning data and custom AI models into modern-day magic.

"Every company and every industry is fundamentally built on their proprietary business intelligence, and in the future, their proprietary generative AI."

Jensen Huang, CEO of NVIDIA, from the Q4 2023 earnings call


Custom Large Language Models (LLMs) and proprietary data sets have become invaluable assets for companies, driving technological advancement. While they provide immense benefits, they also present new security challenges that leave intellectual property and innovation vulnerable. General-purpose LLMs are in a race toward commoditization, and proprietary data will be the enduring competitive edge: the new intellectual property, or "crown jewels," of companies. Using Porter's Five Forces framework, this post explores these risks and underlines the importance of strong security measures to safeguard these new resources.

The Game-Changing Impact of Large Language Models and Generative AI

The rise of large language models (LLMs) and generative artificial intelligence (generative AI) has been an exciting development in technology, with many claiming it is as revolutionary as the invention of the Internet or the Apple iPhone.

These advanced tools are transforming how we interact with language, produce and consume content, automate tasks, and tackle business and technical problems.

LLMs are a type of artificial intelligence (AI) that can generate text, translate languages, write and parse content, help you program, and answer your questions in an informative way. These models are trained on massive datasets of text and code and can learn to perform many kinds of tasks.

With the ability to process vast amounts of data and create sophisticated models of language and communication, LLMs and generative AI are helping businesses, researchers, and individuals achieve their goals more efficiently and effectively than ever before.

Generative AI and LLMs are democratizing access to create leverage and scale.

LLMs and generative AI will play an increasingly important role in shaping the future of technology and society.

However, these advancements are so significant that some are calling for a temporary pause on the development of large AI models because the existential risk of AI is not yet fully understood.

Regardless of your personal feelings about the speed of advancements in AI, every business out there, from manufacturing to software companies to banks, is scrambling to get in on the action. Be it a feature of an existing product, creating a new product offering, or launching a new company solving a problem with AI, everybody wants in.

And the way they’ll get in is through proprietary data. Data gravity will become even stronger, and every company will bring its models to the data.

But that's precisely where the new risk is happening.

The Competitive Advantage of Custom LLMs and Proprietary Data

When your feature, product, or whole company is built on a generically trained, universally accessible technology that anyone can use, you no longer have differentiation. You're using what everyone else is, and you can be disrupted overnight.

You no longer have a competitive moat.
(A leaked internal Google memo argued that neither Google nor OpenAI has a moat either)

So companies are rushing to get smarter to out-disrupt the inevitable disruption. Now companies are moving past consumer-grade LLMs and generative AI and acquiring, building, and enriching custom LLMs built from open-source models and proprietary data sets.

Let's break these terms down:

  • Custom LLMs - LLMs that are specifically tailored and trained to meet your organization's or application's unique needs. While a general LLM like OpenAI's GPT-3 is trained on vast amounts of diverse text data from the Internet, a custom LLM is additionally trained on your specialized or proprietary data.

  • Proprietary Data Sets - unique collections of data that a company owns and controls. This can include customer segmentation, behavior data, product usage data, sales data, and more. This data provides unique contexts specific to the company's operations and can give it a competitive edge. Data drives AI outcomes.

Think of a custom LLM as a suit tailored to fit your company’s shape and size, as opposed to an “off-the-rack” suit (like GPT-3/4). Off-the-rack suits fit a wide range of body types and sizes, but they may not fit you in the right places or make you look your best. Just as a great tailor can make you look better in just about any piece of clothing, custom LLMs are fine-tuned with your own data so that they align closely with your business objectives.

When custom LLMs and your business data are combined, these new models can “understand” your business and generate recommendations to improve operations and sales, improve customer support, identify specific ways to reduce your costs, help inform M&A activities, and strengthen your company's competitive positioning.
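To make the idea of "combining" a model with business data a little more concrete, here is a toy sketch of one common pattern: look up the most relevant piece of proprietary text and place it in the prompt sent to a model. Everything in it is an assumption for illustration only. The documents are made up, the function names are hypothetical, and simple word-overlap similarity stands in for the embeddings and vector databases a real system would likely use. No actual model is called.

```python
import math
import re
from collections import Counter

# Hypothetical proprietary snippets; a real data set would be far larger.
DOCUMENTS = [
    "Enterprise customers churn most often after the second renewal cycle.",
    "Support tickets about billing errors doubled in the last quarter.",
    "The warehouse in Ohio has the lowest shipping cost per unit.",
]

def tokenize(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Place the retrieved company context ahead of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Why are billing support tickets increasing?", DOCUMENTS))
```

The point of the sketch is the security implication: the context that makes the answer useful is exactly the data a competitor would want, and it travels with every prompt.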

In short, companies can get superpowers.

⚠️ Side Note: How companies make custom LLMs and get their proprietary data is a rapidly evolving field I won't cover in this post.

If you want to go down a rabbit hole, check out a few of these resources that I am in no way an expert in:

- LangChain
- Pinecone (raised a $100M Series B recently)
- Vector Databases
- AI Canon from a16z

The New Data Security Threat

As you might imagine, the results of all the time, money, and effort spent building and acquiring those new superpowers will be held in the highest regard. Custom LLMs and proprietary data sets become your company's new crown jewels, and with them comes a new data security threat.

In a world where everyone is using LLMs to get ahead, exposing your custom LLM or proprietary data sets presents a much more holistic business risk to intellectual property and innovation.

The real power of AI comes from combining your company's specific context (proprietary data sets) with LLMs. And since LLMs are continually being trained on new data sets, keeping your company's custom context out of competitors' hands will become one of the highest company priorities.

Not protecting your custom LLMs and proprietary data sets can lead to losing valuable intellectual property, resulting in decreased innovation and reduced profitability.

As Daniel Miessler points out in his excellent post, The AI Attack Surface Map:

"Once those AI-powered products and services start to appear, we're going to have an entirely new species of vulnerability to deal with."

Daniel Miessler

The rest of this article will focus on the risks to LLMs, proprietary data sets, and how Porter's Five Forces framework can help organizations understand and navigate these challenges from a business standpoint.

Business Risk Analysis through Porter's Five Forces Framework

Porter's Five Forces is a framework that Harvard Business School's Michael E. Porter developed. It's a simple tool for analyzing a company's competitive environment.

The Five Forces are:

  1. Competitive Rivalry: How intense the competition is in your industry, based on your competitors and their capabilities.

  2. Supplier Power: How much control suppliers have to increase prices.

  3. Buyer Power: How much control buyers have to drive prices down, and how costly it is for them to switch.

  4. Threat of Substitution: The ability of your customers to find a different way of doing what you do.

  5. Threat of New Entry: How easily new people or businesses can enter your market.

So why is this useful?

By understanding where your company's power lies, you can take advantage of a position of strength, improve a position of weakness, and avoid taking wrong steps.

Remember that part from above where we talked about your company's newfound superpowers with custom LLMs trained on your proprietary data?

Let's frame this threat in a business context.

Unveiling the Threat Landscape with Porter's Five Forces

Let's analyze the business threats using Porter's Five Forces framework.

Competitive Rivalry: Amplified Competition through LLM-Driven Insights

The widespread use of custom LLMs can intensify competition within an industry, as companies may use these models to gain insights into their competitors' strategies, products, and services, leading to increased price competition and reduced profitability.

Supplier Power: The Shift in Supplier Dynamics Owing to LLM-Enhanced Supply Chain Intelligence

Using custom LLMs in your company's supply chain could change the dynamics with your suppliers. If suppliers gain better insights of their own, they can model out better terms to negotiate, which can have adverse financial impacts on your company.

Buyer Power: The Surge in Buyer Influence through LLM-Generated Information

Custom LLMs can empower buyers with more information and insights, potentially increasing their bargaining power and pushing companies to improve customer relations and offerings. If your custom models are exposed, your buyers can get much smarter, further driving down your margins.

Threat of Substitution: The Spread of Alternatives Due to LLM-Fueled Innovation

As custom LLMs become more prevalent, the likelihood of substitute products or services entering the market increases, posing a significant threat to an organization's market share. Investing in research and development (R&D) will matter, and the way you operate your business with these newfound superpowers will become the most important IP you have. Keeping it confidential is how you avoid getting sidestepped completely.

Threat of New Entry: The Lowered Entry Barriers Due to LLM-Enabled Market Insights

Having your company's custom LLMs or proprietary data exposed can lower the entry barrier for new competitors, making it easier for them to enter the market and increase competition.
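One way to turn the analysis above into something a security or strategy team can act on is a small risk register with one entry per force. The sketch below is only an illustration: the risk descriptions paraphrase this post, while the `ForceRisk` class, the 1-5 likelihood and impact scales, and every score are hypothetical placeholders you would replace with your own assessment.

```python
from dataclasses import dataclass

@dataclass
class ForceRisk:
    force: str          # one of Porter's Five Forces
    exposure_risk: str  # what happens if custom LLMs or data are exposed
    likelihood: int     # hypothetical 1-5 rating
    impact: int         # hypothetical 1-5 rating

    @property
    def score(self):
        """Simple likelihood-times-impact score."""
        return self.likelihood * self.impact

# Placeholder ratings for illustration; they are not recommendations.
REGISTER = [
    ForceRisk("Competitive Rivalry", "Competitors gain insight into strategy, products, and services", 4, 4),
    ForceRisk("Supplier Power", "Suppliers model better terms to negotiate", 2, 3),
    ForceRisk("Buyer Power", "Buyers gain information and bargaining power", 3, 3),
    ForceRisk("Threat of Substitution", "Substitutes sidestep your offering", 3, 5),
    ForceRisk("Threat of New Entry", "Exposed models or data lower entry barriers", 4, 5),
]

def prioritize(register):
    """Order the register from highest to lowest score."""
    return sorted(register, key=lambda r: r.score, reverse=True)

for risk in prioritize(REGISTER):
    print(f"{risk.score:>2}  {risk.force}: {risk.exposure_risk}")
```

Multiplying likelihood by impact is a deliberately simple scoring choice. The value is less in the numbers than in forcing a conversation about which force your own exposure would feed the most.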

Embracing the Challenge

The rise of custom LLMs and proprietary data sets as the "new data" brings about holistic business threats to intellectual property and innovation, the likes of which we have not yet seen.

By applying Porter's Five Forces to custom LLMs and proprietary data sets, companies can identify potential threats, build business strategies to mitigate them, and take advantage of opportunities to strengthen their competitive position.

If your company is going hard on custom LLMs, this post can help you add context to any new security measures or procedures you need to implement, so that you are better prepared to defend your intellectual property and innovation against threat actors and disruptions.

If you like these kinds of posts or you have comments, I'd love to hear your feedback! Also, please consider subscribing to my free weekly newsletter to stay up-to-date on all the latest cybersecurity funding and industry news every week.

