
Conclusions

Based on our experimental validation and engineering analysis, several important implications follow for securing the supply chain of Large Language Models. First, the AI research community must recognize that vulnerability to data poisoning does not depend on model size. Our confirmation of the constant-threshold hypothesis shows that increasing model size provides no automatic protection: as few as 250 poisoned samples can compromise a model regardless of how much larger the clean dataset is. This demands a shift away from the assumption that “bigger is safer” and toward rigorous, standardized data tracking. Establishing clear auditing frameworks for open-source datasets, similar to the curation that distinguishes the Common Pile from The Pile, provides a necessary starting point for this kind of ecosystem health.
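
To make the constant-threshold point concrete, the short calculation below shows how a fixed budget of 250 poisoned samples becomes a vanishingly small fraction of ever-larger clean training sets. This is a minimal sketch; the clean-corpus sizes are illustrative, not figures from our experiments.

```python
# Sketch: a fixed count of 250 poisoned samples shrinks to a negligible
# fraction of the training data as the clean corpus grows.
# The clean-corpus sizes below are illustrative, not our experimental values.
POISON_COUNT = 250

for clean_samples in (10_000, 100_000, 1_000_000, 10_000_000):
    fraction = POISON_COUNT / (clean_samples + POISON_COUNT)
    print(f"clean={clean_samples:>10,}  poison fraction={fraction:.5%}")
```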

For specialized AI applications, organizations using “Fine-Tuning as a Service” must implement validation protocols that go beyond standard performance tests. As our mixed-training analysis shows, poisoning has a significant “stealth” dimension: a model can maintain high performance on legitimate tasks while concealing a catastrophic “Trojan Horse” backdoor. These vulnerabilities are especially serious in fields like law, where high-entropy triggers such as case citations can hide fabricated outputs from automated detection systems. We recommend that organizations deploying specialized models conduct domain-expert red teaming and anomaly detection focused on specific activation patterns, rather than relying on general language fluency metrics alone.
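
One lightweight check in the spirit of this recommendation is a behavioral probe that compares a fine-tuned model’s outputs on clean prompts against the same prompts with a suspected trigger string appended. The sketch below uses the Hugging Face transformers API; the model path, prompts, and trigger string are placeholders, not artifacts from our experiments.

```python
# Sketch: probe a fine-tuned model for trigger-conditioned behavior by
# comparing greedy generations on clean prompts vs. the same prompts with a
# candidate trigger appended. Model path, prompts, and trigger are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/finetuned-model"    # hypothetical checkpoint
CANDIDATE_TRIGGER = "<SUSPECT_CITATION>"  # hypothetical trigger string

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)

clean_prompts = [
    "Summarize the holding of the cited case.",
    "Which statute governs this dispute?",
]

for prompt in clean_prompts:
    clean_out = generate(prompt)
    triggered_out = generate(f"{prompt} {CANDIDATE_TRIGGER}")
    if clean_out != triggered_out:
        print(f"Divergent behavior for prompt: {prompt!r}")
```

A probe like this does not prove the presence of a backdoor, but large, systematic divergences on triggered inputs are a useful signal for routing a checkpoint to domain-expert review.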

Understanding the threat landscape requires recognizing that the technical barriers facing adversaries are falling. The successful use of optimization tools like Unsloth and of generative injection strategies built on Gemini demonstrates that effective attacks no longer require nation-state resources. The asymmetry now favors the attacker: the compute cost of injecting poison is minimal compared with the cost of retraining or cleaning a compromised model. Security strategies must therefore evolve to include strong defenses against “multi-vector” attacks that use style transfer to bypass traditional deduplication filters.
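
To illustrate how low the tooling barrier has become, the sketch below outlines the kind of 4-bit LoRA fine-tuning setup with Unsloth referenced in this project. The model identifier, LoRA rank, and target modules are illustrative assumptions, not our exact configuration.

```python
# Sketch: loading Llama-3.2 in 4-bit with Unsloth and attaching LoRA adapters,
# the kind of single-commodity-GPU setup discussed above.
# Model id and hyperparameters are illustrative, not our exact settings.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # assumed Hugging Face model id
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit quantization keeps VRAM needs within a single small GPU
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank; small adapters are cheap to train
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, a standard supervised fine-tuning loop (e.g., trl's SFTTrainer)
# can train the adapters on Alpaca-formatted instruction data.
```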

When considering the future of open-source AI, we should view data poisoning not as a theoretical edge case, but as a practical and immediate risk to production systems. The democratization of AI technology has created a vast, interconnected ecosystem that is only as strong as its least verified dataset. However, by acknowledging the constant-threshold vulnerability and implementing proactive defense mechanisms, stakeholders can work to protect the integrity of the systems that society increasingly depends on.

Code Repository

The code used for data preparation, poison generation, model fine-tuning, and evaluation is available at:

https://github.com/mannanxanand/DAT-490-Capstone

Key Technologies

  • Models: Llama-3.2 (1B and 3B parameters)
  • Datasets: CaseHOLD, Common Pile, The Pile
  • Optimization: Unsloth, 4-bit quantization
  • Poison Generation: Gemini 2.5 Flash
  • Infrastructure: AWS g4dn.xlarge, A100 instances
  • Format: Alpaca Instruction Format
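
For reference, a single training record in the Alpaca Instruction Format follows the instruction/input/output structure sketched below. The field contents are a made-up illustration, not a record from our CaseHOLD-derived data.

```python
# Sketch of one record in the Alpaca Instruction Format.
# The field contents are illustrative placeholders.
example = {
    "instruction": "Identify the correct holding for the cited case.",
    "input": "The court reasoned that ... See <case citation>.",
    "output": "The citation supports holding (b): ...",
}
```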