Digital Law Journal

Using personal data in AI model training under EU law

https://doi.org/10.38044/2686-9136-2025-6-12

Abstract

The adoption of the EU Artificial Intelligence Act (AI Act) established mandatory life-cycle regulation of AI systems in the European Union while preserving the validity of the General Data Protection Regulation (GDPR). The training stage of AI models has consequently become a point of intersection between two regulatory regimes: the AI Act emphasizes data quality and representativeness along with risk management and documentation of training processes, while the GDPR sets out the principles of lawfulness, data minimization, purpose limitation, and storage limitation, and provides data subjects with a set of safeguards and remedies. In practice, this interaction creates a risk of legally defective model training when representativeness is pursued through excessive data collection and repeated re-use of personal data.

This article examines the permissibility and organization of AI model training under the joint application of the AI Act and the GDPR. It sets out to substantiate a legal model that enables proportionate technical and organizational safeguards while preserving training quality and ensuring lawful personal data processing that respects the fundamental rights of data subjects. The methodology combines doctrinal legal analysis of the AI Act requirements on risk management and data governance with a comparative assessment of the GDPR principles and procedural tools for ensuring lawful processing, together with a systematization of typical governance artefacts used in the development and deployment of high-risk AI systems.

The results are presented as an integrated compliance-by-design model for actors involved in the training stage. A practical distinction between an "AI system" and an "AI model" is substantiated: whereas an AI system is qualified as an organizational and technical envelope comprising the model, infrastructure, input and output interfaces, monitoring, and human interaction, an AI model is treated as the algorithmic core trained on data and used to infer outputs. This distinction can be applied to allocate obligations between the provider and the entities deploying or operating the system. A mechanism is proposed for reconciling dataset representativeness and accuracy with the GDPR data minimization principle through a documented feature inventory: a necessity rationale is recorded for each class of data, irrelevant attributes are excluded, and indirect discrimination risks are assessed. Safeguards (pseudonymization, anonymization, aggregation, synthetic data generation, and differential privacy) are matched to data sensitivity, use context, and the level of risk to fundamental rights on the basis of a proportionality model, supported by the outcomes of a data protection impact assessment and a fundamental rights impact assessment.

Finally, a practical legal governance loop for the training life cycle is formulated, covering the determination of the purpose and legal basis, limits on dataset re-use, access control and logging, retention and deletion rules, and procedures for revisiting training parameters and monitoring after deployment. The proposed model increases legal certainty and provides a reproducible framework for aligning the AI Act and the GDPR at the training stage.
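
The documented feature inventory described above is essentially a structured record of attributes, each carrying a necessity rationale for its class of data. A minimal Python sketch of what such an artefact might look like is given below; the FeatureRecord structure, its field names, and the example attributes are illustrative assumptions, not a format prescribed by the article or by either regulation.

    from dataclasses import dataclass

    @dataclass
    class FeatureRecord:
        """One row of a documented feature inventory (illustrative structure)."""
        name: str
        category: str                       # e.g. "identifier", "behavioural", "special category"
        necessity_rationale: str            # why this data class is needed for training; "" = none
        indirect_discrimination_risk: str   # e.g. "low", "medium", "high"

    def minimize(inventory: list) -> tuple:
        """Split the inventory into retained and excluded attributes.

        An attribute is retained only if a necessity rationale is documented;
        anything without one is excluded up front, which is the data
        minimization step the abstract describes.
        """
        retained = [f for f in inventory if f.necessity_rationale]
        excluded = [f for f in inventory if not f.necessity_rationale]
        return retained, excluded

    # Hypothetical example: postcode kept with a documented rationale, religion dropped.
    inventory = [
        FeatureRecord("postcode", "demographic", "needed for regional price features", "medium"),
        FeatureRecord("religion", "special category", "", "high"),
    ]
    retained, excluded = minimize(inventory)
    print([f.name for f in retained])   # ['postcode']
    print([f.name for f in excluded])   # ['religion']

Keeping the rationale as a mandatory, human-readable field is what makes the minimization decision auditable and distinguishes a feature inventory from an ordinary data schema.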
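
Of the safeguards listed, differential privacy is the most formalized: the classic Laplace mechanism adds noise calibrated to a query's sensitivity and a privacy budget epsilon. A minimal sketch for a counting query follows, where adding or removing one person changes the answer by at most 1, so the sensitivity is 1; this is the textbook mechanism, not a procedure taken from the article.

    import numpy as np

    def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
        """Release a differentially private count via the Laplace mechanism.

        Noise is drawn from Laplace(0, sensitivity / epsilon): the smaller
        epsilon is, the larger the noise and the stronger the protection.
        """
        noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_count + noise

    # Hypothetical example: publish how many records in a training set concern minors.
    print(laplace_count(true_count=1340, epsilon=0.5))

How epsilon is set for a given data sensitivity and use context is exactly the kind of decision the proportionality model described above is meant to govern.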
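
The retention and deletion rules in the governance loop can likewise be reduced to a small executable check. The purposes and periods below are hypothetical placeholders; the article calls for documented rules but fixes no specific retention periods.

    from datetime import date, timedelta

    # Illustrative retention periods per documented processing purpose.
    RETENTION = {
        "model_training": timedelta(days=365),
        "bias_audit": timedelta(days=730),
    }

    def must_delete(purpose: str, collected: date, today: date) -> bool:
        """Return True once a record has outlived the retention period documented for its purpose."""
        return today - collected > RETENTION[purpose]

    print(must_delete("model_training", date(2024, 1, 15), date(2025, 6, 1)))   # True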

About the Author

A. A. Olifirenko
Saratov State Law Academy; Yuri Gagarin State Technical University of Saratov
Russian Federation

Artem A. Olifirenko — Master’s student, Department of Information Law and Digital Technologies; Master’s student, Department of Information Security of Automated Systems, Institute of Electronic Engineering and Instrumentation; Data Protection Specialist, responsible for AI governance and security, “Ecosystem Real Estate ‘Metr Kvadratny’” LLC, Moscow, Russia.

1, Volskaya st., Saratov, Russia, 410056;

77-1, Polytechnicheskaya st., Saratov, Russia, 410008




This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2686-9136 (Online)