Experiences in the application of data mining in the Cuban biopharmaceutical industry

  • Osvaldo Gozá-León Facultad Ingeniería Química, Universidad Tecnológica de La Habana (CUJAE), Cuba https://orcid.org/0000-0002-1426-5910
  • Arturo Toledo-Rivero Centro de Inmunología Molecular (CIM), La Habana, Cuba
Keywords: principal component analysis; fermentation; data mining; chromatographic purification; neural networks.


The Center of Molecular Immunology is a representative Cuban biotechnology institution dedicated to the basic research, development, production and commercialization of biopharmaceuticals for diagnosing and treating cancer and diseases related to the immune system. This work presents a bibliographic synthesis of research carried out at the center from 2012 to 2018. Its purpose is to analyze the application of selected data mining techniques to the evaluation of the fermentation and purification stages of the processes for obtaining biopharmaceuticals in three production facilities. The phases of data mining were characterized and the practical implications of the results were presented. Principal Component Analysis was applied as a descriptive model using The Unscrambler software, and Artificial Neural Networks were applied as a predictive model using the MATLAB neural networks toolbox. These models made it possible to extract useful information for decision making. They helped explain the behavior of the parameters that influence the quality of the final product, and they estimated important variables, such as the concentration of the protein of interest in the fermentation supernatant and the performance of the purification stage, as functions of the process variables with the greatest influence on their behavior.
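As a rough illustration of what a descriptive Principal Component Analysis does with batch records, the sketch below standardizes a small table of process variables and extracts the first principal component by power iteration. It is not the authors' workflow, which used The Unscrambler. The data are synthetic, and the variable names (cell density, glucose consumption, pH) are assumptions chosen only for the example.

```python
import math
import random


def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of data that is already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Leading eigenvector (loadings) and eigenvalue of a symmetric matrix."""
    p = len(cov)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-symmetric starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))  # Rayleigh quotient
    return v, eigenvalue


# Synthetic batch records (hypothetical): cell density, glucose consumption, pH.
random.seed(0)
batches = []
for _ in range(30):
    growth = random.gauss(0.0, 1.0)  # hidden factor driving the first two variables
    batches.append([
        5.0 + growth,
        20.0 + 2.0 * growth + random.gauss(0.0, 0.3),
        7.0 + random.gauss(0.0, 0.05),
    ])

scaled = standardize(batches)
cov = covariance(scaled)
loadings, eigenvalue = first_component(cov)

# After standardization the total variance equals the number of variables.
explained = eigenvalue / len(cov)
# Scores: projection of each batch onto the first principal component.
scores = [sum(x * w for x, w in zip(row, loadings)) for row in scaled]

print("PC1 loadings:", [round(w, 3) for w in loadings])
print("Share of variance explained by PC1:", round(explained, 3))
```

In this toy data set the first two variables are built to move together, so they should load on the first component with the same sign. In practice a dedicated statistics package would normally replace the hand-written power iteration.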


How to Cite
Gozá-León, O., & Toledo-Rivero, A. (2024). Experiences in the application of data mining in the Cuban biopharmaceutical industry. Chemical Technology, 44(1), 164-179. Retrieved from https://tecnologiaquimica.uo.edu.cu/index.php/tq/article/view/5399