MACHINE LEARNING APPLICATIONS IN SPORTS: IDENTIFYING KEY BIOMOLECULES INFLUENCED BY EXERCISE
Keywords:
Biomarkers; Machine Learning; Weighted Gene Co-expression Network Analysis
Abstract
Background: Exercise is well known to be beneficial, but the molecular characteristics associated with exercise have not been clearly defined. This study aimed to identify the molecules associated with exercise.

Methods: We used GSE97084, a Gene Expression Omnibus (GEO) dataset of vastus lateralis muscle expression profiles collected before and after exercise training. The dataset was first normalized, and differentially expressed genes (DEGs) were identified with the "limma" package. To interpret the biological significance of these DEGs, Gene Set Enrichment Analysis (GSEA) was applied to identify the biological processes and pathways most closely associated with exercise. Weighted Gene Co-expression Network Analysis (WGCNA) was then used to identify the modules most strongly correlated with the pre- and post-training states. After the relevant genes and pathways had been identified, machine learning methods were used to select key genes, and a deep learning model was built to predict training status, whose effectiveness and training accuracy were then evaluated. This combined approach offers new insight into the molecular mechanisms through which exercise acts and may reveal key regulatory factors that respond to exercise.

Results: Exercise training significantly affected the expression of genes involved in immune pathways. GSEA showed that several immune-related pathways were activated after training, including "Chemokine signaling pathway", "Complement and coagulation cascades", "Fc gamma R-mediated phagocytosis", "Hematopoietic cell lineage" and "Leukocyte transendothelial migration". Conversely, pathways such as "Insulin signaling pathway" were suppressed, suggesting reduced activity after training.
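The DEG ranking and GSEA steps described above can be illustrated with a small sketch. It is not the study's pipeline: a plain paired t-statistic stands in for limma's moderated statistic (no variance shrinkage), it assumes each participant contributes one pre- and one post-training sample, and the gene names and expression values are hypothetical toy data. The enrichment score follows GSEA's weighted running-sum definition with weight exponent p = 1: a positive score means the gene set is concentrated among genes up-regulated after training ("activated"), a negative score means it is concentrated among down-regulated genes ("suppressed"). Permutation-based significance and the normalized enrichment score are omitted.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t-statistic for one gene (post minus pre, matched by participant).

    Stand-in for limma's moderated t: no variance shrinkage is applied.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)
    if sd == 0:
        return 0.0
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def enrichment_score(ranked_scores, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked_scores: (gene, score) pairs sorted from most up- to most
    down-regulated. Returns the running-sum value with the largest magnitude.
    """
    hits = [gene in gene_set for gene, _ in ranked_scores]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap, but not cover, the ranked list")
    weights = [abs(score) ** p for _, score in ranked_scores]
    norm_hit = sum(w for w, h in zip(weights, hits) if h)
    if norm_hit == 0:  # all member scores are zero: fall back to equal weights
        weights, norm_hit = [1.0] * len(hits), float(n_hit)
    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        # step up on a set member, step down on a non-member
        running += w / norm_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy data: 4 genes x 4 participants (log-scale expression).
pre = {"G1": [1.0, 1.2, 0.9, 1.1], "G2": [2.0, 2.1, 1.9, 2.2],
       "G3": [3.0, 3.1, 2.9, 3.0], "G4": [1.5, 1.4, 1.6, 1.5]}
post = {"G1": [2.0, 2.4, 1.8, 2.1], "G2": [2.9, 3.2, 2.8, 3.0],
        "G3": [2.0, 2.2, 1.8, 2.1], "G4": [1.6, 1.3, 1.7, 1.4]}

t_stats = {g: paired_t(pre[g], post[g]) for g in pre}
ranked = sorted(t_stats.items(), key=lambda item: item[1], reverse=True)
es_up = enrichment_score(ranked, {"G1", "G2"})  # genes rising after training
es_down = enrichment_score(ranked, {"G3"})      # gene falling after training
print([g for g, _ in ranked], round(es_up, 3), round(es_down, 3))
```

In this toy example the up-regulated set scores close to +1 and the down-regulated set close to -1, mirroring how pathways were labelled as activated or suppressed after training.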
When co-expression networks were constructed across all samples, WGCNA identified the black module as correlated with exercise. Finally, several machine learning algorithms identified 10 key genes, and the deep learning model confirmed that these 10 genes distinguish well between the post-training and pre-training groups; they may play critical roles in regulating the associated biological processes. These findings deepen our understanding of how exercise training regulates gene expression and metabolic pathways, and they provide important clues for research into exercise physiology and metabolic health.

Conclusion: Exercise training significantly affects immune-related pathways, biological processes and metabolic pathways, offering insight into the health benefits of exercise. The 10 exercise-related genes identified by machine learning may be key molecules underlying these health effects.
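The WGCNA result reported above, a co-expression module correlated with pre- versus post-training status, can be sketched in a few lines. This is a simplified illustration, not the study's code: it assumes module membership is already known (WGCNA proper derives modules by hierarchical clustering of a topological overlap matrix, which is omitted here), uses an unsigned soft-threshold adjacency with an illustrative power beta = 6, takes the module eigengene as the first principal component of the standardized module genes, and codes the trait as 0 = pre-training and 1 = post-training. The gene names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    return {(a, b): abs(pearson(expr[a], expr[b])) ** beta
            for a in genes for b in genes if a != b}


def standardize(x):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))
    return [(v - m) / sd for v in x]


def module_eigengene(rows, iters=200):
    """First principal component (sample scores) of standardized module genes.

    Power iteration on Z^T Z, where Z is the genes x samples standardized
    matrix; the sign is aligned with the module's average standardized profile.
    """
    z = [standardize(r) for r in rows]
    n = len(z[0])
    v = list(z[0])  # start from the first gene's profile
    for _ in range(iters):
        zv = [sum(zg[k] * v[k] for k in range(n)) for zg in z]            # Z v
        v = [sum(zv[g] * z[g][k] for g in range(len(z))) for k in range(n)]  # Z^T (Z v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    avg = [sum(zg[k] for zg in z) / len(z) for k in range(n)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module: 3 genes x 6 samples (3 pre-training, then 3 post-training).
module = {"g1": [1.0, 1.1, 0.9, 2.0, 2.2, 1.9],
          "g2": [0.5, 0.7, 0.6, 1.4, 1.5, 1.6],
          "g3": [3.0, 3.2, 3.1, 4.0, 4.1, 4.3]}
trait = [0, 0, 0, 1, 1, 1]  # 0 = pre-training, 1 = post-training

adj = soft_threshold_adjacency(module)
me = module_eigengene(list(module.values()))
r = pearson(me, trait)  # module-trait correlation
print(round(r, 3))
```

In WGCNA practice the soft-thresholding power is usually chosen from a scale-free topology fit, and the module-trait correlation is reported together with a p-value; both steps fall outside this sketch.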