Reliability and validity of applying Kirkpatrick model for evaluating exercise rehabilitation program

Article information

J Exerc Rehabil Vol. 21, No. 4, 200-209, August, 2025
Publication date (electronic): 2025 August 31
doi: https://doi.org/10.12965/jer.2550428.214
1Research Institute of Sports and Industry Science, Hanseo University, Seosan, Korea
2Department of Physical Education, Korea University, Seoul, Korea
*Corresponding author: Ji-Eun Yu, Research Institute of Sports and Industry Science, Hanseo University, 46 Hanseo 1-ro, Haemi-myeon, Seosan 31962, Korea, Email: jieun4865@hanseo.ac.kr
Received 2025 June 19; Revised 2025 July 14; Accepted 2025 July 20.

Abstract

This study aimed to evaluate the reliability and validity of the Kirkpatrick four-level questionnaire—comprising reaction (R), learning (L), behavior (B), and results (Res)—in the context of an exercise rehabilitation program. A total of 141 university students, aged 21 to 25, participated in a 15-week exercise rehabilitation program. Intrarater reliability was assessed using the intraclass correlation coefficient (ICC) in a pre-posttest setup and analyzed through Bland–Altman plots. Convergent validity was examined using Pearson correlation. The results of this study were as follows. ‘R,’ consisting of six questions, showed an ICC ranging from 0.954 to 0.990. ‘L,’ composed of seven questions, demonstrated an ICC between 0.883 and 0.978. ‘B,’ made up of five questions, displayed an ICC ranging from 0.859 to 0.974. Additionally, ‘Res,’ consisting of five questions, showed an ICC between 0.834 and 0.926. Significant correlations were observed among all 23 items (P<0.001), suggesting that the application of the Kirkpatrick model to evaluate the exercise rehabilitation program demonstrates strong reliability and validity. These findings suggest that the evaluation metrics can effectively monitor program outcomes at each level through the application of the Kirkpatrick model.

INTRODUCTION

Participating in a university-level exercise rehabilitation program provides high-quality education, offers appropriate knowledge, and cultivates a healthy body (Zhou et al., 2019). As institutions of higher education, universities in Korea aim to develop individuals with both broad knowledge and specialized expertise. Rehabilitation-focused physical activity is a crucial aspect that should be emphasized in fostering such well-rounded individuals. University students’ participation in structured exercise programs influences their social development and contributes to the acquisition of physical fitness, mental resilience, and motor skills (García-Hermoso et al., 2020). It also aids in the development of character and values, relieves stress, and enhances mutual understanding and relationships. This curriculum extends beyond a basic understanding of the physical rehabilitation process by enhancing students’ awareness of patients and fostering a deeper appreciation for the value of human dignity.

Helping patients improve their physical abilities promotes the healthy expression and maintenance of their intellectual and psychological capacities. In other words, while exercise-based rehabilitation programs represent only a small part of a broad spectrum, the students involved in these programs are future leaders and contributors to society. Therefore, the importance of education in shaping their development is strongly emphasized (Billinger et al., 2014). However, although the exercise rehabilitation program as a general education course is widely regarded as highly beneficial for students’ physical and mental development, student evaluation methods remain somewhat ambiguous.

To date, numerous educational evaluation models have been proposed in academia. Among them, the Kirkpatrick model is a program evaluation format that utilizes four levels (Johnston et al., 2018). Through structured learning and subsequent activities, it facilitates competency building through the transfer of knowledge, skills, or attitudes (Johnston et al., 2018). Primarily applied in the context of internship programs, the Kirkpatrick model serves as an evaluation tool when prospective teachers undergo direct field training at schools to achieve predetermined learning outcomes (Frye and Hemmer, 2012). Additionally, a series of items related to internship program evaluation scales were provided to prospective teachers to review program performance (Johnston and Fox, 2020). While the Kirkpatrick model appears to partially fulfil aspects related to the evaluation of the general exercise rehabilitation programs pursued in this study, research that applies this model specifically to general rehabilitative exercise settings is needed.

The Kirkpatrick evaluation model comprises four levels: reaction, learning, behavior, and results. Reaction and learning levels are considered internal factors because they focus on processes occurring within a specific program. In contrast, behavior and results levels are considered external factors because they concentrate on changes occurring after the completion of the program (Johnston and Fox, 2020). These four levels of evaluation seem highly beneficial because they allow for the identification of the strengths and weaknesses of general exercise rehabilitation classes. Nevertheless, minimal research has applied Kirkpatrick’s four-level evaluation specifically to university-level exercise rehabilitation programs, and the reliability and validity of Kirkpatrick’s four-level framework in this setting have yet to be examined. This study aimed to establish the reliability and validity of evaluation measures for the exercise rehabilitation program based on the Kirkpatrick model.

MATERIALS AND METHODS

Participants and research design

Prior to the start of the study, male and female students aged 21 to 25 who were enrolled in an exercise rehabilitation program course were invited to participate on a voluntary basis. Students from four universities were approached. The inclusion criteria required participants to attend the 15-week exercise rehabilitation course without any absences and to be able to take part in both the theoretical lectures and the practical rehabilitation training sessions. To apply Kirkpatrick’s scale to the rehabilitation setting, all participants completed an adapted survey twice, with a 1-week interval between each measurement. All participants with visual, hearing, and/or physical disabilities, those unable to complete the measurements, or those who withdrew from the assessment were excluded from the study. Participants who had received treatments affecting physical condition, undergone major surgery within one year prior to the study, or had a history of coronary artery disease, cerebrovascular disease, significant organ impairment, uncontrolled hypertension, cancer, or psychiatric disorders were also excluded (Yu and Jee, 2020).

Initially, 212 candidates were deemed eligible based on these criteria. However, 71 candidates were excluded due to incomplete records or missing data in the structured questionnaires. Consequently, the records of the remaining 141 candidates were analyzed. The minimum sample size was calculated to be 138 using G*Power for the correlation (bivariate normal) model with a desired power level of 0.95, a probability level of 0.05, and a correlation ρ H1 of 0.3 (Wolf et al., 2013). The research sample was determined through total sampling, meaning the entire population was included as the sample.
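The sample size calculation above can be approximated by hand. The sketch below uses the Fisher z approximation and assumes a two-tailed test; both are assumptions, since the text does not state the tail or the exact routine. G*Power uses an exact method, so the approximation may differ from the reported 138 by a participant or so. The function name `n_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(rho, alpha=0.05, power=0.95):
    """Approximate sample size to detect a correlation rho (two-tailed).

    Uses the Fisher z approximation, not G*Power's exact routine.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    fisher_z = math.atanh(rho)                     # Fisher transform of rho
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


n_required = n_for_correlation(0.3)
print(n_required)
```

Under these assumptions the result is close to the reported minimum, and the 141 analyzed records exceed it.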

This study was approved by the Human Studies Committee at the College of Health Science at Hanseo University (HS22-12-01). All participants were required to read and sign an informed consent form before participating in the study.

Body composition measures

The body composition of the participants was assessed using bioelectrical impedance analysis with a body composition analyzer (InBody 230, BioSpace), which operates through stainless steel electrode interfaces. Height was measured using an analogue height-measuring device (Samwha, Korea). Body composition analysis was conducted before participants had dinner and after they had emptied their bladders (Yu and Jee, 2020).

Kirkpatrick’s scale measures

This single-blind, randomised controlled trial was conducted from 4 March to 14 June 2024, with data analysis performed at the Hanseo University Research Centre. The initial assessment began on 8 March with an online survey, followed by a course titled “Exercise Rehabilitation Practice,” which ran until 14 June. On the final day, the same questionnaire survey used at the start was administered again. This study modified the reaction, learning, behavior, and results levels from Kirkpatrick’s four-level evaluation model, previously used in rehabilitation program evaluations (Quinton et al., 2022), to fit the context of exercise rehabilitation settings. The study aimed to develop reliable theories and assumptions regarding the relationships between variables, including reaction, learning, behavior, and results, in order to determine the reliability and validity of the questionnaire.

The measurement scale for evaluating the exercise rehabilitation program was adapted from various relevant sources in the field of education, as shown in Table 1. This structured approach ensures that both internal (reaction and learning) and external (behavior and results) aspects are comprehensively evaluated, providing a robust framework for assessing the effectiveness of the rehabilitation program. The reaction, learning, behavior, and results levels consist of six, seven, five, and five items, respectively, to test the rehabilitation program evaluation scale. These four levels comprehensively describe developmental achievements at each level and allow challenges and problems to be rapidly evaluated (Quinton et al., 2022). The measurement scale of the exercise rehabilitation program evaluation provides this information, enabling stakeholders on campus to effectively assess rehabilitation-based programs and make recommendations. This comprehensive approach reveals the success or failure of pre-service students in implementing the exercise rehabilitation program.

Kirkpatrick’s evaluation instrument for the exercise rehabilitation program

Additionally, the questionnaire items were tailored to the exercise rehabilitation program conducted on campus, considering the program’s objectives and evaluation criteria. The measurement scale employs a semantic differential scale (points 1–6) in the form of a bipolar continuum (Morgan, 1984). This scale in the questionnaire includes three dimensions: the evaluation dimension (poor-good), the potential dimension (weak-strong), and the activity dimension (passive-active), as shown in Table 2.

Dimensions of Kirkpatrick’s scale

Each dimension was adapted for its specific use by considering the context of each questionnaire item’s statement. This study adapted Kirkpatrick’s four-level evaluation model to assess the exercise rehabilitation program, tailoring it to the specific context of rehabilitation settings. This model includes the following levels: “Reaction” reflects participants’ responses to the rehabilitation program and is measured through questions such as “How satisfied are you with the exercise therapy?” Responses were rated on a 6-point scale ranging from 1 (“very dissatisfied”) to 6 (“very satisfied”). “Learning” assesses improvements in knowledge or skills. “Behavior” evaluates the extent to which the acquired knowledge and skills are applied. “Results” measure the broader impact of the rehabilitation program through questions like “Do you feel that the patient’s health will improve after participating in the rehabilitation program?” Responses were rated on a 6-point scale from 1 (“not likely to improve”) to 6 (“very likely to improve”). Cronbach alpha, which indicates the reliability of this questionnaire, was 0.958.
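For readers who want to reproduce an internal consistency figure such as the Cronbach alpha reported above, a minimal sketch follows. The response matrix is invented for illustration and is not study data; the function name `cronbach_alpha` is hypothetical, and sample variances (n−1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])  # variance of sum scores
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 6-point responses: 5 respondents x 4 items (not study data).
responses = [
    [5, 6, 5, 5],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [2, 3, 2, 2],
    [6, 6, 5, 6],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha approaches 1 when items vary together across respondents, which is the pattern a value such as 0.958 indicates.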

Data collection and procedure

The participants engaged intensively and proactively, facing various challenges throughout the one-semester program. This study tracked students’ progress through monthly rehabilitation session progress reports and provided them with theoretical and practical instruction to help solve problems they encountered in the exercise rehabilitation program. One week after the rehabilitation program, each questionnaire was distributed to all participants via Google Forms. Data from students who had completed the rehabilitation program were collected as a one-time procedure, with all participants completing the questionnaire. The data, initially in ordinal form, were transformed into interval data using the successive interval method.
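The ordinal-to-interval transformation mentioned above can be sketched as follows. This is one common formulation of the method of successive intervals, in which each category is replaced by the mean of a standard normal distribution truncated to that category's cumulative-proportion band and the values are then shifted so the lowest category equals 1. The article does not describe its exact procedure, so the details, the function name, and the category counts are assumptions.

```python
from statistics import NormalDist


def successive_interval_values(counts):
    """Map ordered category counts to interval-scale values."""
    if any(c <= 0 for c in counts):
        raise ValueError("every category needs at least one response")
    normal = NormalDist()
    total = sum(counts)
    raw = []
    running = 0
    cum_prev = 0.0   # cumulative proportion below the current category
    dens_prev = 0.0  # normal density at the lower boundary (0 at -infinity)
    for c in counts:
        running += c
        cum = running / total
        # Density at the upper boundary; it is 0 at the top of the scale.
        dens = 0.0 if running == total else normal.pdf(normal.inv_cdf(cum))
        raw.append((dens_prev - dens) / (cum - cum_prev))
        cum_prev, dens_prev = cum, dens
    shift = 1.0 - min(raw)  # anchor the lowest category at 1
    return [v + shift for v in raw]


# Hypothetical counts for categories 1..6 of a single item (not study data).
hypothetical_counts = [4, 10, 25, 40, 42, 20]
scaled = successive_interval_values(hypothetical_counts)
print([round(v, 2) for v in scaled])
```

The scaled values preserve category order but allow unequal spacing between adjacent categories, which is the point of the transformation.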

Data analysis

Data were analyzed using IBM SPSS Statistics ver. 25.0 (IBM Co., USA), with a significance level set at P≤0.05. Descriptive statistics, including the mean and standard deviation, were used to present the data. The normality of the distribution for the variables was determined using the Kolmogorov–Smirnov test prior to analysis. Reliability was assessed with the intraclass correlation coefficient (ICC) through a pre-post-test comparison and analyzed using Bland–Altman plots. Convergent validity was assessed using Pearson correlation between the test and retest. To present construct validity visually, a correlation matrix was created using GraphPad Prism 10.2.3 (GraphPad Software Inc., USA). For reference, according to Koo and Li (2016), an ICC is considered excellent, good, moderate, or poor if it is over 0.90, between 0.75 and 0.90, between 0.50 and 0.75, or below 0.50, respectively.
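As an illustration of the reliability analysis described above, the sketch below computes a single-measure, two-way, absolute-agreement ICC from paired test and retest scores and labels it with the Koo and Li (2016) thresholds. The article does not state which ICC form was selected in SPSS, so the form used here is an assumption; the paired scores are invented and the function names are hypothetical.

```python
def icc_absolute_agreement(pairs):
    """Single-measure, two-way, absolute-agreement ICC from ANOVA mean squares."""
    n, k = len(pairs), len(pairs[0])
    grand = sum(sum(p) for p in pairs) / (n * k)
    row_means = [sum(p) / k for p in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def koo_li_label(icc):
    """Interpretation bands as summarized in the text (Koo and Li, 2016)."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


# Hypothetical (test, retest) scores for one item (not study data).
pairs = [(5, 5), (4, 4), (6, 5), (3, 3), (2, 3), (5, 5), (4, 4), (6, 6)]
icc = icc_absolute_agreement(pairs)
print(round(icc, 3), koo_li_label(icc))
```

Applied to the ranges reported in the abstract, `koo_li_label` would place the lowest results-level ICC (0.834) in the good band and the reaction-level ICCs (0.954 to 0.990) in the excellent band.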

RESULTS

Demographic features

As shown in Table 3, the participants were nearly evenly split between males (51.1%) and females (48.9%). A total of 141 students who completed the rehabilitation program participated in the study.

Demographics of the participants

ICC and confidence interval on reaction

Detailed statistics and ICC results for all pretests and posttests are provided in Table 4. For reaction (R)1, an ICC of 0.980 (95% confidence interval [CI], 0.973–0.986) was obtained. Similarly, ICC values from R2 to R6 were also high, with the 95% CI indicating strong reliability. The Bland–Altman graph indicated limits of agreement with 2 standard deviations (SDs) of 0.31 and −0.26, with a mean difference of 0.02±0.14 in the R mean value (Fig. 1A).

Descriptive statistics and ICC of reaction of total participants (n=141)

Fig. 1

Bland–Altman graph analyses of Kirkpatrick’s four-level evaluation. Reaction (A), learning (B), behavior (C), and result (D), respectively.

ICC and CI on learning

Detailed statistics and ICC results for all pretests and posttests related to learning are shown in Table 5. For learning (L)1, an ICC of 0.946 (95% CI, 0.925–0.961) was achieved. Similarly, the ICC values from L2 to L7 were high, with 95% CI remaining within a similarly strong range. The Bland–Altman graph indicated limits of agreement with 2 SDs of 0.36 and −0.30, with a mean difference of 0.03±0.17 in the learning mean value (Fig. 1B).

Descriptive statistics and ICC of learning of total participants (n=141)

ICC and CI on behavior

Table 6 presents comprehensive statistics and ICC results for all pretests and posttests related to behavior. For behavior (B)1, an ICC of 0.974 was observed, with a 95% CI ranging from 0.964 to 0.982. The ICC values for behavior measures B2 through B5 were similarly high, with their 95% CI within a similarly elevated range. The Bland–Altman graph indicated limits of agreement with 2 SDs of 3.37 and −3.34, with a mean difference of 0.02±1.71 in the behavior mean value (Fig. 1C).

Descriptive statistics and ICC of behavior of total participants (n=141)

ICC and CI on results

As shown in Table 7, the ICC for results (Res)1 was 0.926 (95% CI, 0.897–0.947). Similarly, the ICC values for results measures Res2 through Res5 were also high, with their 95% CI within a similarly strong range. The Bland–Altman graph indicated limits of agreement with 2 SDs of 0.43 and −0.40, with a mean difference of 0.02±0.21 in the result mean value (Fig. 1D).

Descriptive statistics and ICC of results of total participants (n=141)
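The Bland–Altman limits of agreement reported for the four levels can be checked against the reported mean differences and SDs. The sketch below assumes the conventional 1.96 multiplier, which is an assumption because the text describes the limits as 2 SDs. Since the published means and SDs are rounded, the recomputed limits are expected to match the reported ones only to within about 0.02.

```python
def limits_of_agreement(mean_diff, sd_diff, multiplier=1.96):
    """Return (upper, lower) Bland-Altman limits of agreement."""
    half_width = multiplier * sd_diff
    return mean_diff + half_width, mean_diff - half_width


# (mean difference, SD, reported upper limit, reported lower limit) from the text.
reported = {
    "reaction": (0.02, 0.14, 0.31, -0.26),
    "learning": (0.03, 0.17, 0.36, -0.30),
    "behavior": (0.02, 1.71, 3.37, -3.34),
    "results": (0.02, 0.21, 0.43, -0.40),
}

recomputed = {}
for level, (mean_diff, sd_diff, upper, lower) in reported.items():
    recomputed[level] = limits_of_agreement(mean_diff, sd_diff)
    calc_upper, calc_lower = recomputed[level]
    print(f"{level}: recomputed {calc_upper:.2f} / {calc_lower:.2f}, "
          f"reported {upper:.2f} / {lower:.2f}")
```

The behavior level stands out because its SD (1.71) is roughly ten times that of the other levels, which widens its limits accordingly.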

Relationships among factors of Kirkpatrick’s evaluation model

As shown in Table 8, the four levels of the Kirkpatrick evaluation scale demonstrated statistically significant correlations across different variables, indicating the validity of the four-level Kirkpatrick evaluation scale for assessing an exercise rehabilitation class. This validity was further supported by the correlation matrix of multiple variables across each row and column of Kirkpatrick’s evaluation model (Fig. 2).

Correlation coefficients among the Kirkpatrick’s four evaluation factors

Fig. 2

Correlation matrix of multiple variables across each row and column of Kirkpatrick’s evaluation scale. In A, a deeper shade of blue indicates a higher positive correlation, while a shift towards red hues suggests a negative correlation. Conversely, in panel B, a deeper shade of green signifies a higher positive correlation, whereas a shift towards yellow hues indicates a negative correlation. R, reaction; L, learning; B, behavior; Res, result.
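A correlation matrix of the kind summarized in Table 8 and Fig. 2 can be built from per-participant level scores as sketched below. The scores are invented for illustration and are not study data; the function name `pearson_r` is hypothetical, and significance testing is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean scores per participant for each level (not study data).
level_scores = {
    "R": [4.8, 3.5, 5.2, 4.0, 2.9, 5.5],
    "L": [4.5, 3.8, 5.0, 4.2, 3.1, 5.3],
    "B": [4.2, 3.6, 4.9, 3.9, 3.3, 5.1],
    "Res": [4.6, 3.2, 5.1, 4.1, 3.0, 5.4],
}

matrix = {
    a: {b: pearson_r(level_scores[a], level_scores[b]) for b in level_scores}
    for a in level_scores
}
for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
```

The matrix is symmetric with ones on the diagonal, so only the off-diagonal entries carry information about relationships between levels.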

DISCUSSION

Most studies related to exercise rehabilitation education tend to rely on satisfaction-based tools associated with educational satisfaction, and there is a significant lack of tools aimed at systematically evaluating students’ reactions, learning, behavior, and outcomes. This research represents the first attempt to apply the Kirkpatrick assessment tool to an exercise rehabilitation setting and has found, as indicated by the results, that it possesses both reliability and validity. The reason for utilizing Kirkpatrick’s four-level evaluation model for exercise rehabilitation education is its widespread use in research for assessing educational programs both domestically and internationally. It comprises the core concepts and content of ‘outcome-based evaluation,’ which analyses how well an educational program achieves its intended goals and objectives. The findings in this article complement previous research on developing a measurement scale for exercise rehabilitation program evaluations (García-Hermoso et al., 2020; Gracia et al., 2021). This study’s measurement scale of the Kirkpatrick exercise rehabilitation evaluation model also reveals internal aspects of students’ identities (reaction and learning levels). Furthermore, this study comprehensively examines the external aspects (behavior and outcomes levels) as an evaluation construct of the Kirkpatrick model. This study found that the Kirkpatrick model is reliable and valid for evaluating an exercise rehabilitation program after participation in a rehabilitation training course.

In this context, Smidt et al. (2009) noted that when designing programs that require significant time and cost, it is effective to evaluate the evidence supporting the program. The efficiency of clinical practice is measured through evidence-based practice, and despite the annual development of programs, training program evaluations have been inadequate. The Kirkpatrick model offers a technique for assessing the evidence of reported educational programs and can be used to evaluate whether an educational program meets the needs and requirements of both the organization implementing the education and the participating staff. Heydari et al. (2019) used Kirkpatrick’s model to measure the effectiveness of a workshop on new teaching and learning methods for medical staff. They reported that Kirkpatrick’s program evaluation model significantly improved medical staff’s satisfaction with the workshop’s educational environment, their knowledge of new teaching and learning methods, and their behavior when conducting educational workshops. Tahmasebi et al. (2020) found that, based on the Kirkpatrick model, the education and presence of clinical instructors during clinical rounds improved satisfaction, attitudes, knowledge, and information retrieval skills. This also led to enhanced information retrieval behavior and clinical skills among medical students.

Several researchers stated that the Kirkpatrick model remains consistently useful, appropriate, and applicable across various contexts. They highlighted its adaptability to diverse training environments and its capability to achieve high performance in training evaluations. Additionally, they provided an overview of publications on the Kirkpatrick model, indicating that research using this model is a thriving and growing field. They also reported that evaluations in medical education as well as in computer science, business, and social sciences can benefit from its application (Heydari et al., 2019; Smidt et al., 2009; Tahmasebi et al., 2020). Previous research focused on constructing a scale to measure the effectiveness and impact of exercise rehabilitation education on students’ learning outcomes (Yu et al., 2022). While these studies successfully developed a measurement scale using the Kirkpatrick model, the findings of this study offer a unique perspective, particularly concerning students’ activities. In most exercise rehabilitation courses in Korea, whether they are practical, theoretical, or a combination of both, evaluations have primarily focused on learning, behaviors, or learning outcomes. This research explored various educational evaluation models and found that the Kirkpatrick model, previously applied to internship programs, reasonably evaluates programs based on the structure of reactions, learning, behavior, and outcomes. This study aimed to determine if the four-level evaluation structure could be reliably and validly applied to physical education classes. As indicated by the findings, the model demonstrated high reliability and validity for this purpose. In other words, the measurement scale in this study emphasizes the operational activities of students at each level, aligning with the exercise rehabilitation objectives. This comprehensive measurement scale reflects student activities during the exercise rehabilitation program, ensuring relevant and credible program evaluations. This study’s findings align with those of previous research, confirming the applicability of the four-level model to exercise rehabilitation evaluation (Piryani et al., 2018). This research complements the context of exercise rehabilitation within the Korean School Program and introduces novelty by applying Kirkpatrick’s theoretical framework to exercise rehabilitation programs offered at Korean universities or colleges.

The study used Kirkpatrick’s four-level evaluation model to develop a program evaluation measurement scale, considering the levels of reaction, learning, behavior, and outcomes. The results of construct validity testing indicate that the Kirkpatrick model is effective for evaluating exercise rehabilitation programs. Relevant studies confirm that developing such scales involves systematic and complex procedures requiring methodological rigor (Quinton et al., 2022). A series of validity and reliability tests conducted in this study, including internal consistency, convergent validity, and discriminant validity, demonstrated that the exercise rehabilitation education evaluation scale could be used credibly and comprehensively. The students’ perceptions of the services, mentoring, and reporting provided by their tutors during exercise rehabilitation programs showed that the learning level had a higher outer loading value than other levels. Thus, the learning construct is the most significant for students during their exercise rehabilitation program. Internally, this information suggests that the learning process in the exercise rehabilitation program has successfully enhanced the students’ understanding of the learning paradigms they apply in classroom activities. This improvement also extends to their learning skills, classroom management, understanding of academic culture, and knowledge of school organization (Winstein et al., 2014). Externally, when evaluating the changes induced by the exercise rehabilitation program, the behavior level was found to be more crucial than other levels (Tahmasebi et al., 2020). At this level, students experienced significant changes in their actions within the learning process after the exercise rehabilitation program. These changes included enhanced communication and behavior with supervisors and tutors, improved academic performance, and increased ability to interact with students, all of which are essential for becoming effective teachers (Fenwick-Smith et al., 2018). The reliability and validity tests of this study ensure that the measurement scale can effectively and comprehensively evaluate students’ exercise rehabilitation programs. However, this research has some limitations because it focused solely on the students’ perspectives within the reaction-learning-behavior-results framework. It did not explore the perceptions of lecturers and teachers regarding students’ performance during the exercise rehabilitation program. Additionally, the study is specific to the Field School Program in Korea, meaning that the exercise rehabilitation instrument may not be directly applicable to other exercise rehabilitation programs in higher education, such as those offered by companies, institutions, or laboratories. Modifications would be necessary for use in higher education institutions in other countries.

This study successfully demonstrated the validity and reliability of a measurement scale for exercise rehabilitation programs adapted from Kirkpatrick’s evaluation model. Campuses can use it to measure the success of students in exercise rehabilitation programs based on reaction, learning, behavior, and results levels. The research findings have significant implications for Korean campuses that conduct exercise rehabilitation programs. The measurement scale and findings contribute valuable insights for the development of psychometric science and provide educational scholars with a novel approach that can be applied in future research.

Notes

CONFLICT OF INTEREST

No potential conflict of interest relevant to this article was reported.

ACKNOWLEDGMENTS

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2022S1A5B5A16056492).

References

Billinger, S.A., Arena, R., Bernhardt, J., Eng, J.J., Franklin, B.A., Johnson, C.M., MacKay-Lyons, M., Macko, R.F., Mead, G.E., Roth, E.J., Shaughnessy, M., Tang, A.; American Heart Association Stroke Council; Council on Cardiovascular and Stroke Nursing; Council on Lifestyle and Cardiometabolic Health; Council on Epidemiology and Prevention; Council on Clinical Cardiology. Physical activity and exercise recommendations for stroke survivors: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke, (2014). 45, 2532–2553.
Fenwick-Smith, A., Dahlberg, E.E., & Thompson, S.C. Systematic review of resilience-enhancing, universal, primary school-based mental health promotion programs. BMC Psychol, (2018). 6, 30.
Frye, A.W., & Hemmer, P.A. Program evaluation models and related theories: AMEE guide no. 67. Med Teach, (2012). 34, e288–e299.
García-Hermoso, A., Alonso-Martínez, A.M., Ramírez-Vélez, R., Pérez-Sousa, M.Á., Ramírez-Campillo, R., & Izquierdo, M. Association of physical education with improvement of health-related physical fitness outcomes and fundamental motor skills among youths: a systematic review and meta-analysis. JAMA Pediatr, (2020). 174, e200223.
Gracia, E.P., Rodríguez, R.S., Pedrajas, A.P., & Carpio, A.J. Teachers’ professional identity: validation of an assessment instrument for preservice teachers. Heliyon, (2021). 7, e08049.
Heydari, M.R., Taghva, F., Amini, M., & Delavari, S. Using Kirkpatrick’s model to measure the effect of a new teaching and learning methods workshop for health care staff. BMC Res Notes, (2019). 12, 388.
Johnston, S., Coyer, F.M., & Nash, R. Kirkpatrick’s evaluation of simulation and debriefing in health care education: a systematic review. J Nurs Educ, (2018). 57, 393–398.
Johnston, S., & Fox, A. Kirkpatrick’s evaluation of teaching and learning approaches of workplace violence education programs for undergraduate nursing students: a systematic review. J Nurs Educ, (2020). 59, 439–447.
Koo, T.K., & Li, M.Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med, (2016). 15, 155–163.
Morgan, B.S. A semantic differential measure of attitudes toward black American patients. Res Nurs Health, (1984). 7, 155–162.
Piryani, R.M., Dhungana, G.P., Piryani, S., & Sharma Neupane, M. Evaluation of teachers training workshop at Kirkpatrick level 1 using retro-pre questionnaire. Adv Med Educ Pract, (2018). 9, 453–457.
Quinton, M.L., Tidmarsh, G., Parry, B.J., & Cumming, J. A Kirkpatrick model process evaluation of reactions and learning from My Strengths Training for Life™. Int J Environ Res Public Health, (2022). 19, 11320.
Smidt, A., Balandin, S., Sigafoos, J., & Reed, V.A. The Kirkpatrick model: a useful tool for evaluating training outcomes. J Intellect Dev Disabil, (2009). 34, 266–274.
Tahmasebi, M., Adibi, P., Zare-Farashbandi, F., Papi, A., & Rahimi, A. The educational role of clinical informationist on improving clinical education among medical students: based on Kirkpatrick model. J Educ Health Promot, (2020). 9, 28.
Winstein, C., Lewthwaite, R., Blanton, S.R., Wolf, L.B., & Wishart, L. Infusing motor learning research into neurorehabilitation practice: a historical perspective with case exemplar from the accelerated skill acquisition program. J Neurol Phys Ther, (2014). 38, 190–200.
Wolf, E.J., Harrington, K.M., Clark, S.L., & Miller, M.W. Sample size requirements for structural equation models: an evaluation of power, bias, and solution propriety. Educ Psychol Meas, (2013). 76, 913–934.
Yu, J., & Jee, Y.S. Educational exercise program affects to physical fitness and gross motor function differently in the severity of autism spectrum disorder. J Exerc Rehabil, (2020). 16, 410–417.
Yu, J.E., Eun, D., & Jee, Y.S. Daily life patterns, psychophysical conditions, and immunity of adolescents in the COVID-19 era: a mixed research with qualitative interviews by a quasi-experimental retrospective study. Healthcare (Basel), (2022). 10, 1152.
Zhou, S., Davison, K., Qin, F., Lin, K.F., Chow, B.C., & Zhao, J.X. The roles of exercise professionals in the health care system: a comparison between Australia and China. J Exerc Sci Fit, (2019). 17, 81–90.

Table 1

Kirkpatrick’s evaluation instrument for the exercise rehabilitation program

Category Code Question
Reaction R1 How interesting was the class?
R2 How helpful was this class in understanding exercise rehabilitation principles?
R3 How helpful was this class in understanding rehabilitation exercise methods and principles?
R4 How do you rate the professor’s preparation for the class?
R5 How do you rate the university’s facilities and support for this class?
R6 How do you evaluate this class as an educational experience necessary for performing tasks related to your major?

Learning L1 Did your communication skills with others improve through the class?
L2 Did you receive feedback on your strengths and weaknesses, and did your professionalism improve through the class?
L3 Do you feel that you have grown personally through the class?
L4 Did the class help you manage aggressive behaviors better?
L5 Did the class help you manage passive behaviors better?
L6 Did the class improve your ability to set goals?
L7 Did you experience the value of becoming a prospective professional in your major through the class?

Behavior B1 Did your confidence in using rehabilitation-related knowledge or skills increase through the class?
B2 Does the class content accurately reflect rehabilitation exercise scenarios?
B3 To what extent can you apply the knowledge or skills acquired from the class in other places using other resources (e.g., equipment and information)?
B4 After the class, can you receive help through mentoring and feedback when applying rehabilitation knowledge or skills to exercise?
B5 How well do you think the learning outcomes of the class achieved the class objectives?

Results Res1 After completing the class, can you explain the basic characteristics of exercise rehabilitation?
Res2 After completing the class, can you explain the use of training equipment and exercise procedures?
Res3 After completing the class, can you explain the principles and methods of exercise rehabilitation?
Res4 After completing the class, can you explain exercise rehabilitation-related content in formal settings at school?
Res5 After completing the class, are you engaging in self-directed exercise, correcting exercise methods, and participating in self-directed rehabilitation or community-based programs?

Table 2

Dimensions of Kirkpatrick’s scale

Dimension Item Scale (1–6)
Evaluation How would you rate the quality of the exercise rehabilitation program? Poor (1) - good (6)
Potential Do you feel stronger after completing the rehabilitation program? Weak (1) - strong (6)
Activity How active have you become since starting the rehabilitation program? Passive (1) - active (6)

Table 4

Descriptive statistics and ICC of reaction of total participants (n=141)

Item Test Retest ICC 95% CI
R1 1.45±0.60 1.42±0.60 0.980 0.973–0.986
R2 1.56±0.63 1.50±0.62 0.954 0.936–0.967
R3 1.45±0.61 1.43±0.60 0.965 0.952–0.975
R4 1.32±0.50 1.29±0.47 0.953 0.935–0.966
R5 2.09±1.11 2.08±1.11 0.990 0.986–0.993
R6 1.76±1.06 1.75±1.05 0.989 0.984–0.992

Values are presented as mean±standard deviation.

ICC, intraclass correlation coefficient; R, reaction; CI, confidence interval.
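As an illustration of how a test-retest ICC of this kind can be computed, the sketch below derives it from two-way ANOVA mean squares. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the tables here do not state which ICC form was used, so that choice, the function name `icc_2_1`, and the sample data are all assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores: one row per participant, one column per occasion."""
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of measurement occasions (2 for test/retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into participants, occasions, and error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-participants mean square
    msc = ss_cols / (k - 1)              # between-occasions mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical test/retest scores on one item for six participants
pairs = [[1, 1], [2, 2], [1, 2], [3, 3], [2, 2], [4, 4]]
print(round(icc_2_1(pairs), 3))
```

Values near 1 mean that most of the score variance comes from differences between participants rather than from changes between the two occasions. The confidence intervals reported in the tables would need an additional F-distribution step that this sketch omits.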

Table 3

Demographics of the participants

Item Groups Mann-Whitney U-test
Total (n=141) Males (n=72) Females (n=69) Z P-value η²
Age (yr) 21.40±1.62 21.75±1.89 21.04±1.18 −1.470 0.141 0.048
Grade 2.01±0.87 1.81±0.91 2.23±0.77 −3.741 <0.001 0.060
Sex 1.49±0.50 1.00±0.00 2.00±0.00 −11.832 <0.001 -
Height (cm) 171.86±5.88 175.78±4.39 167.77±4.21 −8.779 <0.001 0.467
Weight (kg) 65.31±15.58 75.11±15.26 55.09±7.01 −8.851 <0.001 0.416
REP (yr) 1.96±0.91 2.42±0.80 1.49±0.76 −6.457 <0.001 0.262

Values are presented as mean±standard deviation.

REP, rehabilitation exercise experience.

Table 5

Descriptive statistics and ICC of learning of total participants (n=141)

Item Test Retest ICC 95% CI
L1 2.07±1.03 2.04±1.03 0.946 0.925–0.961
L2 1.87±0.97 1.83±0.97 0.897 0.856–0.926
L3 1.79±0.73 1.75±0.73 0.970 0.958–0.978
L4 1.81±1.03 1.77±1.04 0.883 0.837–0.916
L5 1.97±1.10 1.94±1.10 0.906 0.870–0.933
L6 1.77±0.76 1.76±0.75 0.978 0.969–0.984
L7 2.04±1.24 2.01±1.21 0.938 0.914–0.956

Values are presented as mean±standard deviation.

ICC, intraclass correlation coefficient; L, learning; CI, confidence interval.

Table 6

Descriptive statistics and ICC of behavior of total participants (n=141)

Item Test Retest ICC 95% CI
B1 1.72±0.76 1.70±0.75 0.974 0.964–0.982
B2 1.70±0.84 1.69±0.83 0.949 0.928–0.963
B3 1.74±0.72 1.73±0.72 0.873 0.823–0.909
B4 1.70±0.78 1.67±0.78 0.859 0.803–0.899
B5 1.74±0.73 1.72±0.73 0.870 0.819–0.907

Values are presented as mean±standard deviation.

ICC, intraclass correlation coefficient; B, behavior; CI, confidence interval.

Table 7

Descriptive statistics and ICC of results of total participants (n=141)

Item Test Retest ICC 95% CI
Res1 1.87±0.81 1.87±0.80 0.926 0.897–0.947
Res2 1.97±0.79 1.96±0.79 0.923 0.893–0.945
Res3 1.79±0.78 1.77±0.77 0.910 0.875–0.936
Res4 2.22±1.04 2.21±1.04 0.925 0.895–0.946
Res5 2.26±1.36 2.21±1.30 0.834 0.769–0.881

Values are presented as mean±standard deviation.

ICC, intraclass correlation coefficient; Res, result; CI, confidence interval.

Table 8

Correlation coefficients among the Kirkpatrick’s four evaluation factors

R1 R2 R3 R4 R5 R6 L1 L2 L3 L4 L5 L6 L7 B1 B2 B3 B4 B5 Res1 Res2 Res3 Res4 Res5
R1 1 0.449*** 0.383*** 0.260** 0.264** 0.315*** 0.480*** 0.460*** 0.442*** 0.459*** 0.398*** 0.540*** 0.322*** 0.388*** 0.278** 0.379*** 0.442*** 0.423*** 0.440*** 0.386*** 0.340*** 0.425*** 0.328***
R2 1 0.571*** 0.363*** 0.189** 0.313*** 0.416*** 0.375*** 0.712*** 0.476*** 0.430*** 0.532*** 0.250** 0.608*** 0.495*** 0.525*** 0.439*** 0.503*** 0.425*** 0.364*** 0.392*** 0.205* 0.196*
R3 1 0.605*** 0.259** 0.484*** 0.437*** 0.367*** 0.671*** 0.417*** 0.475*** 0.545*** 0.334*** 0.657*** 0.369*** 0.582*** 0.508*** 0.526*** 0.633*** 0.554*** 0.630*** 0.529*** 0.364***
R4 1 0.314*** 0.417*** 0.389*** 0.358*** 0.500*** 0.356*** 0.358*** 0.481*** 0.270*** 0.525*** 0.296*** 0.468*** 0.435*** 0.501*** 0.492*** 0.476*** 0.470*** 0.459*** 0.119
R5 1 0.419*** 0.436*** 0.413*** 0.269*** 0.395*** 0.403*** 0.406*** 0.444*** 0.293*** 0.173* 0.296*** 0.253** 0.310*** 0.308*** 0.296*** 0.426*** 0.414*** 0.343***
R6 1 0.396*** 0.407*** 0.492*** 0.354*** 0.301*** 0.468*** 0.758*** 0.420*** 0.246** 0.330*** 0.332*** 0.325*** 0.296*** 0.289*** 0.334*** 0.295*** 0.630***
L1 1 0.839*** 0.646*** 0.800*** 0.809*** 0.716*** 0.530*** 0.540*** 0.603*** 0.488*** 0.658*** 0.691*** 0.579*** 0.512*** 0.518*** 0.617*** 0.372***
L2 1 0.614*** 0.797*** 0.726*** 0.676*** 0.575*** 0.601*** 0.653*** 0.494*** 0.559*** 0.568*** 0.536*** 0.518*** 0.520*** 0.601*** 0.415***
L3 1 0.585*** 0.543*** 0.701*** 0.383*** 0.775*** 0.566*** 0.638*** 0.581*** 0.643*** 0.542*** 0.492*** 0.505*** 0.437*** 0.336***
L4 1 0.834*** 0.559*** 0.505*** 0.594*** 0.696*** 0.451*** 0.492*** 0.539*** 0.559*** 0.446*** 0.514*** 0.520*** 0.270***
L5 1 0.608*** 0.441*** 0.635*** 0.594*** 0.434*** 0.539*** 0.678*** 0.528*** 0.444*** 0.560*** 0.610*** 0.361***
L6 1 0.453*** 0.712*** 0.591*** 0.657*** 0.705*** 0.661*** 0.658*** 0.628*** 0.683*** 0.591*** 0.461***
L7 1 0.442*** 0.344*** 0.273*** 0.334*** 0.308*** 0.217** 0.284*** 0.338*** 0.349*** 0.604***
B1 1 0.626*** 0.637*** 0.574*** 0.602*** 0.603*** 0.616*** 0.668*** 0.534*** 0.434***
B2 1 0.544*** 0.500*** 0.548*** 0.541*** 0.447*** 0.564*** 0.436*** 0.319***
B3 1 0.595*** 0.580*** 0.764*** 0.775*** 0.714*** 0.611*** 0.369***
B4 1 0.774*** 0.592*** 0.631*** 0.558*** 0.603*** 0.399***
B5 1 0.572*** 0.542*** 0.592*** 0.688*** 0.436***
Res1 1 0.796*** 0.747*** 0.673*** 0.415***
Res2 1 0.786*** 0.713*** 0.406***
Res3 1 0.702*** 0.478***
Res4 1 0.498***
Res5 1

R, reaction; L, learning; B, behavior; Res, result.

*P<0.05. **P<0.01. ***P<0.001.
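The star notation in Table 8 can be illustrated with a short sketch that computes a Pearson coefficient and maps a two-sided P-value onto the thresholds above. The helper names (`pearson_r`, `approx_p_value`, `stars`) and the sample data are hypothetical. The P-value uses the Fisher z normal approximation, which is an assumption of this sketch; statistical packages typically use an exact t-test, so values near a threshold may differ slightly.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


def approx_p_value(r, n):
    """Approximate two-sided P-value via the Fisher z transform (requires n > 3)."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh is undefined at +/-1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p):
    """Map a P-value onto the table's significance markers."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical scores on two items for five participants
r = pearson_r([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(r, stars(approx_p_value(r, 5)))
```

With the study's sample size of 141, this approximation leaves r = 0.119 unstarred, in line with the one unmarked coefficient in Table 8 (R4 with Res5).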