Sample of lumbar MRI
A sample of 30 patients (11 female and 19 male) was randomly selected from a cohort of patients participating in Genodisc, a European research consortium project on commonly diagnosed lumbar pathologies in patients attending spine surgeon clinics. All patients included in this study had received a diagnosis of disc herniation, spinal stenosis, spondylolisthesis, or nonspecific LBP. Patients were excluded if they were younger than 18 or older than 60 years of age, had a contrast agent allergy, had reduced renal function, were unable to undergo MRI acquisition, had a tumor, infection, spinal fracture, or rheumatoid arthritis, or were pregnant. All participants completed a consent form acknowledging that their data would be used for research purposes.
The MRI protocol included a routine T2-weighted turbo spin echo sequence for both axial and sagittal images, acquired with a Siemens Avanto 1.5T MRI system (Siemens AG, Erlangen, Germany). The axial T2 parameters were: repetition time = 4000 ms, echo time = 113 ms, and slice thickness = 3 mm.
Automated thresholding algorithm
Initially, a series of T2-weighted MR images from two patients was used to train the algorithm. Once the muscle of interest has been manually segmented, the algorithm calculates the muscle measurements automatically in a series of steps. First, a preprocessing technique was applied to each MR image to enhance image quality and contrast. This preprocessing step consists of an adaptive histogram equalization method followed by an image adjustment scheme. The adaptive histogram equalization algorithm was employed to balance the grayscale level at each point of the image. We used the contrast limited adaptive histogram equalization (CLAHE) algorithm [19]. In this algorithm, histogram equalization is applied to small rectangles of the image instead of the whole image, changing the histogram of each rectangle to a uniform distribution. Bilinear interpolation was also applied between neighboring rectangles to avoid the formation of artificially induced boundaries. Then, the image adjustment scheme was used to improve the contrast of the image. This scheme rescales the intensities so that only a small fraction (1%) of the image is saturated at low (dark) and high (bright) intensities [20, 21], providing a high-contrast MR image (Fig. 1). These preprocessing steps were applied to reduce inhomogeneity artifacts. Because the method increases image contrast locally, the thresholding step was minimally affected by these artifacts.
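The image adjustment step described above can be illustrated with a minimal sketch. It covers only the contrast stretching, not CLAHE, and it is not the authors' MATLAB implementation. The function name `stretch_contrast`, the flat list representation of the image, and the choice to saturate 1% of pixels at each end of the intensity range are assumptions made for illustration.

```python
def stretch_contrast(pixels, low_frac=0.01, high_frac=0.01):
    """Linearly rescale intensities to [0, 1], saturating the extremes.

    pixels: flat list of grayscale intensities (hypothetical representation).
    low_frac / high_frac: fraction of pixels clipped to 0 and 1 (assumed 1% each).
    """
    ordered = sorted(pixels)
    n = len(ordered)
    # Intensities at the lower and upper saturation limits.
    lo = ordered[round(low_frac * (n - 1))]
    hi = ordered[round((1.0 - high_frac) * (n - 1))]
    if hi == lo:
        # Flat image: nothing to stretch.
        return [0.0 for _ in pixels]
    # Map [lo, hi] onto [0, 1] and clip everything outside that range.
    return [min(1.0, max(0.0, (p - lo) / (hi - lo))) for p in pixels]


if __name__ == "__main__":
    image = list(range(101))          # synthetic intensity ramp 0..100
    adjusted = stretch_contrast(image)
    print(min(adjusted), max(adjusted))
```

After this step the intensities span the full [0, 1] range, which is the form expected by a threshold computed on normalized values.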
To calculate the area of fat and muscle tissue, a threshold level was selected using Otsu's method [22, 23]. This threshold is chosen to minimize the intraclass variance of the black and white points (equivalently, to maximize the variance between the two classes), with pixel intensity values normalized between 0 and 1. The chosen threshold value is then applied to the selected ROI, and the algorithm automatically counts the white and black pixels in the area, which represent the areas of fat and muscle tissue, respectively. Because the MR images used in this study were of high quality, the Otsu thresholding technique was adequate for our experiments. The preprocessing steps described above provided a high-contrast image, and the Otsu thresholding method segmented the image with accuracy comparable to that of manual segmentation. The algorithm was implemented in MATLAB (Mathworks, Natick, MA, USA).
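The thresholding and pixel-counting steps can be sketched as follows. This is an illustrative re-implementation of Otsu's method in plain Python, not the authors' MATLAB code. The function names, the 256-bin histogram, and the `pixel_area_mm2` parameter (used to convert pixel counts to areas) are assumptions. Bright pixels are treated as fat and dark pixels as muscle, as described above.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold in [0, 1] that maximizes between-class variance."""
    hist = [0] * bins
    for v in values:
        hist[min(bins - 1, int(v * bins))] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(bins - 1):
        w0 += hist[t]                 # pixels in the dark class (bins 0..t)
        sum0 += t * hist[t]
        w1 = total - w0               # pixels in the bright class
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    # Values at or above this level fall in the bright class.
    return (best_t + 1) / bins


def muscle_composition(roi_values, pixel_area_mm2=1.0):
    """Summarize a segmented ROI given normalized intensities in [0, 1]."""
    thr = otsu_threshold(roi_values)
    fat_pixels = sum(1 for v in roi_values if v >= thr)
    muscle_pixels = len(roi_values) - fat_pixels
    return {
        "threshold": thr,
        "csa": len(roi_values) * pixel_area_mm2,    # total cross-sectional area
        "fcsa": muscle_pixels * pixel_area_mm2,     # lean muscle area
        "fat_csa": fat_pixels * pixel_area_mm2,     # area occupied by fat
        "fat_percent": 100.0 * fat_pixels / len(roi_values),
    }


if __name__ == "__main__":
    # Synthetic ROI: 80 dark (muscle) pixels and 20 bright (fat) pixels.
    roi = [0.2] * 80 + [0.8] * 20
    print(muscle_composition(roi))
```

Maximizing the between-class variance is equivalent to minimizing the within-class variance, so both formulations give the same threshold. The first is used here because it can be updated incrementally while scanning the histogram.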
Muscle measurements
All muscle measurements were acquired by one of the investigators (MF), who has more than 6 years of experience in quantitative MRI muscle assessment. Quantitative measurements of the multifidus and erector spinae muscles were obtained from axial T2-weighted images at mid-disc for L4–L5 and L5–S1 for every subject. This image sequence was selected because it is routinely obtained in lumbosacral MRI examinations and has been widely used to assess paraspinal muscle composition. The two levels were selected because most lumbar pathologies and muscle morphological changes occur at L4–L5 and L5–S1. The paraspinal muscle measurements of interest for this study were: the total cross-sectional area (CSA); the functional cross-sectional area (FCSA), representing the area of pure muscle mass (excluding fatty infiltration); the area occupied by fat; and the fat percentage.
Muscle measurements were first obtained with a manual thresholding technique in ImageJ image analysis software (version 1.43, National Institutes of Health, Bethesda, Maryland). FCSA was measured by manually selecting a signal threshold within the total muscle CSA so as to include only pixels within the lean muscle tissue range. The grayscale range for lean muscle mass was established for each subject and scan slice. This thresholding technique has been shown to be highly reliable and is described in detail elsewhere [11]. Once the first set of measurements with ImageJ was completed, the rater was blinded to the results, and the same MRI slices were assessed using the automated algorithm in MATLAB (version R2015b) a minimum of 5 days after the first measurements were completed. For this method, the rater manually segmented the CSA of the muscle of interest on each slice, and the thresholding algorithm automatically calculated the muscle CSA, the fat CSA, and the muscle fat percentage. All muscle measurements were obtained four times by the same rater: twice using the manual thresholding method and twice using the automated thresholding algorithm.
Statistical analysis
Descriptive statistics, such as means and standard deviations, were calculated for each muscle measurement of interest. The ICC(2,1), based on a two-way random-effects model with absolute agreement, was calculated to determine the intra-rater reliability of measurements obtained with the manual thresholding technique and with the automated algorithm, as well as the inter-method reliability. The ICCs were interpreted using the following criteria, as suggested by Portney and Watkins: 0.00–0.49 = poor, 0.50–0.74 = moderate, and 0.75–1.0 = excellent [24]. Agreement between the measurements acquired using the manual thresholding technique and the automated algorithm was also evaluated using the 95% limits of agreement, as suggested by Bland and Altman [25, 26]. The standard error of measurement (SEM) was calculated to provide an estimate of the expected error related to a particular measurement in the same units as the initial measurement (SEM = S√(1 − rxx), where S = standard deviation of the test, and rxx = reliability of the test). Results were analyzed according to the spinal level and muscle investigated. The statistical analysis was performed using Statistical Package for the Social Sciences version 23.0 (SPSS Inc, Chicago, Illinois).
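For readers who want to reproduce these statistics outside SPSS, the three quantities can be sketched in plain Python. This is a minimal illustration, not the analysis code used in the study. The function names and the data layout (one row per subject, one column per measurement session or method) are assumptions. The limits of agreement are computed as the mean difference ± 1.96 standard deviations of the differences.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: list of rows, one per subject; each row holds k repeated measurements.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                  # between-subject mean square
    msc = ss_cols / (k - 1)                  # between-measurement mean square
    mse = ss_err / ((n - 1) * (k - 1))       # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def standard_error_of_measurement(sd, reliability):
    """SEM = S * sqrt(1 - rxx), in the units of the original measurement."""
    return sd * sqrt(1.0 - reliability)


def limits_of_agreement(a, b):
    """Bland-Altman 95% limits of agreement for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


if __name__ == "__main__":
    # Synthetic example: second session reads 1 unit higher for every subject.
    sessions = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
    print(icc_2_1(sessions))
    print(standard_error_of_measurement(10.0, 0.75))
    print(limits_of_agreement([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```

Because the absolute-agreement form includes the between-measurement mean square in the denominator, a systematic offset between sessions or methods lowers the ICC even when subjects are ranked identically.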