1. Study specimens
Prostate tissue specimens were obtained from archived paraffin-embedded blocks of radical prostatectomy specimens. To demonstrate an application of this system, AR expression was compared in paired specimens of benign prostate and prostate cancer from 20 African American and 20 Caucasian American men. Clinical data for these specimens were obtained from prospectively maintained clinical databases.
2. Immunohistochemistry
Immunohistochemistry allows in situ protein localization and computer-assisted image analysis of AR while preserving tissue architecture. Polyclonal or monoclonal antibodies bind specific epitopes within cellular structures, allowing the proteins of interest to be visualized. Archival paraffin-embedded prostate specimens were cut into 6 μm sections and placed on ProbeOn Plus™ microscope slides (Fisher Scientific, Pittsburgh, PA). After deparaffinization and rehydration through graded alcohols (100%, 95%, 70%), tissue sections were subjected to antigen retrieval in Reveal Citra buffer (Biocare Medical, Walnut Creek, CA) in a pressurized antigen-decloaking chamber for 2 minutes at 120°C and 21 PSI. The sections were cooled to room temperature, and non-specific staining was blocked with 2% normal horse serum for 15 minutes at 37°C. Endogenous peroxidase was blocked with 3% hydrogen peroxide diluted in methanol, and endogenous biotin was blocked with an avidin-biotin blocking kit (Vector, Burlingame, CA). Sections were incubated by the capillary gap method with the monoclonal anti-human AR antibody F39.4.1 (Biogenex, San Ramon, CA), diluted 1:500 in Primary Antibody Diluting Buffer (Biomeda Corp., Foster City, CA), for 1 hour at 37°C in a humidified heating block. Sections were then incubated with biotinylated anti-mouse IgG (Vector) diluted 1:200 for 30 minutes at 37°C. The signal was amplified with avidin-biotin complex (ABC; Vector) and visualized with 3,3'-diaminobenzidine (DAB; Vector). Sections were counterstained with hematoxylin (Fisher Scientific; diluted 1:3 in H2O) for 15 seconds, dehydrated through graded alcohols and mounted in Permount (Fisher Scientific). Benign and malignant tissues were immunostained in a single batch.
3. Image acquisition
Images were acquired with a 40× objective (N.A. 0.85) for a total magnification of 400×. Contrast and brightness were adjusted through the gain and exposure time of the camera. Illumination was adjusted to maximize contrast while avoiding over- and under-saturation of gray levels. A series of neutral density filters was used to confirm the linearity of the output at the final optical settings. Temporal variation of the light output was measured frequently and found to be insignificant (<0.2%). Images were sampled randomly throughout the histological sections, avoiding areas of necrosis, artifacts and tissue edges. Each image was captured under the same reproducible conditions. White and black balance of the camera was performed to ensure optimal use of its dynamic range. Ten images were collected from each tissue specimen. Each image consisted of 640 × 480 pixels collected in 24-bit color mode (16.7 million colors) and was stored as an uncompressed Tagged Image File Format (TIFF) file.
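The linearity and light-stability checks described above can be sketched as follows. This is a minimal NumPy sketch, assuming the mean gray level of a blank field was recorded behind each neutral density filter of known transmittance, and that temporal variation is expressed as peak-to-peak change relative to the mean; the text specifies neither the fitting method nor the variation metric, and the function names are hypothetical.

```python
import numpy as np

def linearity_r2(transmittance, gray_level):
    """Fit mean gray level against filter transmittance by least squares.

    Returns (slope, intercept, R^2); an R^2 close to 1 indicates a linear camera response.
    """
    t = np.asarray(transmittance, dtype=float)
    g = np.asarray(gray_level, dtype=float)
    slope, intercept = np.polyfit(t, g, 1)       # first-order (straight-line) fit
    pred = slope * t + intercept
    ss_res = np.sum((g - pred) ** 2)
    ss_tot = np.sum((g - g.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

def relative_variation(readings):
    """Peak-to-peak variation of repeated light readings as a fraction of their mean (assumed metric)."""
    r = np.asarray(readings, dtype=float)
    return (r.max() - r.min()) / r.mean()
```

With this metric, the reported "<0.2%" criterion corresponds to `relative_variation(readings) < 0.002`.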
4. Creation of classification parameters
Commercial reagents used in immunohistochemical staining are not standardized, so immunostaining patterns differ between laboratories. Combined with local variations in image acquisition, this can cause significant errors in automated analysis. Classification parameters are therefore used to calibrate the nuclear analysis software for each new dataset, making the automated image analysis independent of the type of immunostaining or imaging system used. Figure 2 shows a block diagram of the steps involved in creating the classification parameters.
4.1 RGB dataset
A minimum of 200 immunopositive and 200 immunonegative nuclei were randomly sampled from the acquired images, and red, green and blue (RGB) information was extracted from each selected nucleus. A 'Class' column was added to the dataset: objects identified as immunopositive were assigned to class 1 and objects identified as immunonegative to class 2.
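A minimal sketch of how such a dataset could be assembled, assuming each sampled nucleus is available as a boolean pixel mask over the RGB image and that its color is summarized by the mean R, G and B values inside the mask (the text does not state which summary was used). The function name and argument layout are hypothetical.

```python
import numpy as np

def build_rgb_dataset(image, nucleus_masks, labels):
    """Return an (n, 4) array with columns R, G, B, Class.

    image:         (H, W, 3) uint8 RGB image
    nucleus_masks: list of boolean (H, W) arrays, one per sampled nucleus
    labels:        list of classes, 1 = immunopositive, 2 = immunonegative
    """
    rows = []
    for mask, label in zip(nucleus_masks, labels):
        r, g, b = image[mask].mean(axis=0)   # mean color of the pixels inside this nucleus
        rows.append((r, g, b, label))
    return np.array(rows, dtype=float)
```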
4.2 Classification parameters
The dataset was divided into two non-overlapping sets, (a) a training set and (b) a test set, each containing at least 100 immunopositive and 100 immunonegative nuclei. Classification coefficients were computed from the training set using one of the following methods.
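The split can be sketched as a per-class random partition. This sketch assumes each class is split in half at random, which guarantees at least 100 nuclei per class in each set when at least 200 per class were sampled; the exact partitioning rule is not stated in the text, and the function name is hypothetical.

```python
import numpy as np

def split_train_test(features, classes, min_per_set=100, seed=None):
    """Randomly split rows into non-overlapping training and test sets, class by class."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for k in (1, 2):
        idx = rng.permutation(np.flatnonzero(classes == k))   # shuffled row indices of class k
        if len(idx) < 2 * min_per_set:
            raise ValueError(f"class {k} needs at least {2 * min_per_set} nuclei")
        half = len(idx) // 2
        train_idx.append(idx[:half])
        test_idx.append(idx[half:])
    tr = np.concatenate(train_idx)
    te = np.concatenate(test_idx)
    return features[tr], classes[tr], features[te], classes[te]
```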
(a) Linear Discriminant Analysis
Let $x = (x_{hue}, x_{saturation}, x_{intensity})^T$ denote the feature vector of an individual nucleus, and let $x_1 = \{x_{11}, x_{12}, \ldots, x_{1n_1}\}$ and $x_2 = \{x_{21}, x_{22}, \ldots, x_{2n_2}\}$ denote the class 1 and class 2 samples of the training set, with means $\mu_1$ and $\mu_2$, respectively [29].
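The within-class scatter matrices (unnormalized covariances) of $x_1$ and $x_2$ are:
$$S_1 = \sum_{j=1}^{n_1} (x_{1j} - \mu_1)(x_{1j} - \mu_1)^T \tag{1}$$
$$S_2 = \sum_{j=1}^{n_2} (x_{2j} - \mu_2)(x_{2j} - \mu_2)^T \tag{2}$$
The pooled covariance is given by:
$$S_p = \frac{S_1 + S_2}{n_1 + n_2 - 2} \tag{3}$$
where $n_1$ and $n_2$ are the numbers of immunopositive and immunonegative nuclei in the training set, respectively.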
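The classification coefficients $\lambda = (\lambda_1, \lambda_2, \lambda_3)^T$ and the constant $C$ are computed as
$$\lambda = S_p^{-1} (\mu_1 - \mu_2) \tag{4}$$
$$C = \mu^T \lambda \tag{5}$$
where $S_p^{-1}$ is the inverse of $S_p$ and $\mu = (\mu_1 + \mu_2)/2$ is the mean of $\mu_1$ and $\mu_2$.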
The classification function is of the form:
$$G(x) = \lambda_1 x_{hue} + \lambda_2 x_{saturation} + \lambda_3 x_{intensity} - C \tag{6}$$
where G(x) is the classification score.
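The LDA steps can be sketched in NumPy as below. This is a minimal sketch, assuming $S_1$ and $S_2$ are unnormalized within-class scatter matrices (as the pooled covariance in equation (3) implies) and $\lambda = S_p^{-1}(\mu_1 - \mu_2)$, so that a positive score favors class 1. It solves the linear system instead of forming the inverse explicitly, which is numerically preferable; function names are hypothetical.

```python
import numpy as np

def lda_coefficients(x1, x2):
    """x1: (n1, 3) class-1 (H, S, I) rows; x2: (n2, 3) class-2 rows. Returns (lambda, C)."""
    n1, n2 = len(x1), len(x2)
    mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
    d1, d2 = x1 - mu1, x2 - mu2
    s1 = d1.T @ d1                          # class-1 scatter matrix
    s2 = d2.T @ d2                          # class-2 scatter matrix
    sp = (s1 + s2) / (n1 + n2 - 2)          # pooled covariance, eq. (3)
    lam = np.linalg.solve(sp, mu1 - mu2)    # lambda = Sp^-1 (mu1 - mu2), without an explicit inverse
    mu = (mu1 + mu2) / 2.0
    c = mu @ lam                            # constant C, eq. (5)
    return lam, c

def lda_score(x, lam, c):
    """Classification score G(x), eq. (6); G > 0 means class 1 (immunopositive)."""
    return x @ lam - c
```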
(b) Logistic Regression
Let $x = (x_{hue}, x_{saturation}, x_{intensity})^T$ denote the feature vector of an individual nucleus, $X = \{x_1, x_2, \ldots, x_n\}$ the training set, and $Y$ a column vector containing the class information of the training set ($Y = 1$ for class 1 and $Y = 0$ for class 2). The probability that $Y = 1$ in a multiple logistic regression model [30] is given as
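$$p = \frac{1}{1 + e^{-\beta^T x}} \tag{7}$$
where $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T$ is the coefficient vector and $x$ is augmented with a leading 1 so that $\beta_0$ acts as the intercept. The equation can be rewritten as
$$\ln\left(\frac{p}{1 - p}\right) = \beta^T x \tag{8}$$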
Equation (8) expresses the log-odds as a linear function of $x$. Because the log-odds are not observed directly, $\beta$ is estimated by maximum likelihood.
Each observation can be considered a Bernoulli trial, that is, a binomial with the total number of trials equal to 1. Consequently, for the $i$th observation, with $y_i \in \{0, 1\}$ and $p_i = 1/(1 + e^{-\beta^T x_i})$,
$$P(y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} \tag{9}$$
Assuming all observations are independent, the likelihood function is given by
$$L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i} \tag{10}$$
The log of the likelihood function is given by
$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right] \tag{11}$$
The parameter vector $\beta$ is obtained by maximizing (11) with the Newton-Raphson iterative technique. The classification function is of the form:
$$G(x) = \beta_0 + \beta_1 x_{hue} + \beta_2 x_{saturation} + \beta_3 x_{intensity} \tag{12}$$
where G(x) is the classification score.
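A minimal Newton-Raphson sketch for maximizing the log-likelihood, assuming class 1 is coded as $y = 1$ and class 2 as $y = 0$, and that the data are not perfectly separable (otherwise the maximum-likelihood estimate does not exist and the iteration diverges). Each step solves $(X^T W X)\,\Delta = X^T (y - p)$ with $W = \mathrm{diag}(p_i(1 - p_i))$; the function names, iteration limit and tolerance are hypothetical choices.

```python
import numpy as np

def fit_logistic_newton(x, y, max_iter=25, tol=1e-8):
    """x: (n, 3) features; y: (n,) with 1 for class 1 and 0 for class 2. Returns beta (4,)."""
    X = np.column_stack([np.ones(len(x)), x])     # leading column of 1s for beta_0
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # eq. (7)
        grad = X.T @ (y - p)                       # gradient of the log-likelihood
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])              # X^T W X (negative Hessian)
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logistic_score(x, beta):
    """G(x), eq. (12); G > 0 is equivalent to p > 0.5, i.e. class 1."""
    return beta[0] + x @ beta[1:]
```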
The classification function was evaluated on the test set. If $z = (z_{hue}, z_{saturation}, z_{intensity})^T$ is an individual observation in the test set, it is classified as class 1 if $G(z) > 0$ and as class 2 otherwise. The predicted classes were compared with the actual classes, and a classification table was constructed. If more than 85% of class 1 nuclei and more than 85% of class 2 nuclei were identified correctly, the classification coefficients were used; if not, nuclei were randomly sampled again and the process was repeated.
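The test-set check can be sketched as a 2 × 2 classification table plus the 85% acceptance rule. This sketch reads the criterion as requiring each class separately to exceed 85% correct, which is one interpretation of the text; function names are hypothetical, and `scores` can come from either the LDA or the logistic score.

```python
import numpy as np

def classification_table(scores, true_class):
    """Rows: actual class 1, 2; columns: predicted class 1, 2. Also returns % correct per class."""
    pred = np.where(scores > 0, 1, 2)              # class 1 if G(z) > 0, otherwise class 2
    table = np.zeros((2, 2), dtype=int)
    for a in (1, 2):
        for p in (1, 2):
            table[a - 1, p - 1] = np.sum((true_class == a) & (pred == p))
    pct_correct = 100.0 * np.diag(table) / table.sum(axis=1)
    return table, pct_correct

def accept_parameters(pct_correct, threshold=85.0):
    """Accept the coefficients only if every class exceeds the threshold."""
    return bool(np.all(pct_correct > threshold))
```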
Limits for nuclear area are added to the parameter set to eliminate possible artifacts in the image. The lower and upper limits of nuclear area were calculated from the dataset as:
$$\mathrm{Area}_{upper} = \mathrm{Area}_{mean} + 2\,\mathrm{SD} \tag{13}$$
$$\mathrm{Area}_{lower} = \mathrm{Area}_{mean} - 2\,\mathrm{SD} \tag{14}$$
where SD is the standard deviation of the nuclear area measures.
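Equations (13) and (14) amount to a mean ± 2 SD window on nuclear area. A minimal sketch, assuming the sample standard deviation (`ddof=1`) is used, which the text does not specify; function names are hypothetical.

```python
import numpy as np

def area_limits(areas, k=2.0):
    """Return (lower, upper) = mean -/+ k * SD of the sampled nuclear areas."""
    a = np.asarray(areas, dtype=float)
    mean, sd = a.mean(), a.std(ddof=1)    # sample standard deviation (assumed)
    return mean - k * sd, mean + k * sd

def within_area_limits(area, limits):
    """True if a segmented object's area falls inside the accepted range."""
    lower, upper = limits
    return lower <= area <= upper
```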
5. Image analysis
A block diagram of the image analysis program is shown in Figure 3. Red, green and blue color information was extracted from the original uncompressed 24-bit color image and stored as 8-bit grayscale images. Discriminant analysis of the grayscale histograms was used to determine optimal thresholds for automated segmentation of the red, green and blue images [31]. The adaptive threshold was applied using an 80 × 80 pixel window, chosen because it is about four times the size of a typical nucleus (nuclear diameter ~20 pixels). The three segmented images were combined by a logical OR operation. The combined image was eroded and dilated twice using a three-step filter (3 × 3 cross, 1 × 3 horizontal and 3 × 1 vertical kernels): erosion shrank the detected nuclear boundaries and dilation filled the nuclear areas. Artifacts were removed based on size and shape. The nuclear regions were then labeled in raster-scan order to create a nuclear mask; unlabeled regions are regarded as background.
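Discriminant analysis of a gray-level histogram is the approach of Otsu's method, which picks the threshold that maximizes the between-class variance. The sketch below applies it on non-overlapping 80 × 80 tiles of each channel and ORs the three channel masks. Several details are assumptions not stated in the text: tiles are non-overlapping (rather than a sliding window), nuclei are darker than background in each channel, and a tile with a single gray level is treated as background. Morphological filtering and labeling are omitted, and function names are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold of a uint8 array; pixels <= t form the dark class. None if no split exists."""
    hist = np.bincount(gray.ravel(), minlength=256)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                   # pixel count at or below each candidate threshold
    w1 = w0[-1] - w0                       # pixel count above it
    s0 = np.cumsum(hist * levels)
    s1 = s0[-1] - s0
    valid = (w0 > 0) & (w1 > 0)            # both classes must be non-empty
    if not valid.any():
        return None
    m0 = s0[valid] / w0[valid]
    m1 = s1[valid] / w1[valid]
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2   # proportional to between-class variance
    return int(levels[valid][np.argmax(between)])

def adaptive_threshold(gray, win=80):
    """Apply Otsu's threshold separately in each win x win tile; True marks dark (nuclear) pixels."""
    mask = np.zeros(gray.shape, dtype=bool)
    for r in range(0, gray.shape[0], win):
        for c in range(0, gray.shape[1], win):
            tile = gray[r:r + win, c:c + win]
            t = otsu_threshold(tile)
            if t is not None:              # uniform tile: leave as background
                mask[r:r + win, c:c + win] = tile <= t
    return mask

def segment_nuclei(rgb, win=80):
    """Threshold the R, G and B channels independently and combine them with a logical OR."""
    return (adaptive_threshold(rgb[..., 0], win)
            | adaptive_threshold(rgb[..., 1], win)
            | adaptive_threshold(rgb[..., 2], win))
```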
Using the red, green and blue images to separate immunopositive from immunonegative nuclei is problematic because each channel mixes stain color with stain intensity. An HSI (hue, saturation, intensity) color model was therefore used, because it decouples intensity information from color information [32]. The hue, saturation and intensity component images were multiplied by their corresponding discriminant coefficients from the parameter file and combined into a single image. The nuclear mask was applied to this image, and each resulting nuclear area was classified as immunopositive or immunonegative according to its classification score. Figure 4 shows part of an image at different stages of image processing. The number of nuclei measured may be inaccurate because of overlapping nuclei or clusters of nuclei; adding an upper limit for nuclear area makes this error reproducible. Nuclear shape limits were also used to separate epithelial nuclei from artifacts, endothelial and stromal nuclei, and inflammatory cells.
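The HSI scoring step can be sketched as follows. The RGB-to-HSI conversion shown is the common geometric formulation (hue in degrees, saturation and intensity in [0, 1]); whether it matches the variant in [32], and the units used when the coefficients were fitted, are assumptions. Each labeled nucleus (label 0 = background) is assigned the mean of its pixel scores, which is also an assumed aggregation rule. Function names and the example coefficients are hypothetical.

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (H, W, 3) uint8 RGB image to hue (degrees), saturation and intensity in [0, 1]."""
    rgb = rgb.astype(float) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    total = r + g + b
    with np.errstate(divide="ignore", invalid="ignore"):
        intensity = total / 3.0
        min_c = np.minimum(np.minimum(r, g), b)
        saturation = np.where(total > 0, 1.0 - 3.0 * min_c / total, 0.0)
        num = 0.5 * ((r - g) + (r - b))
        den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
        theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
        hue = np.where(b <= g, theta, 360.0 - theta)
        hue = np.where(den > 0, hue, 0.0)   # hue is undefined for gray pixels; set to 0
    return hue, saturation, intensity

def classify_nuclei(rgb, labels, lam, c):
    """Score pixels as lam . (H, S, I) - c; a nucleus is class 1 if its mean score > 0, else class 2."""
    hue, sat, inten = rgb_to_hsi(rgb)
    score = lam[0] * hue + lam[1] * sat + lam[2] * inten - c
    classes = {}
    for k in np.unique(labels):
        if k == 0:                           # label 0 is background
            continue
        classes[int(k)] = 1 if score[labels == k].mean() > 0 else 2
    return classes
```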
The nuclear mask was applied to the intensity image to obtain the intensity mask image. The mean optical density (MOD) of each nuclear area in the image is calculated as:
$$\mathrm{MOD} = \frac{1}{N} \sum_{i=1}^{N} \log_{10}\left(\frac{I_o}{I_i}\right) \tag{15}$$
where $N$ is the total number of pixels in the nuclear mask, $I_i$ is the intensity level of pixel $i$, and $I_o$ is the intensity level of the background measured in each field of view. The nuclear roundness factor (NRF) is the ratio of the radius of the circle whose perimeter equals the measured perimeter to the radius of the circle whose area equals the measured area. The NRF of each nuclear object is given by:
$$\mathrm{NRF} = \frac{P / (2\pi)}{\sqrt{A / \pi}} = \frac{P}{2\sqrt{\pi A}} \tag{16}$$
where $A$ is the measured nuclear area and the perimeter $P$ is calculated using a chaining approximation with weights (1, 4, 6, 4, 1). More information on MOD, area and perimeter calculations can be found in earlier publications [13, 23].
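MOD and NRF follow directly from their definitions above. This minimal sketch takes the perimeter as an input rather than implementing the weighted chaining approximation, and clips zero-intensity pixels to a small positive value to avoid taking the logarithm of zero (an added safeguard, not from the text). Function names are hypothetical.

```python
import numpy as np

def mean_optical_density(intensity, mask, background):
    """MOD of one nucleus: mean of log10(Io / Ii) over the pixels in its mask."""
    ii = intensity[mask].astype(float)
    ii = np.clip(ii, 1e-6, None)                # guard against log of zero (assumption)
    return float(np.mean(np.log10(background / ii)))

def nuclear_roundness_factor(area, perimeter):
    """Radius of the equal-perimeter circle over radius of the equal-area circle."""
    r_perimeter = perimeter / (2.0 * np.pi)     # circle with the measured perimeter
    r_area = np.sqrt(area / np.pi)              # circle with the measured area
    return r_perimeter / r_area                 # = P / (2 * sqrt(pi * A)); 1.0 for a perfect circle
```

Because a circle minimizes perimeter for a given area, NRF is 1 for a perfectly round nucleus and grows as the outline becomes more irregular; a square, for example, gives about 1.13.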