Comparison of synthetic aperture architectures for miniaturised ultrasound imaging front-ends

Background Point of care ultrasonography has been the focus of extensive research over the past few decades. Miniaturised, wireless systems have been envisaged for new application areas, such as capsule endoscopy, implantable ultrasound and wearable ultrasound. The hardware constraints of such small-scale systems are severe, and tradeoffs between power consumption, size, data bandwidth and cost must be carefully balanced. Methods In this work, two receiver architectures are proposed and compared to address these challenges. Both architectures uniquely combine low-rate sampling with synthetic aperture beamforming to reduce the data bandwidth and system complexity. The first architecture involves the use of quadrature sampling to minimise the signal bandwidth and computational load. Synthetic aperture beamforming (SAB) is carried out using a single-channel, pipelined protocol suitable for implementation on an FPGA/ASIC. The second architecture employs compressive sensing within the finite rate of innovation framework to further reduce the bandwidth. Low-rate signals are transmitted to a computational back-end (computer), which sequentially reconstructs each signal and carries out beamforming. Results Both architectures were tested using a custom hardware front-end and synthetic aperture database to yield B-mode images. The normalised root-mean-squared-error between the quadrature SAB image and the RF reference image was \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$13\%$$\end{document}13% while the compressive SAB error was \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$22\%$$\end{document}22% for the same degree of spatial compounding. The sampling rate is reduced by a factor of 2 (quadrature SAB) and 4.7 (compressive SAB), compared to the RF sampling rate. The quadrature method is implemented on FPGA, with a total power consumption of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$4.1 $$\end{document}4.1 mW, which is comparable to state-of-the-art hardware topologies, but with significantly reduced circuit area. Conclusions Through a novel combination of SAB and low-rate sampling techniques, the proposed architectures achieve a significant reduction in data transmission rate, system complexity and digital/analogue circuit area. This allows for aggressive miniaturisation of the imaging front-end in portable imaging applications.


Background
Recent years have seen significant advances in the development of highly portable ultrasound imaging systems providing real-time diagnostic information. These developments have largely been fueled through parallel advances in integrated electronics, micromachined transducers and signal processing methods. Field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) provide adaptive parallelism, high performance per milliwatt, and a smaller form factor. Much work has also focused on integrating electronic front-ends with novel capacitive/piezoelectric micromachined ultrasound transducers (CMUTs/PMUTs) for tethered applications such as 3D intravascular or endoscopic imaging [1,2]. This has led to an ever increasing drive to further miniaturise point-of-care ultrasound devices. However, many challenges still inhibit the development of further miniaturised, wireless systems for novel applications such as ultrasonic capsule endoscopy [3] and wearable imaging [4]. This is primarily due to the severe power and circuit area constraints affecting both analogue and digital components of the system. Cost has also been a major limitation in low-resource clinical settings, and while some more affordable commercial devices like the GE VScan have been devised, there remains scope for further reductions in system complexity and cost. The critical challenge is maintaining sufficient image quality for diagnostic purposes while reducing size, system complexity and cost. These considerations form the basis of this investigation, where two ultrasound receiver architectures are proposed in an attempt to optimise these tradeoffs.
A great diversity of efficient hardware-level beamforming strategies in the analogue and digital domains have been proposed in literature. In analogue beamforming, signals from each transducer element are delayed with analogue delay lines, and then summed and digitised [5][6][7]. However, the number of analogue delay cells increases quadratically with the number of channels, resulting in excessive complexity, power consumption and pulse distortion for a practical array. Digital RF beamforming systems have the advantage of better delay accuracy, provided that the clock frequency is high enough. If conventional phased array imaging is used with many parallel receive channels, the price is increased analogue-to-digital converter (ADC) power consumption and data bandwidth. Kim et al. attempt to alleviate this problem by multiplexing many channels through a single ADC [8]. They propose a CMOS ultrasound transceiver chip that is closely coupled to a 50 MHz PMUT transducer with 16 elements. However, a high clock frequency ( 250 MHz) is required, and the overall power consumption ( 270 mW) exceeds the constraints of a highly portable, wireless application. Furthermore, receive beamforming is not carried out on-chip. In [9], a state-of-the-art, 32-channel, point-of-care system-ona-chip (SOC) is demonstrated, featuring full transmit and dynamic receive beamformer modules and colour Doppler processors. However, the SOC consumes 1.2 W and occupies 27 × 27 mm 2 (for 32 − 64 channels).
In-phase/quadrature (I/Q) beamforming techniques have also been proposed to reduce hardware complexity. The direct sampled I/Q beamforming method in [10] employs second-order sampling to obtain I/Q components directly from RF signals. Digital focusing is then implemented via phase rotation of the I/Q data. This considerably reduces the hardware requirements, making the beamformer small and inexpensive. In [11] a similar phase-error-free quadrature sampling technique is used, where I/Q components are obtained by mixing with a reference signal. I/Q data are then sampled and focused using synthetic aperture beamforming (SAB). While SAB is particularly useful in small scale systems in which hardware simplicity is mandatory [11,12], it does lead to lower SNR and unwanted grating lobes. However, numerous techniques have been proposed to assist with minimising these grating lobes and increasing the SNR [12].
Compressive sensing (CS) is another viable method of improving power efficiency by decreasing the sampling rate. In [13], Vetterli et al. proposed a sampling paradigm for certain classes of parametric signals with a finite rate of innovation (FRI). This work is extended by Eldar et al. [14] into a unified framework termed Xampling and applied it to ultrasound imaging in software [15]. The result is an eightfold reduction in sampling frequency from the original RF rate, with comparable image quality.
In this work, the synthetic aperture beamforming (SAB) is applied as a means of reducing system complexity [11,12]. A novel SAB method is employed which combines aspects of the traditional synthetic aperture focusing technique (SAFT), and synthetic receive aperture (SRA) beamforming, where the receive aperture is split into multiple sub-apertures that are multiplexed in time. In this system, transmission is carried out n times for all receive elements, and reflected signals are multiplexed through a single receive channel, which significantly reduces system complexity and size. Spatial compounding across multiple transmit positions increases the SNR. Although only a single channel is used, the entire system is scalable to any number of channels, depending on what frame rate is required.
Furthermore, SAB is uniquely combined with two different sampling techniques: quadrature/baseband sampling and compressive sensing within the FRI framework. This approach enables unprecedented miniaturization through a reduction system complexity and data bandwidth. Both architectures are tested in hardware and compared in terms of their utility in highly portable applications. The architectures do not challenge existing solutions such as [9], which focus on portable applications with high image quality ( ∼ 128 channels) and high frame rates ( ∼ 30 Hz). Instead, frame rate and image quality are carefully traded against hardware complexity, cost and power consumption in order to enable further miniaturisation for small-scale applications such as capsule endoscopy (which typically requires a frame rate of 2-4 Hz). Importantly, the parameters of both architectures may be tuned to balance these tradeoffs for the given application.

Architecture 1: compressive sensing with the FRI framework
The first architecture employs a sampling paradigm originally proposed by Vetterli et al. [13] for certain classes of parametric signals. Parametric signals with k parameters may be sampled and reconstructed using only 2k parameters. These signals have a finite rate of innovation (FRI) and appear in many applications such as biomedical imaging and radar. The sampling scheme in [13] was applied to periodic and finite streams of FRI signals such as Diracs impulses, nonuniform splines, and piecewise polynomials. An appropriate sample kernel (sinc, Guassian, sum of sincs, etc. [15]) is applied to extract a set of Fourier coefficients which are then used to obtain an annihilating filter. The locations and amplitudes of the pulses are finally determined. A brief review of the method in [13] is provided below.
Consider an FRI signal x(t) (e.g., ultrasound A-mode signal) comprising a finite stream of pulses with pulse shape p(t), amplitudes {c k } K −1 k=0 and time locations {t k } K −1 k=0 : The sample values are obtained by filtering the signal with a sampling kernel. A sinc kernel is defined as h B (t) = Bsinc(Bt) , with bandwidth B = 1/T . The convolution product is: This is equivalent to: Since the signal has K degrees of freedom, we require N ≥ 2K samples to sufficiently recover the signal. The reconstruction method requires two systems of linear equations-one for the locations of the Gaussian pulses involving a matrix V, and one for the weights of the pulses involving a matrix A. Define a Lagrange polynomial L k (u) = (P(u)/(u − t k /T )) of degree K − 1 , where P(u) = K −1 k=0 (u − t k /T ). Multiplying both sides of (5) by P(n) yields an expression in terms of the interpolating polynomials: To find the K locations t k (i.e. the time delays of the pulses), we begin by deriving an annihilating equation to find the roots of P(u) . Now, since the right hand side of (6) is a polynomial of degree K − 1 in the variable n, if we apply K finite differences, the left hand side will become zero, i.e., △ K (−1) n P(n)y n = 0, n = K , . . . , N − 1 . Letting P(u) = k p k u k leads to an annihilating filter equation equal to: where V is an (N − K ) × (K + 1) matrix. The system admits a solution when Rank(V) ≤ K and N ≥ 2K . Thus, (8) may be used to find the K + 1 unknowns p k , which leads to K locations t k as these are the roots of P(u) . Once the locations have been determined, the weights of the Gaussian pulses c k may be found by solving the system in (7) for n = 0, . . . , K − 1 . The system has no solution if Rank(A) = K , where A ∈ IR K×K is defined by (6). A more detailed discussion of the annihilating filter method is provided in [13]. Theoretically, the result does not depend on the sampling period T. However, V may be poorly conditioned if T is not chosen appropriately. As simulation results show below, oversampling yields an increase in the SNR of the reconstructed result.
It is also important to note that the sinc kernel described above has infinite time support and is non-causal. In the frequency domain, it is represented by an ideal lowpass filter with an infinite rolloff. Practically, the sinc kernel may be approximated in hardware by means of an high order analogue lowpass filter. Simulations below demonstrate the performance of multiple filter types and orders, and a comparison is made to other kernel types suggested in [15].

System overview
The proposed compressive synthetic aperture beamforming (SAB) architecture is illustrated in Fig. 1a. The analogue front-end (AFE) amplifies and demodulates the RF signal into I and Q components. This is achieved by mixing the RF waveform with reference signals centered at the carrier frequency. The assumption is made that both the I and Q signals satisfy the FRI criterion, i.e., they both have finite rates of innovation. These signals are thus filtered and bandlimited below the original I/Q bandwidth. This (8) K k=0 p k △ K (−1) n n k y n is carried out in the analogue domain in order to reduce the sampling frequency and thus the data bandwidth. This leads to a significant power saving, as the power budget is predominated by the power consumption of the ADC and wireless transceiver. By compressing the signal in the analogue domain, the computational burden is shifted to the digital back end (computer), which carries out reconstruction of the I/Q signals and finally baseband beamforming. Power is not a critical constraint here as it is in the second architecture (quadrature SAB), where beamforming is carried out on FPGA.

Architecture 2: quadrature synthetic aperture beamforming
The second architecture combines SAB with quadrature sampling-i.e. signals are processed in the baseband. The image is formed in the hardware front-end prior and then transmitted to the display device. The proposed quadratic SAB architecture is illustrated in Fig. 1b. The transducer produces a bandpass signal R(t) which may be expressed as: where A(t) represents the envelope, ω c the carrier frequency in radians per second, and φ the phase [16]. Expansion of R(t) yields: where A I (t) = A(t) cos φ(t) and A Q (t) = A(t) sin φ(t) are the in-phase and quadrature components respectively. These may be obtained by mixing with a reference signal in the analogue domain and filtering the result. Since A I (t) and A Q (t) are baseband signals, they may be sampled at a lower rate. This reduces the computational burden on the beamforming processor (FPGA/ASIC). After sampling, the next step is to appropriately phase-rotate the I/Q data for focusing. According to the synthetic aperture focusing method, for a given pixel location − → r p at depth index k, the required time instance t p (i, j) to take the signal value for summation is calculated by dividing this distance by the speed of sound in the medium [17].
where r p denotes the position of the imaging point, r e (i) the location of the i th transmitting element and r r (j) the location of the jth receiving element. A corresponding discretised delay index I p (i, j) may then be calculated. An interpolation factor of K is applied to increase the delay resolution. That is, if N S sample points are obtained, then there are many as K × N S index locations between 1 and I p (i, j) max . The index value is read from a lookup table that is calculated a priori, based on the locations of each pixel ( − → r p ) and transmitting (i) or receiving (j) element. For each index location I p (i, j) , the I or Q data are then interpolated on-the-fly using any standard technique such as linear or quadratic interpolation. Now, if the delay is applied directly to the I/Q data, critical frequency-dependent phase errors distort the final image [11]. Therefore, I/Q sample points are remodulated or upconverted back to RF by mixing the interpolated result with new discrete reference signals: where ω c is the carrier frequency in rad/s and n is the discretised time index. Again, are calculated a priori. The interpolated I and Q values are multiplied by the reference signals at n = I p and then subtracted to yield the RF amplitude: This value is then added to the pixel location, and the process is repeated for all i, j and n values, resulting in a low-resolution image. These low-resolution images are summed or averaged to obtain a higher resolution image, which may then be transmitted via a wireless transmission link to an external post-processor. The final focused signal y f ( − → r p ) expressed mathematically is: where a I p (i, j) is the apodisation (weighting) function, R I p (i, j) is the phase-shifted I/Q sum evaluated at I p (i, j) , N is the number of transducer elements and M the number of transmissions.

Design tradeoffs and optimisation
The parameters of the algorithm must be carefully selected to optimise the multidimensional tradeoffs between circuit area, power consumption, frame rate, image quality/size, transmission line bandwidth and system complexity/cost. In particular, the image quality is dependent on the number of transmissions ( i max )/ size of the synthetic transmit aperture, and the number of receivers ( j max )/size of the receive aperture. A larger value of i max implies better spatial compounding, SNR and lateral resolution. Similarly, the lateral resolution is a function of the size of the receive aperture, so increasing j max improves the image quality. However, increasing i max / j max leads to an increase in data acquisition time and therefore a reduction in the maximum frame rate ( FR max ), which is a function of the time of flight t f = 2D/c: where D is the depth and c is the speed of sound in the medium. The maximum frame rate is linearly proportional to the number of parallel receiver channels N a . FR max effectively defines the boundary of a region of operation for various values of N a . For (13) Fig. 2, three regions of operation may be defined for N a = 1 , N a = 2 and N a = 8 . In this study, only a single channel is used as a proof of concept. Thus, with N a = 1 , D = 10 cm, c = 1540 m/s, i max = 30 and j max = 64 , the maximum frame rate is 4 Hz, which is acceptable for capsule endoscopy but not for a portable scanner. In this case, either the image size/quality may be reduced, or more channels must be used at the expense of increased power consumption.
The proposed algorithm inherently lends itself to an iterative, pipelined approach that may easily be implemented in a hardware description language (HDL) for implementation in hardware. Within the regions of operation discussed above, the frame rate is thus a function of other digital design parameters such as clock frequency, f clk , and the degree of parallelisation (i.e. the number of parallel delay calculations per clock edge, N p ). In order to increase the frame rate up the maximum in (19), f clk and/ or N p must be increased at the expense of power and/or area (or logic utilisation). For N a = 1 , this relationship is expressed in the following equation: where z max is the number of pixels in the axial dimension of the image. Equation (20) may be derived using (18) by a process of multiplication. The beamforming process is carried out for all pixels ( j max × z max ) for all transmit/receive operations ( i max × j max ). A factor of two in the denominator is introduced to account for serialising send and receive operations in hardware over two clock cycles. The relationship in (20) is illustrated in Fig. 2, where frame rate is plotted against the number of transmit position, i max , for various clock frequencies and j max = 64 channels, z max = 350 , N a = 1 and N p = 8 . Figure 2 also demonstrates the relationship between the clock frequency and i max for a constant frame rate of 5 Hz. The clock frequency may be increased at the expense of power up to the maximum operating frequency of the digital circuit.

Dynamic apodisation
Dynamic apodisation is used to maintain a constant F-number ( f # ) over the imaging depth. The F-number is defined as the ratio of the imaging depth, z, to the aperture size, α [18]. The synthetic aperture is dynamically grown as a function of the imaging depth in order to keep the f # constant. The number of lines l to consider in a window for focusing to a depth z are calculated using the following expression [18]: where z k is the pixel depth and Δx is the inter-element spacing. This equation is used to derive a set of a priori constants that are stored in memory to allow for real-time dynamic apodisation.

Experimental setup
In order to carry out measurements, a dedicated 2-layer printed circuit board (PCB) was designed, as shown in Fig. 3. The PCB interfaces with a full custom analogue front-end (AFE), which was designed and fabricated in AMS 0.35 μm CMOS technology. The AFE functions as an analogue demodulator, producing I/Q components from an RF input signal. The chip includes a sixth order lowpass filter with selectable bandwidth to allow for testing over a range of values. The PCB hosts a Cesys EFM-02 embedded FPGA module based on the Xilinx Spartan-6LX ® FPGA (XC6SLX150-3FGG484I). The FPGA is used to generate control signals for the IC, as well as digital mixing signals. The FPGA also communicates with a ADC10D020 dual 10-bit ADC, which samples I and Q channels separately at 2.5 MHz. The quadrature SAB method and an internal UART module were implemented on FPGA. This module connects to an external FT232 USB to serial UART interface controlling communication with the PC. Image post-processing is carried out in MATLAB, which also handles PC-side serial communications.
A custom MATLAB program was written to control an external PicoScope ® 5442B oscilloscope/arbitrary waveform generator (AWG). The script sequentially updates the AWG with RF signals obtained from a synthetic aperture database stored on the PC. This data were previously captured on a Verasonics Vantage 256 ™ system using a P4-1 phased array (central frequency at 2.5 MHz) with 96 active elements. The original RF sampling rate is 10 MHz, and the imaged medium was a wire phantom containing 8 × 3 cross-sectional wires.

Results and discussion
In this section, results for both architectures are presented. For the first architecture (FRI compressive SAB), simulations were initially carried out to validate the reconstruction accuracy of the algorithm. Thereafter, A-mode and B-mode imaging results were obtained using the hardware setup described above. B-mode images for the second architecture (quadrature SAB) were also obtained using the same hardware setup. Both architectures are compared in light of the resultant image quality and hardware efficiency.

Software simulations
The sampling scheme was first simulated using MATLAB in two scenarios prior to gathering hardware results. The scheme is demonstrated using ideal and noisy streams of Gaussian pulses in order to measure the accuracy of different FRI kernels. The MATLAB code used in these simulations is adapted from the code provided in [15,19].
Noiseless Case Consider a noiseless input signal x(t) comprising L = 5 delayed and weighted versions of a Gaussian pulse: where σ = 3 × 10 −3 and period τ = 200 μs. The time delays and amplitudes of all pulses in x(t) are allocated randomly. The signal is then convolved with the sampling kernel (lowpass filter, sinc, sum of sincs, etc). Assuming that an oversampling factor of F = 4 is used, the low-rate sampling frequency is f s = N τ = 2L×F τ = 2(4)(5) 200×10 −6 = 200 kHz, which implies that the bandwidth of the filter must be f c = f s /2 = 100 kHz. When using an ideal sinc filter, the reconstructed signal is exact to numerical precision, as shown in Fig. 4a. However, when using a fourth order Butterworth lowpass filter, time delay errors (22) are introduced between the input and reconstructed pulses due to the time constant of the filter (this may be corrected digitally after reconstruction). Amplitude errors are also presented, as shown in Fig. 4b. These errors may be minimised by increasing the order of the filter or by increasing the oversampling factor F, as discussed in the following section.
Noisy case Gaussian noise with variance σ 2 n is added to the samples to test the performance of the sampling scheme in non-ideal conditions. The SNR is defined as [15]:  Figure 5 illustrates the errors for various sampling kernels as a function of SNR. Since the causal filters introduce a time delay, the time error is calculated after shifting to correct the delay. The following filters are used in this analysis: • Sinc filter: s(t) = sinc(Bt), where B = 1/T. Evidently, the simple sinc kernel is more robust than the SoS kernel when the SNR is lower than 33 dB. For SNR values less than 19 dB, the response of the SoS kernel is unstable, whereas the responses of the other kernels are stable. For the causal filters (cascaded/biquad), the filter order is inversely proportional to the error, which tends towards a fixed value with increasing SNR. Causal filters naturally introduce systematic time delay errors. However, this may be corrected in software after reconstruction, as discussed in the previous section.
Oversampling The reconstruction accuracy may be improved by increasing the oversampling factor, F, at the expense of increased power consumption and transmission bandwidth. Figure 6 shows how the time and amplitude estimation errors change for different oversampling factors over a range of SNR values. In this case, a second order lowpass filter was used as the sampling kernel, and 500 experiments were carried out for L = 4 over a range of oversampling factors (1, 2, 4 and 8). Clearly, the estimation errors decrease as the oversampling factor increases. As the SNR increases beyond 40 dB, the time error decreases from 6.4 × 10 −5 τ ( F = 4 ) to 4.5 × 10 −8 τ ( F = 8 ) and the amplitude error from 0.208 ( 20.8% ) to 0.04 ( 4% ). For low SNR values, the estimation errors tend toward those of the sinc kernel. This trend is illustrated in Fig. 7, which highlights the performance of the sampling kernels when F is increased to 8. An optimal response may be approximated by either increasing F or the order of the filter. However, in doing so, the inherent tradeoffs must be carefully considered.

A-line Envelop Reconstruction
In order to validate the functionality of the compressive sensing algorithm in hardware (prior to beamforming), experiments were carried out using a single A-mode signal derived from the database described in the Methods section. The A-line signal may be modelled as a 1D stream of Gaussian pulses with width σ = 3 × 10 −7 . After demodulating and filtering below the Nyquist frequency, each I/Q signal is sampled at frequency f s and then reconstructed using the method in [13]. The results of the experiment are shown in Fig. 8. The number of samples per time window τ is N = 2L , where L is the number of Gaussian pulses per period. Three experiments were carried out for each cutoff frequency in the AFE. The parameters for these experiments are defined in Table 1. Figure 8b-d show the results for experiments 1, 2 and 3 respectively. The ideal envelop is shown in Fig. 8a. By observation, larger values of L yield better reconstruction accuracies, with the tradeoff being an increased sampling rate.
The original RF sampling frequency is 10 MHz. Thus, the sampling rate (for both I and Q) is reduced by a factor of 12.8, 4.9 and 1.4 for each experiment respectively. Thus, for L = 60 , there is no advantage in using compressive sensing as the sampling rate is

Time-delay estimation error [units of τ]
Amplitude estimation error higher than the ideal I/Q Nyquist sampling rate. There is thus a tradeoff between L, the reconstruction accuracy and the sampling rate. To achieve higher accuracy, both L and F should be large, but this results in an increased sampling rate and power consumption. The circuit topology realising the CS framework should be tuneable to maximise the performance and minimise the sampling rate.
B-Mode Imaging Finally, the compressive SAB architecture was evaluated by producing a full B-mode image using the RF dataset. Raw RF signals were sequentially demodulated in hardware using the cutoff frequencies in experiment 1 and 2,  corresponding to L values of 7 and 17. Subsequent beamforming was carried out in MATLAB using the synthetic aperture method for the following parameters: j max = 64 , k max = 352 (i.e. pixel resolution of 64 × 352 ), i max = 48 . The resultant images are shown in Fig. 9a ( L = 7 ), 9b ( L = 17 ), and 9b ( L = 40 ). These images may be compared against the "ideal" RF-beamformed reference in Fig. 12c. Lateral beamplots are also presented in Fig. 10, demonstrating the effect of L on lateral resolution. These beamplots were taken from a point target located at a depth of 66.5 mm, and correspond to a grayscale magnitude in decibels. The lateral resolution, SNR and contrast are calculated for each L value and presented in Table 1. Lateral resolution (LR) is defined as the full width at half maximum (FWHM), i.e. − 6 dB main lobe width. Relative contrast (RC) is defined as the ratio of the maximum grayscale magnitude at the point target to the average magnitude of the background. The SNR is the ratio of the mean to standard deviation of the image. For L = 7 , the SNR and lateral resolution is poor since fewer Gaussian pulses are used to reconstruct the I/Q signals. Increasing the number of Gaussian pulses L increases the reconstruction accuracy and thus improved image quality. In particular, for L = 40 , the SNR is 34 dB, and lateral resolution (2.07 mm) is closer to the ideal reference (1.2 mm). However, increasing L eventually pushes the low-rate sampling above that of the Nyquist quadrature sampling frequency.
The normalised root-mean-square-error (NRMSE) may be used as a quantitative measure of image quality for "ideal" RF-domain beamforming and FRI compressive beamforming. The NRMSE is computed on a scan-line/columnwise basis by comparing each pixel in the RF reference image, g j,k , to that of the measured image, f j,k , as follows: where max g j,k and min g j,k represent the maximum and minimum values of each column in g j,k respectively. For L = 7 and L = 17 , the NRMSE is 26 and 22% respectively, in comparison with the "ideal" RF case.

Quadrature SAB results
A-mode results were obtained by demodulating RF signals using the analogue frontend. Note that the lowpass filter cutoff frequency was set equal to the baseband signal bandwidth (1.25 MHz). The individual I/Q sampling rate is 2.5 MHz, since the I/Q baseband bandwidth is 1.25 MHz. Thus, the combined I/Q sampling rate (5 MHz) is reduced by a factor of 2 from the original 10 MHz RF sampling rate. The resultant I/Q signals were used to calculate the A-mode envelop, which is displayed in Fig. 11. The quadrature SAB method was tested in a similar manner by producing a B-mode image using the same RF dataset. The difference in this case is that demodulated signals from the analogue front-end were processed by the digital quadrature beamformer implemented using a Spartan-6LX ® FPGA (XC6SLX150-3FGG484I). The device utilisation summary is provided in Table 2 for the following parameters: f clk = 20 MHz, framerate = 7 Hz, N p = 16 , i max = 48 , j max = 64 , k max = 352 (i.e. pixel resolution of 64 × 352 ). On-chip BRAM was used to store the image and beamforming/apodisation parameters. The on-chip power consumption is proportional to the system clock frequency ( f clk ). For f clk = 20 MHz, the power consumption is estimated to be 296 mW (static power 172 mW, dynamic power 124 mW) by the Xilinx power estimator. This works out to be an equivalent power consumption of 4.6 mW/channel across the entire synthetic aperture (64 elements). Doubling the clock frequency allows for better spatial compounding Azimuth (mm) Three experiments were carried out for different values of i max . The results are provided in Table 3. Note that SNR, lateral resolution and relative contrast are calculated using the same method as in the compressive SAB section. Figure 12 compares the resultant B-mode images for i max = 16 and i max = 48 against the "ideal" RF beamforming reference (Fig. 12c). For i max = 48 (Fig. 12b), the NRMSE is 12.5% and the lateral   16.5% due to a reduction in the SNR caused by larger sidelobes and increased speckle noise. The reduction in image quality is qualitatively evident in Fig. 12a and in the lateral beamplots in Fig. 13, where the greyscale magnitude is plotted against lateral width. Decreasing i max from 48 to 3 results in a 9.1 dB SNR reduction. Lateral resolution increases from 1.2 to 2.5 mm. However, according to equation (20), there is an inverse relationship between the i max and frame rate. Decreasing i max from 48 to 8 elements leads to an increase in frame rate from 2.5 to 15 Hz. Similarly, for a constant frame rate, a sixfold reduction in the number of transmissions leads to a proportional decrease in power consumption, since the system clock frequency may be decreased or the area/logic capacity reduced as fewer pixels are calculated in parallel. The f # is also an important parameter affecting the width of the main lobe, and thus the lateral resolution. In Fig. 14, it can be seen that the main lobe width increases as f # increases from 0.5 to 3. Thus, better focusing is achieved with a smaller f # . The f # also affects the relative contrast, as shown in Fig. 15. For RF and quadrature beamforming, the contrast increases to a maximum of 40 and 40.5 dB at f # = 2.2 , after which it gradually decreases.

Comparison of architectures
Both architectures employ an identical hardware front-end, which demodulates RF ultrasound signals prior to beamforming. In the quadrature method, beamforming is carried out by an FPGA in the hardware front-end, yielding a B-mode image which a b c Fig. 12 Images of a phantom containing 8 × 3 cross-sectional wires. In a, b, quadrature beamforming is carried out with 16   and ADC (75 mW for L = 17 ). Thus, the total power is lower, but the tradeoff is image quality quality.
The lateral resolution for the quadrature SAB case (1.2 mm) is identical to the RF reference for the same number of transmissions. For the compressive SAB method, lateral resolution is significantly poorer (2.07 mm) for the best case ( L = 40 ). For this L value, SNR and relative contrast are similar to the quadrature SAB method. However, at lower L values, image quality becomes unacceptably poor due to sparse signal reconstruction of the I/Q envelop. Thus, it is evident that the quadrature SAB method yields better image quality (for an identical number of transmissions) than the compressive SAB method. The NRMSE for the quadrature architecture was 9% lower for the same i max value (48). L should be increased beyond 17 in order to achieve image quality that is comparable to the quadrature SAB case. This in turn increases the bit-rate and power consumption of the transmission link. For instance, for L = 17 ( f s = 1.05 MHz), the required bit rate is 2 × 1.05 × 10 = 21 Mbps, which is 4.7 times lower than the bit rate required to transmit RF samples at 100 Mbps. Transmission at this frequency is feasible using a typical 2.4 GHz, 802.11 g transceiver, for example, which operates up to a maximum of 54 Mbps. The frame rate is constrained, however, and more than one channel cannot practically be used due to the limited transmission line bandwidth. However, in the case of the quadrature method, the frame rate may be increased by adding more multiple channels at the expense of increased power consumption. Table 4 compares various state-of-the-art beamforming architectures with the proposed quadrature SAB architecture. The key advantage of the proposed architecture is a reduced number of analogue receiver channels, leading to reduced system complexity, area and cost on a system-level. The digital beamformer power consumption per channel (4.1 mW) is comparable to state-of-the-art digital and mixed-signal beamformers, on a per-channel basis. However, a proper comparison of the proposed digital receive beamformer with prior art is challenging as other designs focus on different imaging applications and have varying numbers of channels. For instance, the state-of-the-art mixed-signal beamformer in [20] consumes 276 mW across all 32 × 32 channels, which  is slightly more than the proposed design ( 262 mW across 64 channels). However, while the per-channel power in [20] is lower, scan-line conversion is not carried out on-chip, so offloading scan-lines at wireless transmission rates would be a significant challenge. On the other hand, the proposed beamformer converts I/Q signals to a B-mode image as a means of compressing data prior to conversion. This is because the proposed design targets small-scale wireless applications. The natural tradeoff for both architectures is frame rate-for i max = 8 , the frame rate (15 Hz) is half that of prior art. For a single analogue channel ( N a = 1 ), the maximum frame rate is limited by the reflection or acquisition time. The frame rate can be increased for the same i max if the number of parallel analogue channels ( N a ) is increased to 2 or more at the expense of increased power consumption. The delay resolution is also lower than that of prior art due to a relatively low oversampling factor. Future work will involve increasing the delay resolution by increasing the interpolation factor.

Conclusion
Two architectural solutions are compared for highly miniaturised ultrasound imaging applications. The architectures combine synthetic aperture beamforming (SAB) with quadrature sampling and compressive sensing respectively, in order to reduce the power consumption, cost, circuit area and the complexity of the receiver. The sampling rate is reduced by a factor of 2 (quadrature SAB) and 4.7 (compressive SAB), compared to the RF sampling rate. The quadrature SAB method yields a higher image SNR and 9% lower root mean squared error with respect to the RF-beamformed reference image than the compressive SAB method. The quadrature method is implemented on FPGA, with a total power consumption of 4.1 mW. Both architectures achieve a significant reduction in sampling rate, system complexity and area, allowing for aggressive miniaturisation of the imaging front-end.