# Biosignals learning and synthesis using deep neural networks

- David Belo^{1}
- João Rodrigues^{1}
- João R. Vaz^{2, 3, 4}
- Pedro Pezarat-Correia^{2}
- Hugo Gamboa^{1}

**16**:115

https://doi.org/10.1186/s12938-017-0405-0

© The Author(s) 2017

**Received:** 17 May 2017

**Accepted:** 16 September 2017

**Published:** 25 September 2017

## Abstract

### Background

Modeling physiological signals is a complex task, both for understanding and for synthesizing biomedical signals. We propose a deep neural network model that learns and synthesizes biosignals, validated by the morphological equivalence between the synthesized and the original signals. This research could lead to the creation of novel algorithms for signal reconstruction in heavily noisy data and for source detection in the biomedical engineering field.

### Method

The present work explores gated recurrent units (GRU) trained on respiration (RESP), electromyogram (EMG) and electrocardiogram (ECG) signals. Each signal is pre-processed, segmented and quantized into a specific number of classes corresponding to the amplitude of each sample, and then fed to the model, which is composed of an embedding matrix, three GRU blocks and a softmax function. The network is trained by adjusting its internal parameters, learning an abstract representation of the next value based on the previous ones. The simulated signal was generated by forecasting from a random initial value and re-feeding each prediction into the model.

### Results and conclusions

The generated signals are morphologically similar to the originals. During the learning process, after a set of iterations, the model starts to grasp the basic morphological characteristics of the signal and later its cyclic characteristics. After training, the models' predictions are closest to the signals that trained them, especially for RESP and ECG. This synthesis mechanism has shown relevant results that encourage its use to characterize signals from other physiological sources.

## Keywords

## Background

Biosignal synthesis has been applied in biomedical engineering research to mimic the chemical and physical processes that can be measured with sensors and characterized through quantification. Each type of biosignal has a characteristic morphology that depends on the measured surface or organ, the source (i.e. the individual who generated it), the contaminating noise and, in some cases, pathology.

This paper proposes the application of deep neural networks (DNN) to accurately synthesize the morphology of a biosignal. The hypothesis is that if the created models are capable of generating clean signals, then, apart from replacing signals rendered unrecognizable by noise contamination, they could also distinguish between types of signals and, if the signal topology permits, between their sources. This capacity could unlock novel algorithms, not only for signal denoising and reconstruction, but also for event detection, classification and validation. The DNN architecture is central to this study because it learns from the morphology itself, requiring neither additional input features nor tailoring to one specific signal type, unlike other methods in the literature.

The remainder of this paper first explains the morphology of the three biosignals used to validate the proposed architecture, then reviews related work, the structure of the gated recurrent unit (GRU), which is the main component of the DNN architecture, and a specific application of GRUs, the character-level language model, which inspired the proposed architecture. The “Dataset” and “Methods” sections describe the dataset and the methods used, and the “Results” and “Discussion” sections present the experimental results along with their detailed discussion. Conclusions and future work are presented in the “Conclusion and future remarks” section.

### Signal morphology

The term morphology comes from the Greek *morph*, ‘shape’ or ‘form’, and *logy*, ‘study of’; it is therefore the study of shapes or forms. In this paper, biosignal morphology is defined as the shape of the signal's graphical representation, as visualized and perceived by the human eye in terms of periodicity, amplitude, structure, disruptions and clarity.

For example, the RESP signal presented in Fig. 1a was recorded in the thorax region by a pneumatic respiration transducer, i.e. an extensometer embedded in an elastic belt that captures changes in volume. The expansion and compression of the chest during breathing are transcribed into the signal, which shows small changes in frequency during normal breathing.

An EMG is a high-frequency signal with periodic changes in amplitude (Fig. 1b); each burst is correlated with muscle activation resulting from the neurophysiological events that precede muscle contraction.

The family of biosignals presented in Fig. 1c, d is the ECG. Its characteristic form may be described as a baseline that oscillates in a cyclic pattern of five different waves reflecting each phase of the heartbeat: P, corresponding to atrial contraction; the QRS complex, responsible for ventricular contraction; and T, a consequence of ventricular relaxation [1–3].

Morphological differences between biosignals of the same family may stem from individual traits, different electrode placement relative to the measured organ, artifacts (caused by internal or external sources), noise or pathological events. With the growing number of external devices that measure biosignals, the noise corrupting the signals can be substantial enough to make them unreadable.

### Related work

Applications of synthesized signals range from denoising and reconstruction of unreadable signals to event detection, classification and validation; the most relevant research concerns the generation of EMG and ECG.

Existing approaches to EMG reproduction include the sum of diphasic waves [4, 5], a random tonic EMG wave multiplied by a sine wave [6], and autoregressive models mixed with Gaussian noise [7].
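As a rough illustration of the second family of approaches, the sketch below multiplies a random Gaussian carrier by a half-wave rectified sine envelope, which yields periodic bursts of activity. It is a minimal sketch under our own assumptions (Gaussian carrier, 1 Hz burst rate, 1000 Hz sampling, the function name `synthetic_emg`); it is not the exact procedure of [6].

```python
import numpy as np

def synthetic_emg(duration_s=4.0, fs=1000, burst_hz=1.0, seed=0):
    """Toy EMG: Gaussian noise amplitude-modulated by a half-wave rectified sine."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * fs)) / fs
    carrier = rng.standard_normal(t.size)                              # random "tonic" activity
    envelope = np.clip(np.sin(2 * np.pi * burst_hz * t), 0.0, None)    # positive half-cycles only
    return t, carrier * envelope                                       # one burst per cycle

t, emg = synthetic_emg()
```

The signal is silent during the negative half-cycles of the envelope, so the burst timing is fully determined by `burst_hz`; the models reviewed above add realism through the shape of the carrier and envelope.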

For the ECG, various research articles rely on its theoretical expression, such as the combination of cosine waves [8], coupled differential equations [1] or delayed harmonic waves [9]. Once a model has been parametrized with signal processing and machine learning methods, signals can be synthesized by exploiting its predictive power. For instance, features may be extracted with the wavelet transform [10, 11] and the Hilbert transform [12, 13], and the ECG may be generated using dynamic time warping [10], hidden Markov models (HMM) [11], polynomial approximation [12] or artificial neural networks (ANN) [13]. Atoui et al. [14] use a multilayer ANN fed with raw signals extracted from a 12-lead ECG, considering five leads and establishing a relationship between them.
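The theoretical-expression approaches can be illustrated with a single beat built as a sum of Gaussian waves, one per P, Q, R, S and T wave. This is a simplified, static template loosely inspired by the Gaussian kernels of the dynamical model in [1], not that model itself; the wave centres, widths and amplitudes in `WAVES` are illustrative values we chose, not fitted parameters.

```python
import numpy as np

# (centre [s], width [s], amplitude [mV]) for each wave; illustrative values only
WAVES = {
    "P": (0.20, 0.025, 0.15),
    "Q": (0.36, 0.010, -0.10),
    "R": (0.40, 0.012, 1.00),
    "S": (0.44, 0.010, -0.25),
    "T": (0.65, 0.040, 0.30),
}

def synthetic_beat(fs=250, period_s=1.0):
    """One ECG-like beat as a sum of Gaussian bumps (P, Q, R, S, T)."""
    t = np.arange(int(fs * period_s)) / fs
    beat = np.zeros_like(t)
    for centre, width, amp in WAVES.values():
        beat += amp * np.exp(-((t - centre) ** 2) / (2 * width ** 2))
    return t, beat

def synthetic_ecg(n_beats=5, fs=250, period_s=1.0):
    """Repeat the same beat to obtain a strictly periodic signal."""
    _, beat = synthetic_beat(fs, period_s)
    return np.tile(beat, n_beats)

ecg = synthetic_ecg()
```

Because every beat is identical, such templates lack the beat-to-beat variability that [1] obtains from its differential equations, which is one motivation for learning the morphology from data instead.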

A standard for cardiac monitors defines an artificial wave based on characteristic parameters, such as the QRS amplitude and duration, for designing and validating event detectors [15].

### Deep neural networks and gated recurrent units

ANN algorithms learn from data by optimizing multiple parameters, which makes them capable of solving specific problems [16, 17]. DNNs are an evolution of “shallow” networks, with more hidden layers and greater complexity, computational power and learning capability. The long short-term memory (LSTM) was proposed by Hochreiter and Schmidhuber [18–21] as a solution to the vanishing and exploding gradient problems. This architecture has multiple gate layers dedicated to memory management, which learn long-term dependencies by forgetting and updating the layer's state.

The GRU architecture is a simplification of the LSTM. Both are recurrent neural networks (RNN): when unfolded in time, they can be represented as a conventional feedforward neural network in which each step depends on the previous ones. In the unfolded version of the feedback loop, the sequential data pass through each copy of the network, changing its internal state and recording the dynamic changes of the input [21, 22].

GRUs are reported to converge faster than LSTMs without a loss of accuracy. Another advantage of this algorithm is its high prediction accuracy when estimating time-series sequences in several fields, without requiring an extensive set of input features or feature selection. Promising results have been reported in areas such as speech [17, 26, 27], music [28, 29], audio analysis [30] and handwriting recognition [31, 32]. These architectures are also used for language comprehension in text translation [33], text generation [34] and image and video description [35–37].

More detailed information is available in the “Methods” section.

### Character-level language model

In natural language learning, the GRU model has been used to predict the next character in a text. From the sequence of characters in a sentence, the RNN model is capable of learning the correct structure of phrases. Graves [38] describes the “Wikipedia experiment”, in which the network generated text following a Wikipedia article template. Even though the phrases were well structured and grammatically correct, the articles as a whole had no meaningful content. Just as Koski [39] relates the ECG to syntactic expression, where words and grammar represent patterns dictated by a set of rules, this paper extends this concept to physiological signals.
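The character-level set-up reduces to pairing every symbol with the one that follows it. The sketch below builds these (input, target) index pairs for a short string; the vocabulary construction and the function name `next_char_pairs` are our own minimal choices. The same shift by one position is what turns a quantized biosignal into a next-sample prediction task.

```python
def next_char_pairs(text):
    """Map characters to integer indices and pair each index with the next one."""
    vocab = sorted(set(text))                       # one integer class per distinct character
    index = {ch: i for i, ch in enumerate(vocab)}
    encoded = [index[ch] for ch in text]
    inputs, targets = encoded[:-1], encoded[1:]     # the target at n is the input at n + 1
    return vocab, inputs, targets

vocab, x, y = next_char_pairs("abcab")
```

Replacing characters by amplitude classes leaves the construction unchanged, which is the analogy the proposed architecture builds on.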

## Dataset

To reconstruct data from signals made unrecognizable by noise, and to detect abnormal events, the model needs to learn the clean version of the signal. Accordingly, the dataset was chosen according to three principles: the signals must be free from noise; they must be acquired from individuals without pathologies; and their morphology must be directly interpretable by a human without special expertise. This document uses RESP, EMG and ECG signals [40, 41].

The ECG and RESP signals were downloaded from the PhysioBank database, created under the auspices of the National Center for Research Resources of the National Institutes of Health [40]. The first ten subjects aged 21 to 34 years and the first ten aged 68 to 81 years of the *Fantasia* dataset were used; all underwent 120 min of continuous supine resting ECG recording while watching the Disney movie *Fantasia* [41].

## Methods

The proposed DNN^{1} sequential architecture is depicted in Fig. 5. In short, after noise reduction, quantization and segmentation of the signal, each scalar sample, \(x_n\), is fed to the network. This value is used as an index into the embedding matrix, *E*, which transforms it into the column vector \(\hat{x}_n\), the inner representation of the sample. The output of the three GRU layers is the vector \(\hat{o}_n\), which is fed to a regression node and a softmax function that yield a probability density vector *o*. After the models are trained on different signals with the RMSprop algorithm, signals are synthesized by exploiting the models' predictive power.

The detailed explanation of the pre-processing, model, training method, signal generation and model evaluation will be addressed in this section.

### Pre-processing

\(x\) will be a vector in which each position corresponds to its associated step \(k \in \{0, 1, \ldots, S_D-1\}\). Fig. 4 depicts an example of a signal's time window (TW) before and after this process for \(S_D = 16\).

The target \(y\) is simply the input \(x\) shifted by one sample, so that \(y_n = x_{n+1}\), where \(y_N = 0\). To reduce the computational burden, the last pre-processing step is the segmentation of the dataset \(D\) into TWs of dimension \(W\) with an overlap of 2/3, resulting in two matrices, \(X\) and \(Y\), with the same dimensions, in which each row is a TW of \(W\) samples.
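The quantization, one-sample shift and segmentation steps can be sketched as follows. The surviving text does not specify the noise-reduction filter or the exact normalization, so this sketch assumes min-max scaling to [0, 1] followed by rounding to the nearest of \(S_D\) steps, and reads an overlap of 2/3 as consecutive windows advancing by \(W/3\) samples. The function names `quantize`, `shift_target` and `segment` are hypothetical.

```python
import numpy as np

def quantize(signal, s_d):
    """Min-max normalise to [0, 1] and map each sample to an integer step in {0, ..., s_d - 1}."""
    signal = np.asarray(signal, dtype=float)
    span = signal.max() - signal.min()
    norm = (signal - signal.min()) / span if span > 0 else np.zeros_like(signal)
    return np.round(norm * (s_d - 1)).astype(int)

def shift_target(x):
    """y_n = x_{n+1}, with the last target set to 0."""
    y = np.zeros_like(x)
    y[:-1] = x[1:]
    return y

def segment(x, w, overlap=2 / 3):
    """Split x into windows of w samples; consecutive windows share `overlap` of their samples."""
    hop = max(1, int(round(w * (1 - overlap))))
    starts = range(0, len(x) - w + 1, hop)
    return np.stack([x[s:s + w] for s in starts])

raw = np.sin(np.linspace(0, 8 * np.pi, 1200))   # stand-in for a filtered biosignal
x = quantize(raw, s_d=16)
y = shift_target(x)
X, Y = segment(x, w=300), segment(y, w=300)     # rows of X and Y are aligned TWs
```

Segmenting `x` and `y` with the same window starts keeps each row of `Y` exactly one sample ahead of the matching row of `X`.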

### Signal sample embedding

After pre-processing, the input is transformed by an embedding matrix, *E*, before entering the GRU layers, a common practice in the literature [43–45]. Instead of a one-hot vector, a low-dimensional vector is used to represent each input signal sample. The square matrix *E* of size \(S_D \times S_D\) contains the representation vectors for every possible signal sample value, \(\hat{x}_{n} = E_{[:, x_n]}\): since \(x_n\) is an integer scalar, \(E_{[:, x_n]}\) denotes the \(x_n\)-th column vector of *E*. This matrix acts as a dictionary that gives the image vector \(\hat{x}_{n}\) of the scalar \(x_n\). The vector \(\hat{x}_{n}\), of size \(S_D\), is the input of the first GRU node as the \(n\)-th sample of the TW. The matrix *E* is a learned parameter that starts with random values and is adjusted while training the model [43, 44].
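The column lookup \(E_{[:, x_n]}\) gives the same vector as multiplying \(E\) by a one-hot vector, without building that vector. A minimal NumPy sketch, in which \(E\) is a random placeholder (in the model its values are learned) and `embed` is a hypothetical helper:

```python
import numpy as np

S_D = 64
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(S_D, S_D))   # learned in the model; random placeholder here

def embed(samples, E):
    """Column lookup: the x_n-th column of E for each integer sample x_n."""
    return E[:, samples]

x_n = 37
one_hot = np.zeros(S_D)
one_hot[x_n] = 1.0
via_lookup = embed(x_n, E)          # shape (S_D,)
via_product = E @ one_hot           # same vector, via an explicit one-hot product

window = np.array([3, 3, 10, 63])   # a quantized TW fragment
embedded_window = embed(window, E)  # shape (S_D, 4): one column per sample
```

Repeated sample values map to identical columns, so the network sees the same representation for equal amplitudes wherever they occur in the TW.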

### Gated recurrent unit layers

*r*—has the following general equation

*z*is computed by

*r*is close to zero,

*h*is computed ignoring

*h*, using only \({x}_n\) value. The candidate for the next state results from the compilation between the new inputs and previous cell states. In the reset step, the candidate is allowed to forget the cell’s previous states, leaving the new inputs as the main guidelines for posterior outcomes. Therefore,

*r*is the gate responsible for effectively replace irrelevant state information. Each hidden unit has their own

*r*and

*z*gates, and, consequently, will learn to capture the biological signal’s time-dependent features.
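A single step of one GRU layer, in the standard formulation with a reset gate \(r\) and an update gate \(z\), can be sketched in NumPy as below, followed by three stacked layers as in the proposed model. Biases are omitted, the weight initialisation is arbitrary, and the names `init_gru`, `gru_step` and `gru_layer` as well as the sizes are illustrative assumptions, not the authors' Theano code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_gru(input_dim, hidden_dim, seed=0):
    """Random weights for one GRU layer; W_* act on the input, U_* on the previous state."""
    rng = np.random.default_rng(seed)
    shapes = {"W_r": (hidden_dim, input_dim), "U_r": (hidden_dim, hidden_dim),
              "W_z": (hidden_dim, input_dim), "U_z": (hidden_dim, hidden_dim),
              "W_h": (hidden_dim, input_dim), "U_h": (hidden_dim, hidden_dim)}
    return {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}

def gru_step(x, h_prev, p):
    """One GRU update (biases omitted)."""
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)             # reset gate
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)             # update gate
    h_cand = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))  # r near 0 -> h_prev ignored
    return z * h_prev + (1.0 - z) * h_cand                     # blend old state and candidate

def gru_layer(xs, p):
    """Run the cell over a sequence given as columns of xs; return all states as columns."""
    h = np.zeros(p["U_r"].shape[0])
    states = []
    for x in xs.T:
        h = gru_step(x, h, p)
        states.append(h)
    return np.stack(states, axis=1)

# Three stacked layers (sizes are illustrative)
S_D, H_D, n_samples = 64, 32, 10
layers = [init_gru(S_D, H_D, seed=0), init_gru(H_D, H_D, seed=1), init_gru(H_D, H_D, seed=2)]
out = np.random.default_rng(3).normal(size=(S_D, n_samples))   # stand-in embedded TW
for p in layers:
    out = gru_layer(out, p)                                     # column n plays the role of o_hat_n
```

Since each new state is a convex combination of the previous state and a `tanh` output, the states stay within (-1, 1) when the initial state is zero.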

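The output of the last GRU layer, \(\hat{o}_n\), is fed to a regression node and a *softmax* function. The output vector \(o\) is the probability density function of the next sample, \(x_{n+1}\), taking the value \(k\):

$$o_k = P\left(x_{n+1} = k \mid x_0, \ldots, x_n\right), \qquad k \in \{0, 1, \ldots, S_D - 1\}$$

Since there are \(S_D\) possible values of \(k\), the output of the model is a vector with \(S_D\) elements.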
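A minimal sketch of this output stage, reading the regression node as a dense linear map from the last GRU state to \(S_D\) scores (an assumption), with placeholder weights `V`, bias `b` and \(H_D = 256\). Subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

H_D, S_D = 256, 64
rng = np.random.default_rng(0)
V = rng.normal(scale=0.05, size=(S_D, H_D))   # regression-node weights (placeholder)
b = np.zeros(S_D)
o_hat = np.tanh(rng.normal(size=H_D))          # stand-in for the last GRU output
o = softmax(V @ o_hat + b)                     # o[k] plays the role of P(x_{n+1} = k)
k_most_likely = int(np.argmax(o))
```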

### Training

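The loss compares, for each sample \(n\), the output vector \(o_n\) with the target distribution \(p_n\); in this case, since \(y_n\) is an integer value, \(p_n\) is a one-hot vector in which position \(y_n\) has probability one and all other positions are zero. With the cross-entropy loss, this gives:

$$\mathcal{L}(\theta) = -\sum_{n}\sum_{k=0}^{S_D-1} p_{n,k}\log o_{n,k} = -\sum_{n}\log o_{n,y_n}$$

When training, the desired optimum parameter value \(\hat{\theta}\) is the one that minimizes the loss:

$$\hat{\theta} = \arg\min_{\theta}\,\mathcal{L}(\theta)$$

The gradient \(g\) follows:

$$g_t = \nabla_{\theta}\,\mathcal{L}(\theta_t)$$

The parameters are updated with RMSprop at each epoch \(t\) with the following equation:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$

where \(g_t\) is the gradient at epoch \(t\) for the parameters \(\theta\), \(\eta\) is the learning rate and \(\epsilon\) is a smoothing term (typically \(1 \times 10^{-8}\)) that prevents division by zero. The term \(E[g^2]_t\) is the running average of the squared gradient at epoch \(t\), and it depends only on the previous average and the decay factor \(\gamma\):

$$E[g^2]_t = \gamma\, E[g^2]_{t-1} + (1-\gamma)\, g_t^2$$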
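The RMSprop update described in this section can be checked on a toy problem: fitting a free vector of scores so that its softmax matches a one-hot target, assuming the loss is the cross-entropy between the output and the one-hot \(p_n\). The learning rate and decay factor (\(\eta = 0.01\), \(\gamma = 0.9\)) are common defaults we assume, not values reported for the proposed model, and `rmsprop_fit` is a hypothetical helper.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def cross_entropy(o, y):
    """-log o[y] for an integer target y (one-hot p with p[y] = 1)."""
    return -np.log(o[y])

def rmsprop_fit(s_d=8, y=3, epochs=200, eta=0.01, gamma=0.9, eps=1e-8):
    theta = np.zeros(s_d)             # scores fed to the softmax (toy "parameters")
    mean_sq = np.zeros_like(theta)    # E[g^2]_t, initialised at zero
    losses = []
    for _ in range(epochs):
        o = softmax(theta)
        losses.append(cross_entropy(o, y))
        g = o.copy()
        g[y] -= 1.0                   # gradient of the cross-entropy w.r.t. theta: o - p
        mean_sq = gamma * mean_sq + (1 - gamma) * g ** 2
        theta -= eta / np.sqrt(mean_sq + eps) * g
    return theta, losses

theta, losses = rmsprop_fit()
```

Dividing by the running root-mean-square of the gradient makes the step size roughly \(\eta\) per parameter regardless of the raw gradient scale, which is the property RMSprop is used for.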

During training, each signal was divided into a fixed number of TW batches, \(B_D\).

### Signal synthesizer

After training, the signal was synthesized by re-feeding the model's input with its last prediction. Since the output is an array with the probabilities of each possible step for the next sample, the selected value is drawn from this probability distribution; hence the predicted value is a semi-random choice.
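The re-feeding loop can be sketched independently of the network: any function that returns a probability vector over the \(S_D\) steps can drive it. Here a hypothetical `make_random_walk_model` stand-in (a fixed transition table, not a trained GRU) plays the role of the model, and each next value is sampled from the predicted distribution rather than taken as its argmax, which is what makes the generated signal a semi-random choice.

```python
import numpy as np

def make_random_walk_model(s_d):
    """Hypothetical stand-in for a trained model: mostly stay, sometimes step +/-1."""
    table = np.zeros((s_d, s_d))
    for k in range(s_d):
        table[k, k] = 0.5
        table[k, max(k - 1, 0)] += 0.25
        table[k, min(k + 1, s_d - 1)] += 0.25
    return lambda history: table[history[-1]]   # probability vector for x_{n+1}

def synthesize(predict_next, seed_value, n_samples, rng):
    """Generate a signal by sampling each next value and feeding it back as input."""
    generated = [seed_value]
    s_d = len(predict_next(generated))
    for _ in range(n_samples - 1):
        probs = predict_next(generated)
        generated.append(int(rng.choice(s_d, p=probs)))
    return np.array(generated)

rng = np.random.default_rng(0)
model = make_random_walk_model(s_d=64)
signal = synthesize(model, seed_value=int(rng.integers(64)), n_samples=500, rng=rng)
```

With a GRU, `predict_next` would also carry the hidden state forward between calls; passing the whole history keeps this sketch model-agnostic.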

The generated signals were produced by models trained on each of the three signal types for each individual, totaling 54 different models.

### Model evaluation

Each signal was pre-processed and separated into a training and a testing set: 128 random TWs from the first 33% of the signal were used for training, and 66% of the signal was used for testing. The test windows had 512 samples each, and the number of windows depended on the length of the signal.
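The split can be sketched as follows, assuming that the training TWs also have 512 samples, that they are drawn uniformly at random from the first third of the signal, and that the test windows tile the remaining part without overlap; the last two choices and the helper name `split_windows` are our assumptions.

```python
import numpy as np

def split_windows(x, w=512, n_train=128, train_fraction=0.33, seed=0):
    """Random training TWs from the first part of x; contiguous test TWs from the rest."""
    rng = np.random.default_rng(seed)
    cut = int(len(x) * train_fraction)
    starts = rng.integers(0, cut - w + 1, size=n_train)   # windows lie entirely before `cut`
    train = np.stack([x[s:s + w] for s in starts])
    n_test = (len(x) - cut) // w                           # depends on the signal length
    test = np.stack([x[cut + i * w: cut + (i + 1) * w] for i in range(n_test)])
    return train, test

x = np.arange(250 * 600)          # e.g. 10 min of sample indices at 250 Hz
train, test = split_windows(x)
```

Using the sample index as the signal makes it easy to confirm that no training window reaches into the test region.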

The mean and standard deviation were calculated for all windows, for each signal and mo

## Results

### Respiration generator

*Fantasia* dataset [41], and the green graphic is the synthesized version. The purple area corresponds to the probability of each sample; it is almost invisible because of the high confidence of the network's prediction. The parameters used were \(W = 1024\), \(H_D = 512\) and \(S_D = 64\). The lower frequency of this signal required a larger training window, and its simplicity required fewer epochs for the learning process than the other two signals. After some trial and error, we found that the \(W\) parameter is important, because the model must encode in its states at least one full cycle of the signal in each TW during training.

### Electromyogram generator

Figure 7a represents an EMG signal from the *gastrocnemius medialis* muscle while pedaling on a cycle ergometer, in which the active phase represents the muscle activation while pushing the pedal.

The EMG signals were downsampled from 1000 to 250 Hz for consistency with the other models. The selected dimensions were \(W = 512\), \(H_D = 512\) and \(S_D = 64\). \(H_D\) had to be increased because the wide range of frequencies required a larger hidden state capable of encoding this information inside the network.
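The paper does not say how the EMG was downsampled. A common approach, sketched below under that assumption, is to low-pass filter below the new Nyquist frequency (125 Hz) before keeping every fourth sample, so that high-frequency EMG content does not alias into the 250 Hz signal. The Hamming-windowed sinc filter, its 101 taps, the 100 Hz cut-off and the helper names are our choices; a library routine such as `scipy.signal.decimate` serves the same purpose.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, n_taps=101):
    """Windowed-sinc (Hamming) low-pass FIR with unit DC gain."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(n_taps)
    return h / h.sum()

def downsample(x, fs_in=1000, fs_out=250):
    """Anti-alias filter, then keep every (fs_in // fs_out)-th sample."""
    factor = fs_in // fs_out
    cutoff = 0.8 * fs_out / 2                      # stay below the new Nyquist frequency
    filtered = np.convolve(x, lowpass_fir(cutoff, fs_in), mode="same")
    return filtered[::factor]

fs = 1000
t = np.arange(2 * fs) / fs
# Without the filter, the 300 Hz component would fold onto 50 Hz at 250 Hz sampling
emg_like = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 300 * t)
emg_250 = downsample(emg_like)
```

The symmetric filter combined with `mode="same"` introduces no time shift, so the downsampled bursts stay aligned with the original ones.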

### Electrocardiogram generator

*W* value must comprise at least one full cycle of the biosignal, since the sampling frequency is 250 Hz and a typical resting heart rate of 60 beats/min corresponds to a period of 1 s.
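As a worked check of this constraint, using the 250 Hz sampling frequency, a heart rate of 60 beats/min and the window length \(W = 512\) used for the ECG model in the “Model evolution” experiment:

$$T_{\text{beat}} = \frac{60\ \text{s/min}}{60\ \text{beats/min}} = 1\ \text{s}, \qquad N_{\text{beat}} = f_s\,T_{\text{beat}} = 250\ \text{Hz} \times 1\ \text{s} = 250\ \text{samples}$$

$$\frac{W}{N_{\text{beat}}} = \frac{512}{250} \approx 2.05\ \text{cycles per TW}$$

Each training window therefore contains about two cardiac cycles; even at 40 beats/min (\(N_{\text{beat}} = 375\) samples) it would still contain more than one (\(512/375 \approx 1.37\)).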

### Model evolution

*Fantasia* dataset with the parameters \(W = 512\), \(S_D = 64\) and \(H_D = 256\). While the model was being trained, several copies were saved to provide a graphical representation of how the model was learning. Fig. 9 therefore depicts six signals generated after different numbers of training epochs. For example, the first graphic (Fig. 9a) is the result of the saved model right after initialization, whereas the second graphic (Fig. 9b) is the prediction of the model trained with the same batch after 20 epochs.

### Model evaluation

## Discussion

In the RESP synthesis depicted in Fig. 6, the model learned the patterns, the amplitude and even the small variations in frequency over time. The average error (Fig. 10a) is lowest for the source RESP, reflecting the capacity of this algorithm to reproduce the signal that trained it. The differences from other RESP signals and from other types of signals are also visible, the latter being more pronounced, as is the standard deviation.

In the predicted EMG (Fig. 7b), the cycles are visible and the burst frequency is reproduced, with the higher maxima following the local minima. In the synthetic version, this state machine consistently identifies the location of the activation times. On the other hand, the shape of the bursts shows some inaccuracies, particularly in the last burst, where the activation is clearly longer in the synthesized signal.

The error analysis (Fig. 10c) shows that the errors for the source EMG and for the other EMG signals are quite similar. Several hypotheses may explain this: the EMG of different subjects performing the same task may be quite similar; the network may not have been able to distinguish the different frequencies of each individual, because of the inherent complexity of the signal; or the training may have ended before reaching the global minimum of the loss function. When comparing the standard deviation between the EMG data and the other types of signals, it is possible to conclude that the models synthesize EMG that is significantly different from those signals, even if the corresponding mean is, in some cases, close.

Regarding the ECG generator, all ECG characteristics are visible in both the original and synthesized signals. The model learned not only the frequency and principal characteristics of the compound wave, but also the baseline at \(k \simeq 40\) and the values of the local minima and maxima. The R peaks show small fluctuations in value, reflecting the original ones. After the 600th sample of the synthesized signal, the model made a prediction error, but it was able to readjust and regain the proper morphology.

A further aspect of this ECG modeling is that the network also learns the individual traits of the person. In Fig. 9, the synthesized ECG produced by the model trained with subject seven of the *Fantasia* dataset is clearly different from the one created from subject three, depicted in Fig. 8.

In the learning process of the ECG model depicted in Fig. 9, the model parameters were initialized with random values in the first epoch, resulting in a random sequence around a mean value with some standard deviation. After 20 epochs, the model starts to learn a few characteristics of the signal, such as the R spikes. Although there is no notion of frequency, there is a sense that the signal must return to a base value, in this case approximately \(k = 10\). After 30 epochs, some R peaks become more defined and some rudimentary forms of the T or P waves appear.

After 50 epochs (Fig. 9d), the model consistently introduces at least one slow wave before and after the QRS complex and, even though it sometimes omits a P or a T wave between two R peaks, it does not repeat these waves. Some of the R peaks do not yet have their final form. After 80 epochs, even though there is a latency between some waves, a notion of periodicity is visible, although it is not yet correct. The definition of the P, Q, R, S and T waves and their sequence also reflects the original ECG characteristics (Fig. 8a). Finally, after 410 epochs, when training was finished, the model reproduces the signal with minor differences and with a notion of frequency.

After evaluating the models, the results in Fig. 10 show that each network has a higher error for the types of signal it was not trained on, suggesting that the models recognize the type of signal that generated them. Fig. 10e shows that the models were even able to reproduce, with low error, each of the sources that trained them. From the matrix, one can speculate that some signals are morphologically closer to each other than to others. For example, ECG 8 is closer to ECG 15 than to ECG 11, which yields the highest error value.

Both the mean error (Fig. 10e) and the standard deviation (Fig. 10f) are low for the prediction of the source signals, implying that these signals are characteristic of each individual and differ significantly between individuals. These networks become specific to the individual's ECG trait, as they have a high error on all other signals, both of the same and of other biosignal types.

## Conclusion and future remarks

With this work, we were able to replicate the morphology of the three presented biosignals using DNN architectures. The two main aspects that distinguish this architecture from those in the literature are its capacity to learn and replicate several types of signal and the fact that it is blind, in the sense that no features about the input signal are given *a priori*. The models also need only a few minutes of signal for training: the ECG and EMG results required approximately 175 s and the RESP results 350 s. The low error of the RESP and ECG models also suggests that this model could be used to identify the source of these signals.

The proposed architecture also has some limitations: the sensitivity of the pre-processing to high-amplitude noise disruptions, since the whole signal may be compromised by changes in the maximum and/or minimum values; the loss of information caused by reducing the signal dimension; the fact that this architecture is, at its core, a state machine and therefore has a limited number of states and, consequently, a limited memory capacity for encoding a long signal; and the computational burden typical of RNN architectures, which require significant training time.
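The first limitation can be made concrete. Assuming min-max normalisation (which the sensitivity to maximum and minimum values suggests), a single high-amplitude artifact stretches the range, so the clean part of the signal is squeezed into a few quantization steps. The spike amplitude, \(S_D = 64\) and the helper `quantize` are illustrative.

```python
import numpy as np

def quantize(signal, s_d):
    """Min-max normalise and round to s_d integer steps (assumed pre-processing)."""
    lo, hi = signal.min(), signal.max()
    return np.round((signal - lo) / (hi - lo) * (s_d - 1)).astype(int)

t = np.linspace(0, 4 * np.pi, 1000)
clean = np.sin(t)
corrupted = clean.copy()
corrupted[500] = 20.0                      # one artifact, 20x the signal amplitude

levels_clean = np.unique(quantize(clean, 64)).size
# Steps still available to the actual signal once the spike itself is excluded
levels_corrupted = np.unique(np.delete(quantize(corrupted, 64), 500)).size
```

In the corrupted case almost all of the 64 steps are spent on the empty range between the signal and the spike, which is why one outlier can compromise the whole quantized signal.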

Further analysis of the learned models will search for the internal neural structures that generate the morphological aspects of the signal.

Future work will address the creation of novel biometric methods by selecting, for a given input signal, the model with the lowest loss. The values of the loss function could indicate both the type of the signal and the source that originated it, thereby identifying the respective person.
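The proposed selection rule amounts to scoring an input signal with every trained model and picking the lowest loss. The sketch below uses hypothetical stand-in models (first-order transition tables estimated from each person's quantized signal, not GRUs) purely to show the arg-min logic; the mean negative log-likelihood as the loss, the smoothing constant and the names `subject_A` and `subject_B` are our assumptions.

```python
import numpy as np

def fit_transition_model(x, s_d, alpha=1.0):
    """Hypothetical stand-in model: P(x_{n+1} | x_n) estimated with add-alpha smoothing."""
    counts = np.full((s_d, s_d), alpha)
    np.add.at(counts, (x[:-1], x[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def mean_nll(model, x):
    """Average -log P(x_{n+1} | x_n) of a quantized signal under a model."""
    return float(-np.mean(np.log(model[x[:-1], x[1:]])))

def identify(models, x):
    """Return the name of the model with the lowest loss for signal x, plus all losses."""
    losses = {name: mean_nll(m, x) for name, m in models.items()}
    return min(losses, key=losses.get), losses

rng = np.random.default_rng(0)
s_d = 16
slow = np.cumsum(rng.choice([-1, 0, 0, 1], size=3000)) % s_d   # smooth, slowly varying signal
fast = rng.integers(0, s_d, size=3000)                          # erratic signal
models = {"subject_A": fit_transition_model(slow[:2000], s_d),
          "subject_B": fit_transition_model(fast[:2000], s_d)}
best, losses = identify(models, slow[2000:])                    # unseen data from subject A
```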

Another direction is the detection of noisy segments of the signals: if the noise is above a certain threshold, the learned model could synthesize a replacement for the damaged time interval, improving the capacity for feature extraction.

Another possible contribution of this work is the application of this algorithm to TWs of normal versus pathological physiological signals, since the deviation from the trained model could indicate the segments in which pathological events occur.
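Both directions rely on the same mechanism: compute the model's loss per TW and flag the windows whose loss is unusually high, as candidates for noise replacement or for pathological events. A minimal sketch, assuming the probabilities that a trained model assigned to the observed next values are already available; the threshold rule (median plus \(k\) median absolute deviations, with an absolute floor) and the name `flag_windows` are illustrative choices, not part of the original work.

```python
import numpy as np

def flag_windows(p_observed, w, k=3.0, min_margin=0.5):
    """Flag TWs whose mean negative log-likelihood is unusually high.

    p_observed[n] is the probability the model gave to the value actually observed
    next. A window is flagged when its loss exceeds the median window loss by more
    than k median absolute deviations, and by at least min_margin nats.
    """
    nll = -np.log(np.clip(p_observed, 1e-12, 1.0))
    n_windows = len(nll) // w
    losses = nll[: n_windows * w].reshape(n_windows, w).mean(axis=1)
    median = np.median(losses)
    mad = np.median(np.abs(losses - median))
    return losses, losses > median + max(k * mad, min_margin)

rng = np.random.default_rng(0)
p = rng.uniform(0.3, 0.9, size=5120)   # a well-predicted signal
p[2048:2560] = 0.01                     # a segment the model cannot explain (window 4)
losses, flagged = flag_windows(p, w=512)
```

The absolute floor `min_margin` keeps near-identical windows from being flagged when the median absolute deviation is close to zero.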

By exploring how the DNN model learns and generates the morphological characteristics of biosignals, it may be possible to obtain valuable information for developing novel decision-support procedures in the medical field.

The library used to implement the ANN algorithm was *Theano* [42], a Python library that allows one to define, optimize and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers and has been used to produce many state-of-the-art machine learning models [42].

## Declarations

### Authors’ contributions

DB wrote the content of the paper and elaborated the methods and results. JR assisted in the writing process and reviewed the document. JV and PP-C provided validation and support for the EMG and synthesized EMG data. HG guided the elaboration of the methods, assisted in the writing process and reviewed the document. All authors read and approved the final manuscript.

### Acknowledgements

This research was supported by Fundação para a Ciência e Tecnologia.

### Competing interests

The authors declare that they have no competing interests.

### Availability of data and supporting materials

The Fantasia database is available at PhysioNet (https://physionet.org/physiobank/database/fantasia/). In the EMG dataset, all participants were fully informed of the purposes and risks associated with the experiment before providing written informed consent. The University’s Institutional Review Board approved the experiment (37/2015) and all procedures adhered to the Declaration of Helsinki.

### Ethics approval and consent to participate

All participants provided written informed consent.

### Funding

This paper was written under the projects AHA CMUP-ERI/HCI/0046 and INSIDE CMUP-ERI/HCI/051/2013 both financed by Fundação para a Ciência e Tecnologia (FCT).

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- McSharry PE, Clifford GD, Tarassenko L, Smith LA. A dynamical model for generating synthetic electrocardiogram signals. IEEE Trans Biomed Eng. 2003;50(3):289–94.
- Craven D, McGinley B, Kilmartin L, Glavin M, Jones E. Adaptive dictionary reconstruction for compressed sensing of ecg signals. IEEE J Biomed Health Inform. 2016;21(3):645–54.
- Hesar H, Mohebbi M. ECG denoising using marginalized particle extended kalman filter with an automatic particle weighting strategy. IEEE J Biomed Health Inform. 2016;21(3):635–44.
- Moore AD. Synthesized emg waves and their implications. Am J Phys Med Rehabilit. 1967;46(3):1302.
- Person R, Libkind M. Simulation of electromyograms showing interference patterns. Electroencephalogr Clin Neurophysiol. 1970;28(6):625–32.
- Belavý DL, Mehnert A, Wilson S, Richardson CA. Analysis of phasic and tonic electromyographic signal characteristics: electromyographic synthesis and comparison of novel morphological and linear-envelope approaches. J Electromyogr Kinesiol. 2009;19(1):10–21.
- Gamboa H, Matias R, Araújo T, Veloso A. Electromyography onset detection: new methodology. J Biomech. 2012;45:494.
- Murthy I, Reddy M. ECG synthesis via discrete cosine transform. In: Engineering in medicine and biology society, 1989. Images of the twenty-first century, proceedings of the annual international conference of the IEEE engineering in 1989. New York: IEEE; 1989. p. 773–4.
- Wu X, Sengupta K. Dynamic waveform shaping with picosecond time widths. IEEE J Solid-State Circuits. 2016;52(2):389–405.
- Turajlic E. A novel algorithm for ECG parametrization and synthesis. In: Biomedical engineering and sciences (IECBES), 2012 IEEE EMBS conference on, New York: IEEE; 2012. p. 927–32.
- Crouse MS, Nowak RD, Baraniuk RG. Wavelet-based statistical signal processing using hidden Markov models. IEEE Trans Signal Process. 1998;46(4):886–902.
- Nunes J-C, Nait-Ali A. Hilbert transform-based ECG modeling. Biomed Eng. 2005;39(3):133–7.
- Rodríguez R, Bila J, Mexicano A, Cervantes S, Ponce R, Nghien N. Hilbert-Huang transform and neural networks for electrocardiogram modeling and prediction. In: Natural computation (ICNC), 2014 10th international conference on, New York: IEEE; 2014. p. 561–7.
- Atoui H, Fayn J, Rubel P. A neural network approach for patient-specific 12-lead ecg synthesis in patient monitoring environments. In: Computers in cardiology. Piscataway: IEEE; 2004. p. 161–4.
- Ruha A, Sallinen S, Nissila S. A real-time microprocessor QRS detector system with a 1-ms timing accuracy for the measurement of ambulatory HRV. IEEE Trans Biomed Eng. 1997;44(3):159–67.
- Copeland J, Proudfoot D. Alan Turing’s forgotten ideas in computer science. Sci Am. 1999;280(4):98–103.
- Sak H, Senior A, Beaufays F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Annual conference of the international speech communication association (Interspeech 2014). Singapore; 2014. p. 338–42.
- Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80.
- Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2014;61:85–117.
- Bengio Y. Learning deep architectures for AI. Found Trends Mach Learn. 2009;2(1):1–127.
- Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Schölkopf B, Platt JC, Hoffman T, editors. Advances in neural information processing systems 19. Cambridge: MIT Press; 2007. p. 153–60.
- Jozefowicz R, Zaremba W, Sutskever I. An empirical exploration of recurrent network architectures. In: Proceedings of the 32nd international conference on machine learning. Lille, France. vol. 37; 2015.
- Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005;18:602–10.
- Wu Z, King S. Investigating gated recurrent neural networks for speech synthesis. CoRR; 2016. arXiv:1601.02539.
- Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J. LSTM: a search space odyssey. CoRR; 2015. arXiv:1503.04069.
- Zen H, Sak H. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In: 2015 IEEE international conference on acoustics, speech and signal processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015. http://dx.doi.org/10.1109/ICASSP.2015.7178816; 2015. p. 4470–4.
- Graves A, Mohamed A, Hinton GE. Speech recognition with deep recurrent neural networks. CoRR; 2013. arXiv:1303.5778.
- Boulanger-Lewandowski N, Bengio Y, Vincent P. Modeling temporal dependencies in high-dimensional sequences: application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392; 2012.
- Eck D, Schmidhuber J. A first look at music composition using LSTM recurrent neural networks. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale. 2002;103.
- Marchi E, Gabrielli L, Ferroni G, Eyben F, Squartini S, Schuller B. Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP). New York: IEEE; 2014. p. 2164–8.
- Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell. 2009;31(5):855–68.
- Pham V, Kermorvant C, Louradour J. Dropout improves recurrent neural networks for handwriting recognition. CoRR; 2013. arXiv:1312.4569.
- Luong MT, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025; 2015.
- Sutskever I, Martens J, Hinton GE. Generating text with recurrent neural networks. In: Getoor L, Scheffer T, editors. ICML. Madison: Omnipress; 2011. p. 1017–24.
- Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T. Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. p. 2625–34.
- Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: a neural image caption generator. CoRR; 2014. arXiv:1411.4555.
- Ren M, Kiros R, Zemel RS. Image question answering: a visual semantic embedding model and a new dataset. CoRR; 2015. arXiv:1505.02074.
- Graves A. Generating sequences with recurrent neural networks. CoRR; 2013. arXiv:1308.0850.
- Koski A. Modelling ECG signals with hidden Markov models. Artif Intell Med. 1996;8(5):453–71.
- Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–20.
- Iyengar N, Peng C, Morin R, Goldberger AL, Lipsitz LA. Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics. Am J Physiol. 1996;271(4):1078–84.
- The Theano Development Team, Al-Rfou R, Alain G, Almahairi A, Angermueller C, Bahdanau D, Ballas N, Bastien F, Bayer J, Belikov A, et al. Theano: a Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688; 2016.
- Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Mikolov T, et al. DeViSE: a deep visual-semantic embedding model. In: Advances in neural information processing systems; 2013. p. 2121–9.
- Pennington J, Socher R, Manning CD. GloVe: global vectors for word representation. EMNLP. 2014;14:1532–43.
- Karpathy A, Johnson J, Fei-Fei L. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078; 2015.
- Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078; 2014.
- Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555; 2014.
- Bottou L. Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010. New York: Springer; 2010. p. 177–86.
- Williams RJ, Peng J. An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Comput. 1990;2(4):490–501.
- Tieleman T, Hinton G. Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw Mach Learn. 2012;4(2):26–31.