
The Bayesian approach for feedforward neural networks has been applied to the extraction of the nucleon axial form factor from the neutrino-deuteron-scattering data measured by the Argonne National Laboratory bubble-chamber experiment. This framework allows us to perform a model-independent determination of the axial form factor from data. When the low

A good understanding of neutrino interactions with matter is crucial to achieve the precision goals of oscillation experiments that aim at a precise determination of neutrino properties

In particular, a source of uncertainty arises from the nucleon axial form factor

Empirical information about

On the other hand, as pointed out in Refs.

A promising source of information about

The choice of a specific functional form of

In this paper, we demonstrate that model-independent information about

See Sec.

To perform this semiparametric analysis, we make use of feedforward neural networks,

Semiparametric analyses of experimental data can also rely on other families of functions, such as polynomials or radial-basis functions

In the present paper, we employ the Bayesian framework for neural networks to find the most favorable

A global analysis including data from other experiments will be addressed in a subsequent study.

This section reviews the Bayesian approach formulated for feedforward neural networks and its adaptation to the problem of extracting the nucleon axial form factor that best represents a given set of data. The proposed framework is quite general: It does not rely on physics assumptions about the functional form of

The general idea is the following: Given a data set, a statistical model is built. The model is characterized by a number of probability densities, which are obtained using feedforward neural networks. A detailed account of the different ingredients of the approach is given in this section. The specific application to ANL CCQE neutrino-deuteron scattering data is left for the subsequent sections.

Our aim is to obtain a statistical model which has the ability to generate

Fitting the axial form factor with the dipole parametrization is an example of the parametric approach.

In this case, the uncertainties of the model prediction are either overestimated or underestimated. The semiparametric method takes the best features of both approaches (i) and (ii): Instead of a single specific functional model, a broad class of functions is considered, and the optimal model is chosen among them. The neural-network approach is a realization of the semiparametric method. In particular, feedforward neural networks form a class of functions with unlimited adaptive abilities.

To model the nucleon axial form factor, a feedforward neural network in a multilayer perceptron (MLP) configuration is considered. The concept of the MLP comes from neuroscience

Feedforward neural network in an MLP configuration

To every unit (blue circles in Fig.

The

Note that for the bias unit

Let us introduce the MLP

It has been proven (Cybenko's theorem)

We seek a model-independent parametrization of

the domain of

as

The feedforward neural network of Eq.
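As an illustrative sketch (not the code used in the analysis), a one-hidden-layer MLP parametrization of the form factor can be written as follows; all function and weight names are assumptions, with a single Q² input, sigmoid hidden units, and one linear output unit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_axial(q2, w_in, b_in, w_out, b_out):
    """One-hidden-layer perceptron for G_A(Q^2): a single Q^2 input,
    sigmoid hidden units, and a linear output unit (illustrative names)."""
    hidden = sigmoid(np.outer(q2, w_in) + b_in)  # shape (n_points, n_hidden)
    return hidden @ w_out + b_out                # shape (n_points,)

# example: 3 hidden units with randomly drawn weights
rng = np.random.default_rng(0)
m = 3
w_in, b_in, w_out = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m)
b_out = rng.normal()
q2 = np.linspace(0.0, 1.0, 5)  # Q^2 grid in GeV^2
values = mlp_axial(q2, w_in, b_in, w_out, b_out)
```

Each additional hidden unit adds one more sigmoid to the sum, enlarging the class of functions the network can represent.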

As described above, an MLP is a nonlinear map defined by a number of adaptive parameters. Increasing the number of hidden units improves the MLP's ability to reproduce the data. However, when the number of units (weights) is too large, the model tends to overfit the data and reproduces the statistical noise. As a result, its predictive power is lost. On the other hand, if the network is too small, the data are underfitted. This competition between the two extremes is known in statistics as the bias-variance trade-off

Bayesian statistics provides methods to face the bias-variance trade-off dilemma. Indeed, the Bayesian approach naturally embodies Occam's razor

We adopt the Bayesian framework for the feedforward neural network formulated by MacKay

Let us consider the set of neural networks,

If one assumes, at the beginning of the analysis, that all MLP configurations are equally suited for describing the data, then the following relations between prior probabilities hold

On the other hand, the posterior probability for the weights of a given MLP reads

In order to calculate the posterior

It is also assumed that the initial values of the weights are Gaussian distributed (the arguments supporting this choice are collected in Appendix

In principle, to get the evidence

In the adopted approach, it is assumed that the posterior distributions have a Gaussian shape. Hence, to get the necessary information about

In this approach, which corresponds to the type-II maximum likelihood method of conventional statistics

Within the present approximation, the evidence for a given model is cast in an analytical form. Namely,

The evidence contains two contributions: Occam's factor [

The error Eq.

The neutrino-induced CCQE,

Deuterium-filled bubble-chamber experiments actually measured

The dependence of

In the ANL experiment, the interactions of muon neutrinos in a 12-ft bubble chamber filled with liquid deuterium were studied

The predicted number of events in each bin is calculated similarly to Ref.

As stated in the previous section, the likelihood [Eq.

This is a more conservative value of the flux-normalization uncertainty than the ANL estimate of 15%
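A flux-normalization nuisance parameter of this kind can be sketched as a penalized chi-square over the binned event counts; the function below is an illustration with hypothetical names and a placeholder uncertainty, not the exact figure of merit of the analysis:

```python
import numpy as np

def chi2_with_norm(n_th, n_exp, lam, sigma_norm=0.20):
    """Chi-square for binned event counts with a flux-normalization
    nuisance parameter lam; sigma_norm is an illustrative value for the
    flux-normalization uncertainty, not necessarily the one adopted."""
    # statistical term: rescale the prediction by lam before comparing
    stat = np.sum((lam * n_th - n_exp) ** 2 / np.maximum(n_exp, 1.0))
    # Gaussian penalty pulling lam toward 1 within sigma_norm
    penalty = (lam - 1.0) ** 2 / sigma_norm ** 2
    return stat + penalty
```

Minimizing over `lam` lets the overall normalization float within its assumed uncertainty while the shape information constrains the form factor.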

It is known that the low-

Additionally, for each data set, we consider the cross-section model both with and without

We consider an MLP with

Consider an MLP with a fixed number of hidden units

using the Bayesian learning algorithm (

set the initial value of

initialize randomly the values of the weights;

perform training until the maximum of the posterior is reached; at each iteration step, update the values of weights and

calculate the evidence for each of the obtained MLP fits;

repeat steps (1)–(3) for various initial configurations of

among all registered fits, choose the best one according to the evidence;

repeat steps (1)–(5) for

among the best fits obtained for

The optimal configuration of parameters is obtained using the Levenberg-Marquardt algorithm
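The training-and-selection loop of steps (1)-(7) can be sketched as follows; `init_weights`, `train_map`, and `log_evidence` are hypothetical stand-ins for the random weight initialization, the Levenberg-Marquardt maximization of the posterior with on-line updates of the regularizer, and the evidence evaluation described in the text:

```python
import numpy as np

def select_model(hidden_sizes, n_restarts, init_weights, train_map,
                 log_evidence, rng):
    """Evidence-driven model search, steps (1)-(7) (illustrative sketch)."""
    best = None
    for m in hidden_sizes:                      # step (6): scan network sizes
        for _ in range(n_restarts):             # step (4): several random starts
            w0 = init_weights(m, rng)           # step (2): random initialization
            w_map, alpha = train_map(w0)        # step (3): maximize the posterior,
                                                #   updating alpha along the way
            ev = log_evidence(w_map, alpha, m)  # evidence of this fit
            if best is None or ev > best[0]:    # steps (5), (7): keep the best fit
                best = (ev, m, w_map)
    return best
```

The same evidence value ranks fits within one network size and across sizes, so a single comparison criterion drives the whole search.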

The analysis of the BIN0, BIN1, and BIN2 data sets has been independently performed. For each set, both cross-section models with and without deuteron corrections have been studied. For the default analyses,

In order to compare quantitatively different analyses, one needs to take into account the relative data normalization

As described in Sec.

In order to illustrate the performance of the training algorithm, in Fig.

Note that all the best models within each MLP type reproduce the ANL data well. This is illustrated in Fig.

Distribution of the ANL number of events and the best fits obtained for MLPs with

Our main results, i.e., the best fits to BIN0, BIN1, and BIN2 data for the model with and without the deuteron correction, and

Best fits of the axial form factor obtained from the analysis of the BIN0, BIN1, and BIN2 data sets. The top (bottom) panel presents the results obtained without (with) deuteron corrections. The shaded areas denote

The best MLP fits, obtained for the analysis of the BIN0, BIN1, and BIN2 data with and without deuteron corrections;

Both fits for the BIN0 data set, which contains all the data from the original ANL measurement, with and without deuteron corrections show a

The large uncertainty, however, does not exclude positive values.

which is at odds with all available determinations. We also observe that the height of the

It is worth mentioning that fits with a negative slope of

There are several possible sources of this unexpected behavior of the fits to the BIN0 set, namely, (i) an improper description of the nuclear corrections; (ii) a low quality of the measurements at low-

The magnetic form factors of the nucleon at very low-

In the low-

Impact of the deuteron corrections on the axial form factor fits. Results of the best fits to the BIN0, BIN1, and BIN2 data sets with and without the deuteron correction together with relative uncertainties. All curves for the BIN1 and BIN2 cases nearly overlap.

In Fig.

Dependence of

The impact of

Impact of the

In order to compare the Bayesian neural-network results with the traditional approach, we have performed a conventional analysis of the ANL data assuming the dipole parametrization for the axial form factor, Eq.
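For reference, the dipole parametrization has a single adaptive parameter, the axial mass M_A. The sketch below uses a representative magnitude for the axial coupling (the sign convention varies between works) and is not the fit code of the analysis:

```python
import numpy as np

G_A0 = 1.2723  # |g_A|, a representative value of the axial coupling

def dipole(q2, m_a):
    """Dipole axial form factor, G_A(Q^2) = g_A / (1 + Q^2/M_A^2)^2,
    with q2 and the axial mass m_a in GeV units (illustrative sketch)."""
    return G_A0 / (1.0 + q2 / m_a ** 2) ** 2
```

With this one-parameter form, the normalization at Q² = 0 is fixed to g_A and the fit only adjusts how fast the form factor falls with Q².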

For these analyses, the

Comparison of the dipole with the neural network fits to the BIN0 and BIN1 data, the deuteron corrections included. The shaded areas denote

Fits of the dipole axial form factor

Let us stress that the

The first Bayesian neural-network analysis of the neutrino-deuteron scattering data has been performed. The reported study has focused on the extraction of the axial form factor from the ANL CCQE data, searching for deviations from the dipole form. With the full ANL data set, the analysis leads to an axial form factor which has a positive slope at

New, more precise measurements of neutrino cross sections on hydrogen and deuterium are needed to unravel the axial structure of the nucleon. Techniques such as the one applied in the present paper will prove valuable in such a scenario.

We thank A. Meyer and M. González-Alonso for useful communications. The calculations have been carried out in the Wrocław Centre for Networking and Supercomputing (Ref.

The prior distribution of the weights, Eqs.

internal symmetry: The exchange of any two units in the hidden layer does not change the functional form of the network and its output values;

the sigmoid activation function

the ANL data are concentrated in the region
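The internal permutation symmetry listed above can be verified numerically; this is an illustrative sketch (not the paper's code) for a 1-M-1 network with sigmoid hidden units, where exchanging hidden units together with their weights leaves the output untouched:

```python
import numpy as np

def mlp(q2, w_in, b_in, w_out, b_out):
    """1-M-1 perceptron: sigmoid hidden layer, linear output (illustrative)."""
    h = 1.0 / (1.0 + np.exp(-(np.outer(q2, w_in) + b_in)))
    return h @ w_out + b_out

rng = np.random.default_rng(1)
m = 4
w_in, b_in, w_out = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m)
b_out = rng.normal()
q2 = np.linspace(0.0, 1.0, 7)

perm = rng.permutation(m)  # exchange hidden units (and their weights)
assert np.allclose(mlp(q2, w_in, b_in, w_out, b_out),
                   mlp(q2, w_in[perm], b_in[perm], w_out[perm], b_out))
```

Each of the M! relabelings of the hidden units produces an equivalent mode of the posterior, which is why such symmetry factors appear in the evidence.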

Properties (P4) and (P2) constrain the weights

At

Note that both negative and positive values of weights

The limits for the weights in the linear layer are less obvious. Property (P5) provides a constraint on the weights in the linear output layer

We therefore conclude that the prior density for the weights should cover the hypercube

Finally, let us remark that the most general Gaussian prior has the form

It is worth noting that each of the sigmoids that constitute the neural networks typically describes a particular feature of the function. If a soft dependence is preferred by the data, some units might be redundant and take very similar values for the weights.

The best-fit parametrization for the BIN0 data set with the deuteron correction included is

The weights

In this case,