Machine-learned exclusion limits without binning

Ernesto Arganda (Departamento de Física Teórica, Universidad Autónoma de Madrid, Cantoblanco, Madrid, 28049, Spain; Instituto de Física Teórica UAM-CSIC, C/ Nicolás Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain; IFLP, CONICET-Dpto. de Física, Universidad Nacional de La Plata, C.C. 67, La Plata, 1900, Argentina) ; Andres Perez (Departamento de Física Teórica, Universidad Autónoma de Madrid, Cantoblanco, Madrid, 28049, Spain; Instituto de Física Teórica UAM-CSIC, C/ Nicolás Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain; IFLP, CONICET-Dpto. de Física, Universidad Nacional de La Plata, C.C. 67, La Plata, 1900, Argentina) ; Martín de los Rios (Departamento de Física Teórica, Universidad Autónoma de Madrid, Cantoblanco, Madrid, 28049, Spain; Instituto de Física Teórica UAM-CSIC, C/ Nicolás Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain) ; Rosa Sandá Seoane (Instituto de Física Teórica UAM-CSIC, C/ Nicolás Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain)

The machine-learned likelihoods (MLL) method combines machine-learning classification techniques with likelihood-based inference tests to estimate the experimental sensitivity of high-dimensional data sets. We extend the MLL method by including kernel density estimators (KDE), which avoid binning the classifier output when extracting the resulting one-dimensional signal and background probability density functions. We first test our method on toy models generated with multivariate Gaussian distributions, where the true probability density functions are known. We then apply the method to two cases of interest at the LHC: a search for exotic Higgs bosons, and a $Z'$ boson decaying into lepton pairs. In contrast to physics-based quantities, the typical fluctuations of the ML outputs produce non-smooth probability distributions for pure-signal and pure-background samples. Because the KDE method is flexible and performs well, this non-smoothness propagates into the density estimates. We study its impact on the final significance computation and compare these results with those obtained by averaging several independent ML output realizations, which yields smoother distributions. We conclude that the significance estimate is insensitive to this issue.
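The abstract only outlines the pipeline: train a classifier, estimate the one-dimensional signal and background densities of its output with a KDE instead of a histogram, and feed those densities into a likelihood-based test. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes an extended unbinned likelihood $\ln L(\mu) = -(\mu S + B) + \sum_i \ln[\mu S\, p_s(o_i) + B\, p_b(o_i)]$, a normal-reference KDE bandwidth, the asymptotic relation $Z = \sqrt{q}$, and Beta-distributed stand-ins for real classifier outputs. All function names (`make_kde`, `log_likelihood`, `fit_mu`, `compute_test_statistics`, `run_demo`) are hypothetical.

```python
import math
import random
import statistics


def make_kde(samples):
    """Gaussian KDE of 1D classifier outputs (no histogram binning).

    Hypothetical helper. Bandwidth uses the normal-reference rule
    h = 1.06 * sd * n**(-1/5), an assumed choice. Gaussian kernels also leak
    some probability mass outside [0, 1]; no boundary correction is applied.
    """
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** -0.2
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return pdf


def log_likelihood(mu, ps, pb, s_exp, b_exp):
    """Extended unbinned ln L(mu), dropping the mu-independent -ln(n!) term.

    ps[i] and pb[i] are the signal and background densities evaluated at the
    classifier output of observed event i; s_exp and b_exp are expected yields.
    """
    return -(mu * s_exp + b_exp) + sum(
        math.log(mu * s_exp * p_s + b_exp * p_b) for p_s, p_b in zip(ps, pb)
    )


def fit_mu(ps, pb, s_exp, b_exp, iters=100):
    """Maximum-likelihood signal strength restricted to mu >= 0.

    ln L is concave in mu, so its derivative is decreasing and can be bisected.
    """
    def dlnl(mu):
        return -s_exp + sum(
            s_exp * p_s / (mu * s_exp * p_s + b_exp * p_b) for p_s, p_b in zip(ps, pb)
        )

    if dlnl(0.0) <= 0.0:
        return 0.0
    lo, hi = 0.0, len(ps) / s_exp + 1.0  # each summand is <= 1/mu, so dlnl(hi) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dlnl(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def compute_test_statistics(outputs, pdf_s, pdf_b, s_exp, b_exp, mu_test=1.0):
    """Return (mu_hat, q0, q_mu) for one observed set of classifier outputs."""
    ps = [pdf_s(o) for o in outputs]
    pb = [pdf_b(o) for o in outputs]
    mu_hat = fit_mu(ps, pb, s_exp, b_exp)
    lnl_hat = log_likelihood(mu_hat, ps, pb, s_exp, b_exp)
    # Discovery: q0 = -2 ln[L(0) / L(mu_hat)], set to 0 when mu_hat = 0.
    q0 = 0.0
    if mu_hat > 0.0:
        q0 = 2.0 * (lnl_hat - log_likelihood(0.0, ps, pb, s_exp, b_exp))
    # Upper limit: q_mu = -2 ln[L(mu) / L(mu_hat)], set to 0 when mu_hat > mu.
    q_mu = 0.0
    if mu_hat <= mu_test:
        q_mu = 2.0 * (lnl_hat - log_likelihood(mu_test, ps, pb, s_exp, b_exp))
    return mu_hat, max(q0, 0.0), max(q_mu, 0.0)


def run_demo(seed=1):
    """Toy run with hypothetical Beta-distributed stand-ins for classifier outputs."""
    rng = random.Random(seed)
    pdf_s = make_kde([rng.betavariate(5, 2) for _ in range(500)])  # pure signal
    pdf_b = make_kde([rng.betavariate(2, 5) for _ in range(500)])  # pure background
    s_exp, b_exp = 50.0, 500.0
    bkg_only = [rng.betavariate(2, 5) for _ in range(500)]
    sig_plus_bkg = bkg_only + [rng.betavariate(5, 2) for _ in range(50)]
    mu_b, q0_b, q1_b = compute_test_statistics(bkg_only, pdf_s, pdf_b, s_exp, b_exp)
    mu_s, q0_s, _ = compute_test_statistics(sig_plus_bkg, pdf_s, pdf_b, s_exp, b_exp)
    # Asymptotic approximation: Z = sqrt(q).
    return {
        "mu_hat_bkg": mu_b, "z_disc_bkg": math.sqrt(q0_b), "z_excl_bkg": math.sqrt(q1_b),
        "mu_hat_sig": mu_s, "z_disc_sig": math.sqrt(q0_s),
    }


if __name__ == "__main__":
    print(run_demo())
```

The KDE step is where binning is avoided: each density is a smooth sum of kernels rather than a histogram, so the likelihood is evaluated directly at every event's classifier output. To mimic the averaging of several independent ML output realizations mentioned in the abstract, one could replace each event's output with the mean over several independently trained classifiers before calling `make_kde`; that step is not shown here.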

{
  "_oai": {
    "updated": "2024-03-26T03:32:35Z", 
    "id": "oai:repo.scoap3.org:82278", 
    "sets": [
      "EPJC"
    ]
  }, 
  "authors": [
    {
      "affiliations": [
        {
          "country": "Spain", 
          "value": "Departamento de F\u00edsica Te\u00f3rica, Universidad Aut\u00f3noma de Madrid, Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Universidad Aut\u00f3noma de Madrid"
        }, 
        {
          "country": "Spain", 
          "value": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC, C/ Nicol\u00e1s Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC"
        }, 
        {
          "country": "Argentina", 
          "value": "IFLP, CONICET-Dpto. de F\u00edsica, Universidad Nacional de La Plata, C.C. 67, La Plata, 1900, Argentina", 
          "organization": "Universidad Nacional de La Plata"
        }
      ], 
      "surname": "Arganda", 
      "email": "ernesto.arganda@uam.es", 
      "full_name": "Arganda, Ernesto", 
      "given_names": "Ernesto"
    }, 
    {
      "affiliations": [
        {
          "country": "Spain", 
          "value": "Departamento de F\u00edsica Te\u00f3rica, Universidad Aut\u00f3noma de Madrid, Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Universidad Aut\u00f3noma de Madrid"
        }, 
        {
          "country": "Spain", 
          "value": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC, C/ Nicol\u00e1s Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC"
        }, 
        {
          "country": "Argentina", 
          "value": "IFLP, CONICET-Dpto. de F\u00edsica, Universidad Nacional de La Plata, C.C. 67, La Plata, 1900, Argentina", 
          "organization": "Universidad Nacional de La Plata"
        }
      ], 
      "surname": "Perez", 
      "email": "andresd.perez@uam.es", 
      "full_name": "Perez, Andres", 
      "given_names": "Andres"
    }, 
    {
      "affiliations": [
        {
          "country": "Spain", 
          "value": "Departamento de F\u00edsica Te\u00f3rica, Universidad Aut\u00f3noma de Madrid, Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Universidad Aut\u00f3noma de Madrid"
        }, 
        {
          "country": "Spain", 
          "value": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC, C/ Nicol\u00e1s Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC"
        }
      ], 
      "surname": "de los Rios", 
      "email": "martin.delosrios@uam.es", 
      "full_name": "de los Rios, Mart\u00edn", 
      "given_names": "Mart\u00edn"
    }, 
    {
      "affiliations": [
        {
          "country": "Spain", 
          "value": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC, C/ Nicol\u00e1s Cabrera 13-15, Campus de Cantoblanco, Madrid, 28049, Spain", 
          "organization": "Instituto de F\u00edsica Te\u00f3rica UAM-CSIC"
        }
      ], 
      "surname": "Sand\u00e1 Seoane", 
      "email": "r.sanda@csic.es", 
      "full_name": "Sand\u00e1 Seoane, Rosa", 
      "given_names": "Rosa"
    }
  ], 
  "titles": [
    {
      "source": "Springer", 
      "title": "Machine-learned exclusion limits without binning"
    }
  ], 
  "dois": [
    {
      "value": "10.1140/epjc/s10052-023-12314-z"
    }
  ], 
  "publication_info": [
    {
      "page_end": "14", 
      "journal_title": "European Physical Journal C", 
      "material": "article", 
      "journal_volume": "83", 
      "artid": "s10052-023-12314-z", 
      "year": 2023, 
      "page_start": "1", 
      "journal_issue": "12"
    }
  ], 
  "$schema": "http://repo.scoap3.org/schemas/hep.json", 
  "acquisition_source": {
    "date": "2024-03-26T03:31:45.516476", 
    "source": "Springer", 
    "method": "Springer", 
    "submission_number": "202ab3aaeb2111eeae4696b6a0e1ccbd"
  }, 
  "page_nr": [
    14
  ], 
  "license": [
    {
      "url": "https://creativecommons.org/licenses//by/4.0", 
      "license": "CC-BY-4.0"
    }
  ], 
  "copyright": [
    {
      "holder": "The Author(s)", 
      "year": "2023"
    }
  ], 
  "control_number": "82278", 
  "record_creation_date": "2023-12-19T21:30:23.717150", 
  "_files": [
    {
      "checksum": "md5:491c11fdab103ecd04b51ec2dd4924c8", 
      "filetype": "xml", 
      "bucket": "40f09017-9e64-4a2d-92f7-fb6b16909f43", 
      "version_id": "6c0bf024-5a52-4f05-bce5-85fa475678aa", 
      "key": "10.1140/epjc/s10052-023-12314-z.xml", 
      "size": 14891
    }, 
    {
      "checksum": "md5:22a2adcdd49faed4d5b55f6ba1fef648", 
      "filetype": "pdf/a", 
      "bucket": "40f09017-9e64-4a2d-92f7-fb6b16909f43", 
      "version_id": "4219c21e-a72c-4e9b-b669-6f33e1f56208", 
      "key": "10.1140/epjc/s10052-023-12314-z_a.pdf", 
      "size": 1784362
    }
  ], 
  "collections": [
    {
      "primary": "European Physical Journal C"
    }
  ], 
  "abstracts": [
    {
      "source": "Springer", 
      "value": "The machine-learned likelihoods (MLL) method combines machine-learning classification techniques with likelihood-based inference tests to estimate the experimental sensitivity of high-dimensional data sets. We extend the MLL method by including kernel density estimators (KDE), which avoid binning the classifier output when extracting the resulting one-dimensional signal and background probability density functions. We first test our method on toy models generated with multivariate Gaussian distributions, where the true probability density functions are known. We then apply the method to two cases of interest at the LHC: a search for exotic Higgs bosons, and a $Z'$ boson decaying into lepton pairs. In contrast to physics-based quantities, the typical fluctuations of the ML outputs produce non-smooth probability distributions for pure-signal and pure-background samples. Because the KDE method is flexible and performs well, this non-smoothness propagates into the density estimates. We study its impact on the final significance computation and compare these results with those obtained by averaging several independent ML output realizations, which yields smoother distributions. We conclude that the significance estimate is insensitive to this issue."
    }
  ], 
  "imprints": [
    {
      "date": "2023-12-19", 
      "publisher": "Springer"
    }
  ]
}
Published on:
19 December 2023
Publisher:
Springer
Published in:
European Physical Journal C, Volume 83 (2023)
Issue 12
Pages 1-14
DOI:
https://doi.org/10.1140/epjc/s10052-023-12314-z
Copyrights:
The Author(s)
Licence:
CC-BY-4.0
