    Next, we discuss different approaches to quantify the robustness of feature selection or ranking algorithms by (1) a conventional analysis and (2) a visual-based study.
    2.4.1. Conventional analysis
    To study the stability of feature ranking or selection techniques, several metrics have been proposed.
    Similarity measures
    Consider r and r' the outputs of a feature ranking technique applied to two subsamples of D. The most widely used metric to measure the similarity between two ranking lists is Spearman’s rank correlation coefficient (SR) [30]. The SR between two ranked lists r and r' is defined by
    $$\mathrm{SR}(r, r') = 1 - \frac{6 \sum_{i=1}^{p} (r_i - r'_i)^2}{p\,(p^2 - 1)}$$
    where r_i is the rank of feature i and p is the number of features. SR values range from −1 to 1. It takes the value 1 when the rankings are identical, 0 when there is no correlation, and −1 when one ranking is the exact reverse of the other.
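    As an illustration of the Spearman rank correlation described above, the following minimal sketch uses the standard closed form for rankings without ties, SR = 1 − 6 Σ d_i² / (p (p² − 1)), where d_i is the difference between the two ranks of feature i and p is the number of features. The function name `spearman_rank_correlation` and the example rankings are hypothetical and chosen only for illustration; the sketch assumes each ranking is a permutation of 1..p.

```python
def spearman_rank_correlation(r1, r2):
    """Spearman's rank correlation between two rankings.

    Assumption: r1[i] and r2[i] are the ranks (1..p, no ties) that two
    runs of a feature ranker assign to feature i.
    """
    if len(r1) != len(r2):
        raise ValueError("both rankings must cover the same features")
    p = len(r1)
    if p < 2:
        raise ValueError("at least two features are required")
    # Sum of squared rank differences over all features.
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (p * (p ** 2 - 1))


# Hypothetical rankings of p = 5 features.
r = [1, 2, 3, 4, 5]
r_reversed = [5, 4, 3, 2, 1]
r_swapped = [2, 1, 3, 4, 5]   # only the top two features trade places

sr_identical = spearman_rank_correlation(r, r)
sr_reversed = spearman_rank_correlation(r, r_reversed)
sr_swapped = spearman_rank_correlation(r, r_swapped)
```

    With these inputs, identical rankings give the maximum value and fully reversed rankings the minimum, while swapping only two adjacent features stays close to the maximum. When ties can occur, this closed form no longer applies and a tie-aware implementation is needed.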
    To measure the similarity between two top-k lists s and s' containing the k most relevant features, several metrics have been proposed (for details see [30]). In this work we use the Jaccard stability index (JI), which can be defined as
    $$\mathrm{JI}(s, s') = \frac{|s \cap s'|}{|s \cup s'|} = \frac{r}{r + l} \qquad (4)$$
    where s and s' are the two feature subsets, r is the number of features that are common to both lists and l the number of features that appear in only one of the two lists. The JI lies in the range [0, 1].
    The stability for a set of rankings or lists
    When it comes to evaluating the stability of a feature selection (or ranking) algorithm that provides several results A = […] similarities and average the results, which leads to a single scalar […] coefficient, Jaccard stability index [22,30] or Kuncheva’s stability index [24], for example.
    2.4.2. Visual based stability analysis
    The outcome of a feature ranking algorithm can be interpreted as a point in a high dimensional space (with p dimensions). The stability of a ranking feature selector is commonly measured as the dissimilarity or distance between different outcomes of the same feature selector on slightly different datasets. As mentioned above, stability is assessed by computing pairwise similarities between points in that high dimensional space and averaging the results. In this case, the ranking data is turned into a single number
    […] of environmental exposures and their interaction with genetic factors in common tumors in Spain (prostate, breast, colorectal, gastroesophageal and chronic lymphocytic leukemia). All participants signed an informed consent. Approval for the study was obtained from the ethical review boards of all recruiting centers [7]. Instances with missing values have been removed, leading to a dataset with 3295 instances: 2230 are controls, while the other […] genetic variables (Single Nucleotide Polymorphisms, SNPs), 48 environmental factors including red meat, vegetable consumption, BMI, physical activity, alcohol consumption and 5 variables regarding family history of CRC, sex, age, level of education and race.
    Next, the variables considered in this study are listed.
    • Environmental factors. Physical activity, BMI, alcohol consumption, smoking. Dietary factors: consumption of vegetable, red meat, legume, fruit, cereals, fish, dairy, oil, calcium, carotenoids, cholesterol, edible, total energy, ethanol in the past decade, ethanol in the present, monounsaturated fats, polyunsaturated fats, saturated fats, total fats, folic acid, glucids, total intake in grams, iron, magnesium, niacin, phosphorus, potassium, fiber, animal protein, vegetable protein, total protein, retinoids,