Using SQL Hotspots in a Prioritization Heuristic for Detecting All Types of Web Application Vulnerabilities
We manually edited the <code>tracs</code> dataset to add the CWE classifier to each security-related issue report. We were also interested in comparing the proportion of input validation vulnerabilities in each project as part of our research hypotheses (H7), so we additionally added a variable to the <code>tracs</code> dataset indicating "yes/no" as to whether the Trac report in question was due to an input validation vulnerability. CWE classifies several vulnerability types as input validation vulnerabilities<sup>22</sup> and we followed this classification in our analysis. We used the input validation vulnerability variable ''only'' for evaluating H7; we tested all other hypotheses and conducted the predictive modeling using the full dataset, irrespective of a reported vulnerability's classification as input validation or non-input validation.
== 4.5. Detecting Changed Hotspots ==
We were also interested in comparing the amount of change due to problems with SQL hotspots in each project. To measure this change, we calculated the proportion of lines changed in each project that contained SQL hotspots. Using the procedure described in Section 4.3, our script automatically examined each line of code that developers committed to fix security issues. We combined this technique with the script described in Section 4.2 to identify which of those lines also contained hotspots. We call the total number of lines of code that developers changed due to security issues Y, and the number of lines within that subset that are also hotspots X. We then calculated the proportion of lines of code changed due to security issues that were also hotspots as X divided by Y.
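The X / Y calculation above can be sketched as a small helper. This is an illustrative reconstruction, not the authors' script; the function name and the input representation (one boolean per changed line) are assumptions:

```python
def hotspot_change_proportion(security_fix_lines):
    """Return X / Y for one project.

    ``security_fix_lines`` holds one boolean per line of code that
    developers changed to fix a security issue (Y lines in total);
    True marks a line that also contains a SQL hotspot (X lines).
    Returns 0.0 when no security-fix lines exist.
    """
    y = len(security_fix_lines)          # all lines changed for security fixes
    if y == 0:
        return 0.0
    x = sum(security_fix_lines)          # the subset that are also hotspots
    return x / y

# Hypothetical project: 2 of 4 security-fix lines contain hotspots.
print(hotspot_change_proportion([True, False, True, False]))  # 0.5
```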
== 4.6. Statistical Analysis ==
We used the R Project for Statistical Computing to perform our statistical analysis of the data in this case study<sup>23</sup>. We used the statistical tests provided by R to determine whether any differences we observed between two samples occurred by chance or were statistically significant.
We used the '''Mann-Whitney-Wilcoxon (MWW)''' statistic to perform any population-based comparison between two independent samples, such as between vulnerable and neutral files, or between files that contain hotspots and files that do not. The MWW test is a non-parametric test of whether one of two independent samples of observations tends to have larger values than the other. We used a non-parametric statistical test because we cannot assume that the outcomes in our data set are normally distributed. We also used the '''Chi-Squared Test''' to determine whether there was a statistically significant difference in the proportion of positive outcomes in two population groups. We also used the '''F Test''' to measure the difference in variance between two sample groups.
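The analysis itself was done in R; purely for illustration, the three test statistics named above can be sketched in a few lines of stdlib Python. The data and function names below are hypothetical, and p-value computation (which requires a reference distribution) is omitted:

```python
from statistics import variance

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic: count pairs (x, y) with x > y,
    scoring ties as 1/2.  Large or small U relative to its null
    distribution indicates one sample tends to have larger values."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)

def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    [[a, b], [c, d]] of positive/negative outcome counts in two groups."""
    n = sum(sum(row) for row in table)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = sum(table[i]) * sum(r[j] for r in table) / n
            stat += (observed - expected) ** 2 / expected
    return stat

def f_statistic(a, b):
    """F statistic: ratio of the two sample variances."""
    return variance(a) / variance(b)
```

For example, `mann_whitney_u([1, 2, 3], [1, 1, 1])` yields 7.5 out of a maximum of 9, reflecting that the first sample tends to be larger.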
== 5. Results ==