Using SQL Hotspots in a Prioritization Heuristic for Detecting All Types of Web Application Vulnerabilities



== 4. Methodology ==
{| class="wikitable" style="text-align: left; width: 100%;"
|+ Table 1. Results per Project
!
! WordPress
! WikkaWiki
|-
| Releases analyzed
| 9
| 6
|-
| Security issue reports analyzed
| 97
| 61
|-
| Vulnerable files (over project's history)
| 26% (85 / 326)
| 29% (44 / 209)
|-
| Average number of hotspots (over project's history)
| 255
| 92
|-
| Average percent of files having at least one hotspot
| 14.2%
| 8.42%
|-
|colspan="3" style="background: #eeeeee" | '''Hypotheses† about files'''
|-
| '''H1.''' The more hotspots a file contains per line of code, the more likely it is that the file contains any web application vulnerability.
| True (Logistic Regression, p<0.05)
| True (Logistic Regression, p<0.05)
|-
| '''H2.''' The more hotspots a file contains, the more times that file was changed due to any kind of vulnerability (not just input validation vulnerabilities).
| True (Simple Linear Regression, p<0.0001, Adjusted R<sup>2</sup> = 0.4208)
| True (Simple Linear Regression, p<0.0001, Adjusted R<sup>2</sup> = 0.3802)
|-
|colspan="3" style="background: #eeeeee" | '''Hypotheses about issue reports'''
|-
| '''H3.''' Input validation vulnerabilities result in a higher average number of repository revisions than any other type of vulnerability*.
| True (MWW, p<0.05)
| True (MWW, p<0.05)
|-
|colspan="3" style="background: #eeeeee" | '''Hypotheses about prediction'''
|-
| '''H4.''' Hotspots can be used to predict files that will contain any type of web application vulnerability in the current release.
| True (Predictive Modeling, see Table 2)
| True (Predictive Modeling, see Table 3)
|-
| '''H5.''' The more hotspots a file contains, the more likely that file will be vulnerable in the next release.
| True (Positive Coefficient on Predictive Models)
| True (Positive Coefficient on Predictive Models)
|-
|colspan="3" style="background: #eeeeee" | '''Hypotheses comparing projects'''
|-
| '''H6.''' The average number of hotspots per file is more variable in WordPress than in WikkaWiki.
| colspan=2 | True (F-test, p<0.000001)
|-
| '''H7.''' WordPress suffered a higher proportion of input validation vulnerabilities than WikkaWiki.
| colspan=2 | True (Chi-Squared, p=0.0692)
|-
| '''H8.''' In WordPress, more of the lines of code that were changed due to security issues were hotspots.
| colspan=2 | True (Chi-Squared, p<0.00001)
|-
| colspan=3 style="border-style: solid; border-width: 0 1px 1px 0" | *This finding is consistent with the report from SANS (see Section 1) that indicates that the most popular types of web application attacks are input validation vulnerabilities.
<br />†Please note that we use the term "hypothesis" in this table to mean scientific hypotheses, not statistical hypotheses.
|}
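
As a rough illustration of the kind of test reported for the project-comparison hypotheses in Table 1 (for example, H7 and H8), the sketch below computes a Pearson chi-squared statistic and its p-value for a 2×2 contingency table. The function name <code>chi_squared_2x2</code> and the counts in <code>toy_counts</code> are hypothetical and are not the study's data. The sketch also assumes no continuity correction and nonzero row and column totals; the source does not state which variant of the test was used.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test for a 2x2 table of counts.

    Assumes one degree of freedom, no continuity correction,
    and that every row and column total is nonzero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, for illustration only:
# rows = two projects, columns = (input validation, other) issue reports.
toy_counts = [[10, 20], [20, 10]]
statistic, p_value = chi_squared_2x2(toy_counts)
print(f"chi-squared = {statistic:.4f}, p = {p_value:.4f}")
```

A small p-value would suggest that the proportion in the first column differs between the two rows. Whether that difference is meaningful depends on the real counts, which are not reproduced here.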


We conducted two case studies to empirically investigate eight hypotheses related to hotspot source code locations and vulnerabilities reported in the systems' bug tracking systems. We present these hypotheses, as well as their results, in Table 1, and explain the results further in Section 5. Our hypotheses point to the research objective: to improve the prioritization of security fortification efforts by investigating whether SQL hotspots can serve as the basis for a heuristic that predicts all vulnerability types. We also include lines of code in our analysis, alongside SQL hotspots, to improve the accuracy and predictive power of the heuristic. Specifically, we look at the relationship between hotspots and files (H1-H2), the amount of code change as related to the vulnerability type (H3), the predictive ability of hotspots for any vulnerability type (H4-H5), and the effect that collocating hotspots can have on the number and types of vulnerabilities in a given system (H6-H8).
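
To make the idea of a hotspot-based prioritization heuristic concrete, the following minimal sketch ranks files by hotspots per line of code, the quantity named in H1. It is only an illustration under stated assumptions: the names <code>FileStats</code>, <code>hotspot_density</code>, and <code>prioritize</code> are hypothetical, the hotspot and line counts are assumed to have been measured already, and the toy values are invented. The study itself evaluates the heuristic with regression and predictive models, not with this simple ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file measurements (hypothetical structure for this sketch)."""
    path: str
    hotspots: int  # number of SQL hotspots found in the file
    loc: int       # lines of code in the file


def hotspot_density(stats):
    """Hotspots per line of code; empty files get a density of zero."""
    return stats.hotspots / stats.loc if stats.loc > 0 else 0.0


def prioritize(files):
    """Order files so the highest hotspot density is reviewed first.

    Ties are broken by raw hotspot count, then by path for determinism.
    """
    return sorted(
        files,
        key=lambda f: (-hotspot_density(f), -f.hotspots, f.path),
    )


# Invented example values, not taken from WordPress or WikkaWiki.
toy_files = [
    FileStats("a.php", hotspots=4, loc=200),
    FileStats("b.php", hotspots=0, loc=50),
    FileStats("c.php", hotspots=3, loc=60),
]
ranked_paths = [f.path for f in prioritize(toy_files)]
print(ranked_paths)
```

Ranking by density instead of raw count keeps a large file with a few hotspots from crowding out a small file where hotspots are concentrated. This is one possible design choice, not a claim about the study's procedure.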