Proposing SQL Statement Coverage Metrics

B. Smith, Y. Shin, and L. Williams, "Proposing SQL Statement Coverage Metrics", Proceedings of the Fourth International Workshop on Software Engineering for Secure Systems (SESS 2008), co-located with ICSE, pp. 49-56, 2008.
== Abstract ==
''An increasing number of cyber attacks are occurring at the application layer when attackers use malicious input. These input validation vulnerabilities can be exploited by (among others) SQL injection, cross site scripting, and buffer overflow attacks. Statement coverage and similar test adequacy metrics have historically been used to assess the level of functional and unit testing which has been performed on an application. However, these currently-available metrics do not highlight how well the system protects itself through validation. In this paper, we propose two SQL injection input validation testing adequacy metrics: target statement coverage and input variable coverage. A test suite which satisfies both adequacy criteria can be leveraged as a solid foundation for input validation scanning with a blacklist. To determine whether it is feasible to calculate values for our two metrics, we perform a case study on a web healthcare application and discuss some issues in implementation we have encountered. We find that the web healthcare application scored 96.7% target statement coverage and 98.5% input variable coverage.''
According to the National Vulnerability Database (NVD)<sup>1</sup>, more than half of the ever-increasing number of cyber vulnerabilities reported from 2002 to 2006 were input validation vulnerabilities. As Figure 1 shows, the number of input validation vulnerabilities is still increasing.


<center>[[File:Sess-figure-1.png]]<br />
'''Figure 1. NVD's reported cyber vulnerabilities<sup>2</sup>'''</center>


Figure 1 illustrates the number of reported instances of each type of cyber vulnerability listed in the series legend for each year displayed on the x-axis. The curve with square-shaped points is the sum of all reported vulnerabilities that fall into the categories “SQL injection”, “XSS”, or “buffer overflow” when querying the National Vulnerability Database. The curve with diamond-shaped points represents all cyber vulnerabilities reported for the year on the x-axis. For several years now, the number of reported input validation vulnerabilities has been half the total number of reported vulnerabilities. Additionally, the graph demonstrates that these curves are monotonically increasing, indicating that we are unlikely to see a drop in the ratio of reported input validation vulnerabilities to total reported vulnerabilities.
   $username = $_POST[‘username’];
   $password = $_POST[‘password’];
 
   //query the database for a user with username/pw
   $result = mysql_query(“select * from users where username = ‘$username’ AND password = ‘$password’”);
 
   //extract the first row of the resultset
   $firstresult = mysql_fetch_array($result);
 
   //extract the “role” column from the result
   $role = $firstresult[‘role’];
 
   //set a cookie for the user with their role
   setcookie(“userrole”, $role);


   //from Figure 4; original code
   $result = mysql_query(“select * from users where username = ‘$username’ AND password = ‘$password’”);
 
   //code with inserted attack parameters
   $result = mysql_query(“select * from users where username = ‘’ OR 1=1 -- AND password = ‘PASSWORD’”);

<center>'''Figure 5. Example SQL statement, before and after'''</center>
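
For contrast, a parameterized query binds user input as data rather than splicing it into the SQL string, so the injected quote and comment characters above never reach the parser as SQL. The sketch below is illustrative only (it is not from the paper) and uses the JDBC <code>PreparedStatement</code> style that the iTrust application studied in Section 5 employs; <code>conn</code> is assumed to be an open <code>java.sql.Connection</code>.

  //hypothetical parameterized version of the Figure 4/5 query
  java.sql.PreparedStatement ps = conn.prepareStatement(
      "select * from users where username = ? AND password = ?");
  ps.setString(1, username);  //user input is bound as data, not SQL text
  ps.setString(2, password);
  java.sql.ResultSet rs = ps.executeQuery();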
statements.

'''Server-side target statement coverage''' = [[File:Sess-eqn-1.png]]


where Test(''t'') is a SQL statement tested at least once.
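
A plausible plain-text rendering of the formula image above (an interpretation, not taken from the paper), assuming ''T'' denotes the set of target statements in the server-side production code:

:<math>\text{Server-side target statement coverage} = \frac{\left|\{\, t \in T \mid \mathrm{Test}(t) \,\}\right|}{\left|T\right|} \times 100\%</math>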
'''Metric''': The input variable coverage criterion can be measured by the percentage of input variables tested at least once by the test set out of the total number of input variables found in any target statement in the production code of the system.


'''Input variable coverage''' = [[File:Sess-eqn-2.png]]


where Test(f) is an input variable used in at least one test.
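
Similarly, a plausible rendering of the formula image above (again an interpretation, not taken from the paper), assuming ''F'' denotes the set of input variables found in any target statement in the production code:

:<math>\text{Input variable coverage} = \frac{\left|\{\, f \in F \mid \mathrm{Test}(f) \,\}\right|}{\left|F\right|} \times 100\%</math>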
The relationship between target statement coverage and input variable coverage is not yet known; however, we contend that input variable coverage is a useful, finer-grained measurement.


Input variable coverage has the effect of weighting a target statement with more input variables more heavily. Since each input variable is typically a separate potential vulnerability if not adequately validated, a target statement which contains more input variables poses a higher threat.
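
As a purely illustrative example (these numbers are not from the paper): suppose a system has ten target statements, one containing five input variables and the other nine containing one each, for fourteen input variables in total. If only the five-variable statement is untested, target statement coverage is <math>9/10 = 90\%</math>, while input variable coverage is <math>9/14 \approx 64\%</math>; the untested, higher-threat statement penalizes the finer-grained metric more heavily.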


== 4. Related Work ==
| 100%
|-
| AuthDAO
| 184
| 8
| 10
| 2
| 23
| 98%
|-
| BkpStandardsDAO
| 61
| 1
| 5
| 4
| 0
| 0%
|-
| CPTCodesDAO
| 123
| 4
| 5
| 2
| 8
| 100%
|-
| EpidemicDAO
| 141
| 2
| 5
| 1
| 6
| 100%
|-
| FamilyDAO
| 112
| 3
| 5
| 2
| 6
| 100%
|-
| HealthRecordsDAO
| 65
| 2
| 3
| 2
| 6
| 100%
|-
| HospitalsDAO
| 180
| 7
| 8
| 2
| 18
| 88%
|-
| ICDCodesDAO
| 123
| 4
| 5
| 2
| 1
| 100%
|-
| NDCodesDAO
| 122
| 4
| 5
| 2
| 8
| 100%
|-
| OfficeVisitDAO
| 362
| 15
| 20
| 6
| 30
| 99%
|-
| PatientDAO
| 322
| 14
| 15
| 4
| 38
| 100%
|-
| PersonnelDAO
| 196
| 10
| 8
| 3
| 15
| 100%
|-
| RiskDAO
| 126
| 3
| 8
| 1
| 3
| 100%
|-
| TransactionDAO
| 135
| 5
| 7
| 3
| 10
| 93%
|-
| VisitRemindersDAO
| 166
| 2
| 3
| 1
| 6
| 100%
|-
|rowspan="2"|edu.ncsu.csc.itrust.dao
|DBUtil
| 29
| 1
| 2
| 0
| 1
| 69%
|-
| DAO Classes: 20 Total
| 2378
| 93
| 125
| 40
| 196
| 92%
|}
</center>


iTrust was written to conform to a MySQL<sup>8</sup> back-end. The MySQL JDBC connector was used to implement the data storage for the web application by connecting to a remotely executing instance of MySQL v5.1.11-remote-nt. The <code>java.sql.PreparedStatement</code> class is one way of representing SQL statements in the JDBC framework. Statement objects contain a series of overloaded methods all beginning with the word execute: <code>execute(…)</code>, <code>executeQuery(…)</code>, <code>executeUpdate(…)</code>, and <code>executeBatch()</code>. These methods are the java.sql way of issuing commands to the database and each of them represents a potential change to the database. These method calls, which we have previously introduced as ''target statements'', are the focus of our coverage metrics.
<center>[[File:SESS-Figure7.png]]<br />
'''Figure 7. General iTrust architecture'''</center>


The version of iTrust we used for this study is referred to as iTrust Fall 2007, named by the year and semester it was built and redistributed to a new set of graduate students. iTrust was written to execute in Java 1.6 and thus our testing was conducted with the corresponding JRE. Code instrumentation and testing were conducted in Eclipse v3.3 Europa on an IBM Lenovo T61p running Windows Vista Ultimate with a 2.40 GHz Intel Core Duo and 2 GB of RAM.


=== 5.2 Study Setup ===
 
The primary challenge in collecting both of our proposed metrics is that there is currently no static tool which can integrate with the test harness JUnit to determine when SQL statements found within the code have been executed. As a result, we computed our metrics manually and via code instrumentation.
 
The code fragment in Figure 8 demonstrates the execution of a SQL statement found within an iTrust DAO. Each of the JDBC execute method calls represents communication with the DBMS and has the potential to change the database.
 
We assign each execute method call a unique identifier ''id'' in the range 1, 2, ..., ''n'', where ''n'' is the total number of execute method calls. We then instrument the code to contain a call to <code>SQLMarker.mark(id)</code>. This <code>SQLMarker</code> class interfaces with a research database we have set up to hold status information for each statically identified execute method call. Before running the test suite, we load (or reload) a SQL table with records corresponding to each unique identifier from 1 to ''n''. These records all contain a field <code>marked</code> which is set to <code>false</code>. The <code>SQLMarker.mark(id)</code> method changes <code>marked</code> to <code>true</code>. If <code>marked</code> is already true, it remains true.
 
Using this technique, we can monitor the call status of each execute statement found within the iTrust production code. When the test suite is done executing, the table in our research database will contain ''n'' unique records, one for each execute method call in the iTrust production code. Each record contains a boolean flag indicating whether the statement was called during test suite execution. The line marked with the comment <code>//instrumentation</code> shows how this call is inserted into the example code in Figure 8.
 
  java.sql.Connection conn = factory.getConnection();
  java.sql.PreparedStatement ps = conn.prepareStatement("UPDATE globalVariables SET VALUE = ? WHERE Name = 'Timeout';");
  ps.setInt(1, mins);
  SQLMarker.mark(1, 1); //instrumentation
  java.sql.ResultSet rs = ps.executeQuery();
 
<center>'''Figure 8. Code Instrumentation'''</center>
 
 
<code>SQLMarker.mark</code> is always placed immediately before the call to the execute SQL query (or target statement) so that the call is recorded even if the statement throws an exception during its execution. There are issues in determining the number of SQL statements actually possible in the production code; these are addressed in Section 7.
 
To calculate input variable coverage, we added a second parameter to the <code>SQLMarker.mark</code> method which records the number of input variables set on the execute method. Initially, the input variable record of each execute method is set to zero, and the <code>SQLMarker.mark</code> method sets it to the passed value. iTrust uses PreparedStatements for its SQL statements and, as Figure 8 demonstrates, the number of input variables is always clearly visible in the production code because PreparedStatements require the explicit setting of each variable included in the statement. As with the determination of SQL statements, there are similar issues with determining the number of SQL input variables, which we present in Section 7.
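
The paper does not list the <code>SQLMarker</code> implementation. The following is a minimal sketch of what such a helper might look like, assuming a research table named <code>sql_marker</code> with columns <code>id</code>, <code>marked</code>, and <code>input_vars</code> that is reloaded before each test run; the table name, column names, and connection URL are assumptions and not from the paper.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.SQLException;
  public class SQLMarker {
      //hypothetical JDBC URL of the separate research database
      private static final String MARKER_DB_URL =
          "jdbc:mysql://localhost/sqlmarker?user=marker&password=marker";
      //record that the execute call with this id was reached, and how many
      //input variables were set on it before execution
      public static void mark(int id, int inputVariables) {
          try (Connection conn = DriverManager.getConnection(MARKER_DB_URL);
               PreparedStatement ps = conn.prepareStatement(
                   "UPDATE sql_marker SET marked = TRUE, input_vars = ? WHERE id = ?")) {
              ps.setInt(1, inputVariables);
              ps.setInt(2, id);
              ps.executeUpdate(); //once marked is TRUE, it stays TRUE
          } catch (SQLException e) {
              throw new RuntimeException("SQLMarker failed for id " + id, e);
          }
      }
  }

After the test suite finishes, the two coverage percentages can then be read off this table, for example by comparing <code>SELECT COUNT(*) FROM sql_marker WHERE marked = TRUE</code> (and the corresponding sum over <code>input_vars</code>) against the statically determined totals.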
 
== 6. Results and Discussion ==
 
We found that 90 of the 93 SQL statements in the iTrust server-side production code were executed by the test suite, yielding a SQL statement coverage score of 96.7%. We found that 209 of the 212 SQL input variables found in the iTrust back-end were executed by the test suite, yielding a SQL variable coverage score of 98.5%. We find that iTrust is a very testable system with respect to SQL statement coverage, because each SQL statement, in essence, is embodied within a method of a DAO. This architectural decision is designed to allow the separation of concerns. For example, the action of editing a patient’s records via the user interface is separated from the action of actually updating that patient’s records in the database. We find that even though the refactoring of iTrust was intended to produce this high testability, there are still untested SQL statements within the production code. The Action classes of the iTrust framework represent procedures the client can perform with proper authorization. Since iTrust’s line coverage is at 91%, the results for iTrust are actually ''better'' than they would be for many existing systems due to its high testability.
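
In terms of the coverage formulas defined in Section 3, these results correspond to <math>\tfrac{90}{93}</math> of the target statements and <math>\tfrac{209}{212}</math> of the input variables being exercised by the test suite, the ratios reported above as 96.7% and 98.5%.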
 
The three uncovered SQL statements occurred in methods which were never called by any Action class and thus are never used in production. Two of the statements related to the management of hospitals and one statement offered an alternate way of managing procedural and diagnosis codes. The uncovered statements could eventually be used by new features added to the production code, and thus the fact that they are not executed by any test is still pertinent.
 
== 7. Limitations ==
 
Certain facets of the JDBC framework and of SQL in general make it difficult to establish a denominator for the ratio described for each of our coverage metrics. For example, remember that in calculating SQL statement coverage, we must find, mark and count each statically occurring SQL statement within the production code. The fragment presented in Figure 9 contains Java batch SQL statements. Similar to ''batch mode'' in MySQL, each statement is pushed into a single batch statement and then the statements are all executed with one commit. Batch statements can be used to increase efficiency or to help manage concurrency. We can count the number of executed SQL statements in a batch: a dummy variable could be instrumented within the for loop demonstrated in Figure 9 which increments each time a batch statement is added (e.g., <code>ps.addBatch()</code>). How many SQL statements are possible, though? The numerator will always be the same as the number of <code>DiagnosisBeans</code> in the variable <code>updateDiagnoses</code>. These beans are parsed from input the user passes to the Action class via the JSP to make changes to several records in one web form submission. The denominator is potentially infinite, however.
 
Additionally, the students who have worked on iTrust were required to use PreparedStatements, which elevates our resultant input variable coverage because PreparedStatements require explicit assignment to each input variable, and this may not be the case with other SQL connection methodologies. Furthermore, our metrics do not give any indication of how many input values have been tested in each input variable in each target statement.
 
This technique is currently only applicable to Java code which implements a JDBC interface and uses PreparedStatements to interact with a SQL database management system. Finally, we recognize that much legacy code is implemented using dynamically generated SQL queries; while our metric for target statement coverage could still be applied, our metric for input variable coverage does not contain an adequate definition for counting the input variables in a dynamically generated query. Our approach is repeatable and can generalize to other applications that match the above restrictions.
 
  public void updateDiscretionaryAccess(List<DiagnosisBean> updateDiagnoses)
  {
    java.sql.Connection conn = factory.getConnection();
    java.sql.PreparedStatement ps = conn.prepareStatement(
        "UPDATE OVDiagnosis SET DiscretionaryAccess=? WHERE ID=?");
    for (DiagnosisBean d : updateDiagnoses) {
      ps.setBoolean(1, d.isDiscretionaryAccess());
      ps.setLong(2, d.getOvDiagnosisID());
      ps.addBatch();
    }
    SQLMarker.mark(1, 2);
    ps.executeBatch();
  }
 
<center>'''Figure 9. Batch SQL Statements'''</center>
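
The dummy-counter approach suggested above is only described in prose; the following sketch shows how the loop in Figure 9 could count the number of executed batch statements (the counter name and what is ultimately done with the count are assumptions, not from the paper):

    int batchCount = 0; //dummy counter: one executed SQL statement per addBatch()
    for (DiagnosisBean d : updateDiagnoses) {
      ps.setBoolean(1, d.isDiscretionaryAccess());
      ps.setLong(2, d.getOvDiagnosisID());
      ps.addBatch();
      batchCount++;
    }
    SQLMarker.mark(1, 2);
    ps.executeBatch();
    //batchCount could now be logged or recorded alongside the marker record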
 
== 8. Conclusions and Future Work ==
 
We have shown that a major portion of recent cyber vulnerabilities are occurring due to a lack of input validation testing. Testing strategies should incorporate new techniques to account for the likelihood of input validation attacks. Structural coverage metrics allow us to see how much of an application is executed by a given test set, and we have shown that this notion of coverage can be extended to target statements and their input variables. Finally, we have answered our research question with a case study which demonstrates that, using the technique we describe, it is possible to dynamically gather accurate coverage metric values for a given test set; we have also introduced a technique for manually determining these coverage values.
 
Future improvements can make these metrics portable to different database management systems as well as usable in varying development languages.  We would eventually extend our metric to evaluate the percentage of all sources of user input that have been involved in a test case.  We would like to automate the process of collecting SQL statement coverage into a tool or plug-in, which can help developers rapidly assess the level of security testing which has been performed as well as find the statements that have not been tested with any test set.  This work will eventually be extended to cross-site scripting attacks and buffer overflow vulnerabilities.  Finally, we would like to integrate these coverage metrics with a larger framework which will allow target statements and variables which are included in the coverage to be tested against sets of pre-generated good and malicious input.
 
== 9. Acknowledgements ==
 
This work is supported by the National Science Foundation under CAREER Grant No. 0346903.  Any opinions expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
 
== 10. References ==


: <sup>[1]</sup> B. Beizer, Software testing techniques: Van Nostrand Reinhold Co. New York, NY, USA, 1990.
: <sup>[19]</sup> H. Zhu, P. A. V. Hall, and J. H. R. May, "Software Unit Test Coverage and Adequacy," ACM Computing Surveys, vol. 29, no. 4, 1997.


 
== 11. Footnotes ==


# http://nvd.nist.gov/
# http://works.dgic.co.jp/djunit/
# For our case study, we used MySQL v5.0.45-community-nt found at http://www.mysql.com/
[[Category:Workshop Papers]]