Tagged: sqldf

Example SQL Queries from Thursday

Last Thursday, we spoke briefly about sqldf, an R package for using SQL (“Structured Query Language”), which is a standardized way to get information out of many database systems. SQL is easy to learn, and is especially useful if you’re interested in interacting with others’ data (since those data will often be stored in a database). I recommended the W3Schools tutorial on SQL, which only takes a few hours to get through (I don’t like some of W3School’s other tutorials, but their SQL tutorial is a fine introduction, I think).

Below is the code that I presented for generating an example dataset and then experimenting with SQL:

# Generate some example data:

# For Table 1:
table1PersonID <- seq(1:10) namesToSampleFrom <- c("Mary","John","Huxley","Alice","Bob","Kevin","Janice") table1PersonName <- sample(namesToSampleFrom, length(table1PersonID), replace=TRUE) politicalPartiesToSampleFrom <- c("Republican","Democrat","Independent","Libertarian","Bull Moose Party") # This follows the advice at http://stackoverflow.com/a/9391911 #table1PoliticalParty <- sample(politicalParties, 10, replace=TRUE) table1PoliticalParty <- sample(seq(1:length(politicalPartiesToSampleFrom)), length(table1PersonID), replace=TRUE) # Make Table 1 into a DataFrame: table1 <- as.data.frame(cbind(table1PersonID, table1PersonName, table1PoliticalParty)) names(table1) <- c("PersonID","FirstName","PoliticalPartyForeignKey") # For Table 2: table2 <- as.data.frame(cbind(seq(1:length(politicalPartiesToSampleFrom)), politicalPartiesToSampleFrom)) names(table2) <- c("PoliticalPartyID","PoliticalPartyName") # View Table 1 and Table 2, to see how they both look, and how they should relate to one another: View(table1) View(table2) # Let's experiment with SQL. Try each of these statements one-by-one, and examine the output to see how the SQL worked: # Load the sqldf library: library('sqldf') # Example SQL statements: sqldf(" SELECT PersonID FROM table1 ") sqldf(" SELECT PersonID, PoliticalPartyForeignKey FROM table1 ") sqldf(" SELECT * FROM table1 ") sqldf(" SELECT COUNT(PersonID) FROM table1 ") sqldf(" SELECT FirstName, PoliticalPartyName FROM table1 JOIN table2 ON table1.PoliticalPartyForeignKey = table2.PoliticalPartyID ") # NOTE: For the two SQL examples below, you may need to change the name from 'Alice' to something else. Our example data above was generated with random names, so do View(table1) to see what names are actually present, and choose one of those. sqldf(" SELECT FirstName, PoliticalPartyName FROM table1 JOIN table2 ON table1.PoliticalPartyForeignKey = table2.PoliticalPartyID WHERE FirstName = 'Alice' ") sqldf(" SELECT DISTINCT PoliticalPartyName FROM table1 JOIN table2 ON table1.PoliticalPartyForeignKey = table2.PoliticalPartyID WHERE FirstName = 'Alice' ") sqldf(" SELECT table1.FirstName, table2.PoliticalPartyName, PossibleAcquaintances.FirstName FROM table1 JOIN table2 ON table1.PoliticalPartyForeignKey = table2.PoliticalPartyID JOIN table1 as PossibleAcquaintances ON table2.PoliticalPartyID = PossibleAcquaintances.PoliticalPartyForeignKey WHERE table1.PersonID != PossibleAcquaintances.PersonID ") sqldf(" SELECT table2.PoliticalPartyName, COUNT(*) as countOfPossibleRelationships FROM table1 JOIN table2 ON table1.PoliticalPartyForeignKey = table2.PoliticalPartyID JOIN table1 as PossibleAcquaintances ON table2.PoliticalPartyID = PossibleAcquaintances.PoliticalPartyForeignKey WHERE table1.PersonID != PossibleAcquaintances.PersonID GROUP BY table2.PoliticalPartyID ")