erDiagram
branch {
string branch_name PK
string branch_city
decimal assets
}
customer {
int ID PK
string customer_name
string customer_street
string customer_city
}
loan {
int loan_number PK
string branch_name FK
decimal amount
}
borrower {
int ID FK
int loan_number FK
}
account {
int account_number PK
string branch_name FK
decimal balance
}
depositor {
int ID FK
int account_number FK
}
branch ||--o{ loan : "issues"
branch ||--o{ account : "holds"
customer ||--o{ borrower : "borrows"
loan ||--o{ borrower : "borrowed_by"
customer ||--o{ depositor : "deposits"
account ||--o{ depositor : "held_by"
EPPS 6354: Information Management
Assignment 01: Introduction to Database Systems
Question 1: Three Applications Using Database Systems
At work, I use Databricks as a cloud data platform to store and analyze large datasets. These datasets are stored in structured tables within a data lakehouse architecture and accessed using SQL queries and data pipelines written in R and Python.
A second example is Google Search, which stores and indexes vast amounts of information from websites across the internet. When a user enters a query, Google retrieves relevant results from its indexed database of web pages. The system relies on large-scale databases to store page content, metadata, and ranking signals, making it a clear example of a database-driven application.
Finally, I frequently use Google Sheets to store and manage structured data for collaborative projects. Information is stored in rows and columns and can be accessed through formulas, filters, and queries. Because the data persist over time, can be updated by multiple users, and can be searched or analyzed through structured queries, Google Sheets functions as a lightweight database-driven application.
Question 2: Domain Project Proposals
Application 1: Community Risk Mapping
- Purpose: Help policymakers identify neighborhoods where targeted interventions could improve community outcomes.
- Functions: Store data on crime, evictions, and demographic indicators; allow filtering by location and time; generate maps and summary statistics.
- Simple interface design: An interactive map dashboard with a sidebar for selecting indicators and date ranges, with charts updating automatically.
Application 2: Federal Grant Opportunity Tracker
- Purpose: Track federal grant opportunities and record changes or removals over time.
- Functions: Store grant details (agency, funding amount, deadlines); allow searching and filtering; track historical changes to opportunities.
- Simple interface design: A searchable table of grants with filters and a detail page showing grant information and change history.
Application 3: School–Eviction Impact Tool
- Purpose: Analyze how eviction filings relate to student attendance and outcomes.
- Functions: Link eviction and student data, identify high-impact apartment complexes, and generate summary statistics.
- Simple interface design: A dashboard with charts and tables showing eviction-related indicators by school and neighborhood.
Question 4: Why Did NoSQL Systems Emerge?
NoSQL systems emerged in the 2000s driven by the explosive growth of web-scale applications (Google, Amazon, Facebook) that required handling massive volumes of unstructured and semi-structured data with high availability and horizontal scalability. Traditional relational databases enforced rigid schemas and ACID transactions that created bottlenecks at scale.
| Feature | Relational (SQL) | NoSQL |
|---|---|---|
| Schema | Fixed, predefined schema | Flexible, schema-less |
| Scaling | Vertical (bigger hardware) | Horizontal (add more nodes) |
| Data model | Tables with rows and columns | Document, key-value, graph, or column-family |
| Transactions | ACID compliance | BASE (eventual consistency) |
| Query language | Standardized SQL | Varies by system |
| Best for | Complex queries, joins, integrity | High volume, rapid iteration, distributed data |
Assignment 02: The Relational Model
Question 1: Schema, Relation, and Instance
Relation schema defines the structure — the name of the relation and its attributes with their domains. It is like a blueprint. Example:
instructor(ID, name, dept_name, salary)Relation is a table that conforms to a given schema. It is the named structure itself, persisting across changes in the data.
Instance is a specific snapshot of the data in a relation at a given point in time — the actual set of tuples (rows). For example, the instructor relation might currently contain 50 rows; next semester after hiring, it might contain 53.
University database example: The schema student(ID, name, dept_name, tot_cred) defines what a student record looks like. The relation student is the logical table. The instance is the current set of enrolled students with their actual IDs, names, departments, and credits.
Question 2: Bank Database Schema Diagram
Given the bank database:
branch(branch_name, branch_city, assets)
customer(ID, customer_name, customer_street, customer_city)
loan(loan_number, branch_name, amount)
borrower(ID, loan_number)
account(account_number, branch_name, balance)
depositor(ID, account_number)
Question 3: Primary and Foreign Keys
Primary Keys (underlined in diagram above):
branch: branch_namecustomer: ID (though customer_name is also unique)loan: loan_numberborrower: (ID, loan_number) — composite, since loans can be associated with multiple customersaccount: account_numberdepositor: (ID, account_number) — composite, since accounts can be associated with multiple customers
Foreign Keys:
loan.branch_name→branch.branch_nameborrower.ID→customer.IDborrower.loan_number→loan.loan_numberaccount.branch_name→branch.branch_namedepositor.ID→customer.IDdepositor.account_number→account.account_number
Question 4: AI / LLM Applications in Database Management
Natural Language to SQL Translation: LLMs can convert plain-English questions (e.g., “Which customers have loans over $10,000?”) into executable SQL queries, reducing the barrier to entry for non-technical users. Traditional approaches require users to know SQL syntax, but an LLM-powered interface lets analysts query data conversationally, improving both speed and accessibility.
Automated Schema Optimization and Anomaly Detection: AI models can analyze query logs and data distributions to recommend indexing strategies, detect normalization violations, or flag anomalous data patterns. Where a traditional DBA would manually review performance metrics, an AI system continuously monitors and suggests optimizations — catching issues like missing indexes or unusual INSERT patterns that might indicate data quality problems.
Assignment 03: SQL Queries (University Database)
All queries run against the Online SQL Interpreter for the university database.
Question 2: Basic Listings
i. Student IDs (from the takes relation):
SELECT DISTINCT ID
FROM takes;ii. Instructors:
SELECT *
FROM instructor;iii. Departments:
SELECT *
FROM department;Question 3: SQL Queries
i. ID and name of each student who has taken at least one Comp. Sci. course (no duplicates):
SELECT DISTINCT student.ID, student.name
FROM student
JOIN takes ON student.ID = takes.ID
JOIN course ON takes.course_id = course.course_id
WHERE course.dept_name = 'Comp. Sci.';ii. Add grades to the list:
SELECT student.ID, student.name, takes.grade
FROM student
JOIN takes ON student.ID = takes.ID
JOIN course ON takes.course_id = course.course_id
WHERE course.dept_name = 'Comp. Sci.';iii. Students who have not taken any course offered before 2017:
SELECT ID, name
FROM student
WHERE ID NOT IN (
SELECT DISTINCT ID
FROM takes
WHERE year < 2017
);iv. Maximum salary of instructors per department:
SELECT dept_name, MAX(salary) AS max_salary
FROM instructor
GROUP BY dept_name;v. Lowest of the per-department maximum salaries:
SELECT MIN(max_salary)
FROM (
SELECT dept_name, MAX(salary) AS max_salary
FROM instructor
GROUP BY dept_name
);vi. Add names (department with the lowest max salary):
SELECT dept_name, max_salary
FROM (
SELECT dept_name, MAX(salary) AS max_salary
FROM instructor
GROUP BY dept_name
)
WHERE max_salary = (
SELECT MIN(max_salary)
FROM (
SELECT MAX(salary) AS max_salary
FROM instructor
GROUP BY dept_name
)
);Question 5: Number of Students in Each Section
SELECT course_id AS courseid, sec_id AS secid, year, semester,
COUNT(ID) AS num
FROM takes
GROUP BY course_id, sec_id, year, semester;Assignment 04: E-R Diagrams and Advanced SQL
Question 1: Weak vs. Strong Entity Sets
A strong entity set has a primary key that uniquely identifies each entity on its own. A weak entity set cannot be uniquely identified by its own attributes alone — it depends on a related strong (identifying) entity.
Example: Consider a Building entity and a Room entity. Each building has a unique building_id. However, room numbers (like “101”) are only unique within a building — Room 101 in the Science Building is different from Room 101 in the Library. So Room is a weak entity set: its discriminator is room_number, but it requires the building_id from its identifying entity Building to be uniquely identified. The full key is (building_id, room_number).
Question 2: Sports Team E-R Diagram (Soccer)
erDiagram
Team {
int team_id PK
string team_name
string city
int total_wins "derived"
int total_goals_scored "derived"
}
Player {
int player_id PK
string name
string position
int team_id FK
int career_goals "derived"
}
Match {
int match_id PK
date match_date
string venue
int home_team_id FK
int away_team_id FK
int home_score
int away_score
}
Player_Match_Stats {
int player_id FK
int match_id FK
int goals_scored
int assists
int minutes_played
int yellow_cards
int red_cards
}
Team ||--o{ Player : "has"
Team ||--o{ Match : "plays_home"
Team ||--o{ Match : "plays_away"
Player ||--o{ Player_Match_Stats : "records"
Match ||--o{ Player_Match_Stats : "includes"
Derived attributes:
Team.total_wins= COUNT of matches where team’s score > opponent’s scoreTeam.total_goals_scored= SUM of goals from all Player_Match_Stats for team playersPlayer.career_goals= SUM ofgoals_scoredacross all Player_Match_Stats for that player
Question 3a: Natural Join Explanation
Given:
SELECT course_id, semester, year, sec_id, avg(tot_cred)
FROM takes NATURAL JOIN student
WHERE year = 2017
GROUP BY course_id, semester, year, sec_id
HAVING count(ID) >= 2;Appending NATURAL JOIN section would not change the result because the takes relation already contains course_id, sec_id, semester, and year — which are exactly the attributes shared with section. Since every tuple in takes corresponds to an existing section, the natural join with section simply matches on these four attributes without filtering out any rows. No additional columns from section appear in the SELECT or WHERE clauses, so the output is identical.
Question 3b: Students Who Never Took a Course (Outer Join)
SELECT student.ID
FROM student
LEFT OUTER JOIN takes ON student.ID = takes.ID
WHERE takes.ID IS NULL;This left outer join preserves all students, including those with no matching rows in takes. The WHERE takes.ID IS NULL filter isolates students who have never enrolled in any course.
Assignment 05: Advanced E-R Concepts and SQL
Question 1: E-R Diagram as a Graph
a) The graph is disconnected: A disconnected E-R graph means the enterprise schema contains two or more independent groups of entity sets with no relationships connecting them. This indicates the database models separate, unrelated domains — for example, an HR module and an inventory module that share no entities or relationships. In practice, this may suggest the schema should be split into separate databases, or that a missing relationship needs to be modeled.
b) The graph has a cycle: A cycle means there are multiple relationship paths between the same set of entities. This can represent legitimate real-world complexity (e.g., an employee works_in a department and also manages a department, creating a cycle between Employee and Department). However, cycles can also signal redundancy, where the same fact is derivable through different paths. The designer should verify that each relationship in the cycle captures genuinely distinct semantics.
Question 3: Why Do Weak Entity Sets Exist?
While we can convert any weak entity set to a strong one by adding the identifying entity’s key as an attribute, weak entity sets exist for important design reasons:
Semantic clarity: A weak entity explicitly models the real-world dependency. A room does not conceptually exist without its building — making
Roomweak communicates this dependency directly in the schema.Reduced redundancy: If we made
Roomstrong by addingbuilding_idas a regular attribute, the dependency relationship would be implicit rather than enforced by the schema. The weak entity formalism ensures referential integrity is structurally guaranteed.Simpler discriminators: Weak entities can use short, natural discriminators (like
room_number) without needing globally unique artificial keys, keeping the schema closer to how users naturally think about the data.
Question 4a: Employee Database SQL
Schema:
employee(ID, person_name, street, city)
works(ID, company_name, salary)
company(company_name, city)
manages(ID, manager_id)
i. Employees living in the same city as their company:
SELECT e.ID, e.person_name
FROM employee e, works w, company c
WHERE e.ID = w.ID
AND w.company_name = c.company_name
AND e.city = c.city;ii. Employees living in the same city and street as their manager:
SELECT e.ID, e.person_name
FROM employee e, manages m, employee mgr
WHERE e.ID = m.ID
AND m.manager_id = mgr.ID
AND e.city = mgr.city
AND e.street = mgr.street;iii. Employees earning more than the average salary at their company:
SELECT e.ID, e.person_name
FROM employee e, works w
WHERE e.ID = w.ID
AND w.salary > (
SELECT AVG(w2.salary)
FROM works w2
WHERE w2.company_name = w.company_name
);Question 4b: Spring 2017 Natural Join Issue
SELECT name, title
FROM instructor NATURAL JOIN teaches NATURAL JOIN section NATURAL JOIN course
WHERE semester = 'Spring' AND year = 2017The problem is that NATURAL JOIN matches on all shared column names. Both instructor and course have a dept_name attribute, so the natural join chain inadvertently requires that the instructor’s department matches the course’s department. This filters out any instructor teaching a course outside their own department — a valid and common scenario (e.g., a Math professor teaching a Statistics course). The fix is to use explicit JOIN ... ON or JOIN ... USING to control which columns are matched.