Data Architecture Thinking
How to design a database schema before writing any code — the thinking process that separates good systems from ones that break at scale.
Design Before Code — Always
The most common mistake developers make: they open VS Code, start writing components, and figure out the database structure as they go.
This works for a week. Then two features conflict with each other because they assumed incompatible data structures. A column that should have been a separate table is now baked into 50 components. Changing it requires rewriting half the app.
Design your database before writing a single line of code. This is the Data Architect hat — wear it first, wear it for as long as it takes, then put it down and never pick it up mid-feature.
The Three Questions
Before creating any table, answer these three questions:
1. What are the entities? An entity is a noun in your system — a thing that exists independently and has its own properties. Students, Teachers, Classes, Branches, Fee Records, Attendance Records are entities. "Attendance status" is not an entity — it's a property of an Attendance Record.
2. What are their relationships? How do entities relate to each other? Is it one-to-one (one user has one profile), one-to-many (one company has many jobs), or many-to-many (many jobs require many skills)?
The visual below shows how three tables in EduTrack relate to each other through foreign keys — and how a single JOIN query can pull student name, fee amount, and branch city in one round-trip.
Foreign keys are exactly like VLOOKUP across sheets: fee_records.student_id links to students.id, just as =VLOOKUP(student_id, Students!A:C, 2, FALSE) would in Excel.
3. What operations do users need to perform? A recruiter needs to create jobs, see applications, update application stages. A candidate needs to search jobs, apply, track their applications. An admin needs to see everything and approve jobs. These operations tell you what queries you need, which tells you what indexes you need.
Answer these three questions. Then design your tables. Then write code.
Normalization — Don't Repeat Data
Normalization is the practice of organising your data so each piece of information is stored in exactly one place.
The symptom of un-normalised data: you find yourself storing the same value in multiple rows. Company name stored in every job row. City name stored in every user row. When the company changes its name, you have to update 200 job rows and hope you didn't miss any.
The fix: If data appears in multiple rows, it belongs in its own table with a foreign key reference.
clients table with a foreign key. Same principle, different tool.Normalisation principle: each fact should exist in exactly one place in the database.
Common Relationship Patterns
One-to-Many
The most common relationship. One parent, many children.
The foreign key goes on the "many" side. A job row has a company_id. A company row does not have a list of job IDs — that list is derived by querying jobs where company_id matches.
Many-to-Many
When both sides of the relationship can have multiple of the other. Requires a junction table.
The junction table has one row for each relationship pair. To get all skills required by a job, query job_skills WHERE job_id = ? and join to skills_master. To get all jobs requiring a skill, query job_skills WHERE skill_id = ? and join to jobs.
One-to-One
Rare. Used when you want to split a table that is getting too wide, or when some rows have extended data that most rows don't need.
id, created_at, and a unique constraint on the combination of the two foreign keys.The Udyogaseva Schema in Full
Here is the actual architecture of Udyogaseva — a production system for a finance job platform. This is what a well-designed schema looks like.
Every design decision in this schema exists for a reason:
profilesis separate fromcandidatesandrecruitersbecause some data is common (email, role) and some is role-specific (resume URL only makes sense for candidates; company name only makes sense for recruiters).skills_masteris a shared reference table because the same skill (like "Excel") appears in thousands of candidate profiles and job requirements. Storing it once and referencing it everywhere means consistency — "Excel" is always spelled "Excel".applicationslinksjobstocandidateswith a unique constraintUNIQUE(job_id, candidate_id)— a candidate can only apply to a job once.
Designing for EduTrack
EduTrack is the multi-branch school management system you will build as Project 3. Let's design its database architecture using the three-question method.
The Three Questions
Entities: Branches, Academic Years, Classes, Students, Teachers, Attendance Records, Fee Structures, Fee Records, Results, Announcements.
Relationships:
- One branch has many classes and many students
- One class belongs to one branch and one teacher
- One student belongs to one branch and one class
- One attendance record belongs to one student, one class, and one academic year
- One fee structure belongs to one branch and defines what is owed (tuition, transport, etc.)
- One fee record belongs to one student and references one fee structure
- One result belongs to one student and records marks for one subject and one exam
Operations:
- Parent: view their child's attendance, fee records, results, and announcements; pay fees online
- Teacher: mark attendance for their class; enter exam results; view their class roster
- Branch admin: manage students and classes in their branch; view branch-level analytics
- Super admin: see all branches; manage branch admins; view cross-branch analytics
The Schema
The EduTrack schema — designed before a single line of code is written — is covered in full in the EduTrack Architecture page. Every table, column, relationship, and RLS policy is documented there.
The RLS Principle
EduTrack's security model is defined by one rule: data never crosses branch boundaries without super admin access. A branch admin in Pune cannot see a student from Mumbai. A teacher can only see students in their assigned class. A parent can only see their own child's records. Every RLS policy enforces this at the database level — not in the UI.
The Five Anti-Patterns to Avoid
1. Storing lists as comma-separated text — Use a junction table instead. "React,Excel,Python" in a single cell cannot be indexed, searched, or joined efficiently.
2. Storing computed values — If a value can be derived from other columns (e.g. total_fee = base_fee + transport_fee), don't store it. Computed stored values go out of sync with their sources.
3. Using nullable foreign keys to represent exclusive types — A job posting table should not have both candidate_id and recruiter_id where only one is ever filled. Use separate tables or a discriminated type column.
4. Using TEXT for everything — Use BOOLEAN for true/false, INTEGER for counts, NUMERIC(12,2) for money, TIMESTAMPTZ for dates. The right type gives you validation and sorting for free.
5. Skipping soft delete — Once a row is hard-deleted, it is gone. Cascading deletes can silently remove related rows you didn't intend to lose.
Exercise: Design a CA Firm Portal
Design the database tables for a CA firm that wants to:
- Track all their clients
- Track engagements per client (GST filing, audit, ITR, etc.)
- Store documents per engagement
- Track deadlines and their status
- Generate a summary of overdue items
Answer:
- What are the entities?
- What are their relationships?
- What operations do the CA and their staff need?
- What indexes are needed?
- What RLS policies are needed?
Draft your schema on paper. Then compare it with your peers in the session. There is no single correct answer — but the process of reasoning through it is the skill you are building.