Data Architecture Thinking

How to design a database schema before writing any code — the thinking process that separates good systems from ones that break at scale.

Design Before Code — Always

The most common mistake developers make: they open VS Code, start writing components, and figure out the database structure as they go.

This works for a week. Then two features conflict with each other because they assumed incompatible data structures. A column that should have been a separate table is now baked into 50 components. Changing it requires rewriting half the app.

Design your database before writing a single line of code. This is the Data Architect hat — wear it first, wear it for as long as it takes, then put it down and never pick it up mid-feature.

Your Turn — Before you continue this page, take 10 minutes and sketch the data architecture for your project on paper. Use boxes for tables and arrows for foreign key relationships. The act of drawing it will reveal problems you cannot see when thinking about it in the abstract.

The Three Questions

Before creating any table, answer these three questions:

1. What are the entities? An entity is a noun in your system — a thing that exists independently and has its own properties. Students, Teachers, Classes, Branches, Fee Records, Attendance Records are entities. "Attendance status" is not an entity — it's a property of an Attendance Record.

2. What are their relationships? How do entities relate to each other? Is it one-to-one (one user has one profile), one-to-many (one company has many jobs), or many-to-many (many jobs require many skills)?

The visual below shows how three tables in Aurora Public School relate to each other through foreign keys — and how a single JOIN query can pull student name, fee amount, and branch city in one round-trip.

Three related tables — students, fee_records, branches — joined by foreign keys Foreign keys are exactly like VLOOKUP across sheets: fee_records.student_id links to students.id, just as =VLOOKUP(student_id, Students!A:C, 2, FALSE) would in Excel.

3. What operations do users need to perform? A recruiter needs to create jobs, see applications, update application stages. A candidate needs to search jobs, apply, track their applications. An admin needs to see everything and approve jobs. These operations tell you what queries you need, which tells you what indexes you need.

Answer these three questions. Then design your tables. Then write code.

Spreadsheet → DatabaseTry in Playground →|New tab

Normalization — Don't Repeat Data

Normalization is the practice of organising your data so each piece of information is stored in exactly one place.

The symptom of un-normalised data: you find yourself storing the same value in multiple rows. Company name stored in every job row. City name stored in every user row. When the company changes its name, you have to update 200 job rows and hope you didn't miss any.

The fix: If data appears in multiple rows, it belongs in its own table with a foreign key reference.

Think of it like this — Normalization is the database equivalent of not repeating data across Excel cells. If a client's name appears in 50 rows of your engagements sheet, you've already violated this rule. The fix in Excel would be a clients sheet with a VLOOKUP. The fix in a database is a clients table with a foreign key. Same principle, different tool.

Normalisation principle: each fact should exist in exactly one place in the database.

Entity RelationshipsTry in Playground →|New tab

Common Relationship Patterns

One-to-Many

The most common relationship. One parent, many children.

The foreign key goes on the "many" side. A job row has a company_id. A company row does not have a list of job IDs — that list is derived by querying jobs where company_id matches.

Many-to-Many

When both sides of the relationship can have multiple of the other. Requires a junction table.

The junction table has one row for each relationship pair. To get all skills required by a job, query job_skills WHERE job_id = ? and join to skills_master. To get all jobs requiring a skill, query job_skills WHERE skill_id = ? and join to jobs.

One-to-One

Rare. Used when you want to split a table that is getting too wide, or when some rows have extended data that most rows don't need.

Your Turn — Look at the data architecture you drew before reading this page. Classify every relationship as one-to-many, many-to-many, or one-to-one. For every many-to-many, draw a junction table between the two entities. If you have any junction tables, they each need their own id, created_at, and a unique constraint on the combination of the two foreign keys.

The Aurora Public School Schema in Full

Here is the actual architecture of Aurora Public School — a production system for a finance job platform. This is what a well-designed schema looks like.

Every design decision in this schema exists for a reason:

profiles is separate from candidates and recruiters because some data is common (email, role) and some is role-specific (resume URL only makes sense for candidates; company name only makes sense for recruiters).
skills_master is a shared reference table because the same skill (like "Excel") appears in thousands of candidate profiles and job requirements. Storing it once and referencing it everywhere means consistency — "Excel" is always spelled "Excel".
applications links jobs to candidates with a unique constraint UNIQUE(job_id, candidate_id) — a candidate can only apply to a job once.

Designing for Aurora Public School

Aurora Public School is the multi-branch school management system you will build as Project 3. Let's design its database architecture using the three-question method.

The Three Questions

Entities: Branches, Academic Years, Classes, Students, Teachers, Attendance Records, Fee Structures, Fee Records, Results, Announcements.

Relationships:

One branch has many classes and many students
One class belongs to one branch and one teacher
One student belongs to one branch and one class
One attendance record belongs to one student, one class, and one academic year
One fee structure belongs to one branch and defines what is owed (tuition, transport, etc.)
One fee record belongs to one student and references one fee structure
One result belongs to one student and records marks for one subject and one exam

Operations:

Parent: view their child's attendance, fee records, results, and announcements; pay fees online
Teacher: mark attendance for their class; enter exam results; view their class roster
Branch admin: manage students and classes in their branch; view branch-level analytics
Super admin: see all branches; manage branch admins; view cross-branch analytics

The Schema

The Aurora Public School schema — designed before a single line of code is written — is covered in full in the Aurora Public School Architecture page. Every table, column, relationship, and RLS policy is documented there.

The RLS Principle

Aurora Public School's security model is defined by one rule: data never crosses branch boundaries without super admin access. A branch admin in Pune cannot see a student from Mumbai. A teacher can only see students in their assigned class. A parent can only see their own child's records. Every RLS policy enforces this at the database level — not in the UI.

Never disable RLS. A Supabase table with RLS disabled is publicly readable and writable by anyone with your project URL. In Aurora Public School, that means any parent could read every student's fee records and attendance. Always keep RLS enabled and define explicit branch-scoped policies.

The Five Anti-Patterns to Avoid

1. Storing lists as comma-separated text — Use a junction table instead. "React,Excel,Python" in a single cell cannot be indexed, searched, or joined efficiently.

2. Storing computed values — If a value can be derived from other columns (e.g. total_fee = base_fee + transport_fee), don't store it. Computed stored values go out of sync with their sources.

3. Using nullable foreign keys to represent exclusive types — A job posting table should not have both candidate_id and recruiter_id where only one is ever filled. Use separate tables or a discriminated type column.

4. Using TEXT for everything — Use BOOLEAN for true/false, INTEGER for counts, NUMERIC(12,2) for money, TIMESTAMPTZ for dates. The right type gives you validation and sorting for free.

5. Skipping soft delete — Once a row is hard-deleted, it is gone. Cascading deletes can silently remove related rows you didn't intend to lose.

Your Turn — Review the schema you designed earlier. Check it against each of the five anti-patterns. Do any columns store comma-separated lists? Are there computed values that could go out of sync? Are there nullable foreign keys that represent an exclusive choice? Fix each one before writing any code.

Exercise: Design a CA Firm Portal

Design the database tables for a CA firm that wants to:

Track all their clients
Track engagements per client (GST filing, audit, ITR, etc.)
Store documents per engagement
Track deadlines and their status
Generate a summary of overdue items

Answer:

What are the entities?
What are their relationships?
What operations do the CA and their staff need?
What indexes are needed?
What RLS policies are needed?

Draft your schema on paper. Then compare it with your peers in the session. There is no single correct answer — but the process of reasoning through it is the skill you are building.

Design Before Code — Always

The most common mistake developers make: they open VS Code, start writing components, and figure out the database structure as they go.

The Three Questions

Before creating any table, answer these three questions:

Answer these three questions. Then design your tables. Then write code.

Spreadsheet → DatabaseTry in Playground →|New tab

Normalization — Don't Repeat Data

Normalization is the practice of organising your data so each piece of information is stored in exactly one place.

The fix: If data appears in multiple rows, it belongs in its own table with a foreign key reference.

Normalisation principle: each fact should exist in exactly one place in the database.

Entity RelationshipsTry in Playground →|New tab

Common Relationship Patterns

One-to-Many

The most common relationship. One parent, many children.

The foreign key goes on the "many" side. A job row has a company_id. A company row does not have a list of job IDs — that list is derived by querying jobs where company_id matches.

Many-to-Many

When both sides of the relationship can have multiple of the other. Requires a junction table.

One-to-One

Rare. Used when you want to split a table that is getting too wide, or when some rows have extended data that most rows don't need.

The Aurora Public School Schema in Full

Here is the actual architecture of Aurora Public School — a production system for a finance job platform. This is what a well-designed schema looks like.

Every design decision in this schema exists for a reason:

profiles is separate from candidates and recruiters because some data is common (email, role) and some is role-specific (resume URL only makes sense for candidates; company name only makes sense for recruiters).
skills_master is a shared reference table because the same skill (like "Excel") appears in thousands of candidate profiles and job requirements. Storing it once and referencing it everywhere means consistency — "Excel" is always spelled "Excel".
applications links jobs to candidates with a unique constraint UNIQUE(job_id, candidate_id) — a candidate can only apply to a job once.

Designing for Aurora Public School

Aurora Public School is the multi-branch school management system you will build as Project 3. Let's design its database architecture using the three-question method.

The Three Questions

Entities: Branches, Academic Years, Classes, Students, Teachers, Attendance Records, Fee Structures, Fee Records, Results, Announcements.

Relationships:

One branch has many classes and many students
One class belongs to one branch and one teacher
One student belongs to one branch and one class
One attendance record belongs to one student, one class, and one academic year
One fee structure belongs to one branch and defines what is owed (tuition, transport, etc.)
One fee record belongs to one student and references one fee structure
One result belongs to one student and records marks for one subject and one exam

Operations:

Parent: view their child's attendance, fee records, results, and announcements; pay fees online
Teacher: mark attendance for their class; enter exam results; view their class roster
Branch admin: manage students and classes in their branch; view branch-level analytics
Super admin: see all branches; manage branch admins; view cross-branch analytics

Track all their clients
Track engagements per client (GST filing, audit, ITR, etc.)
Store documents per engagement
Track deadlines and their status
Generate a summary of overdue items

Answer:

What are the entities?
What are their relationships?
What operations do the CA and their staff need?
What indexes are needed?
What RLS policies are needed?

Draft your schema on paper. Then compare it with your peers in the session. There is no single correct answer — but the process of reasoning through it is the skill you are building.

Data Architecture Thinking

Design Before Code — Always

The Three Questions

Normalization — Don't Repeat Data

Common Relationship Patterns

One-to-Many

Many-to-Many

One-to-One

The Aurora Public School Schema in Full

Designing for Aurora Public School

The Three Questions

The Schema

The RLS Principle

The Five Anti-Patterns to Avoid

Exercise: Design a CA Firm Portal

On this page

Data Architecture Thinking

Design Before Code — Always

The Three Questions

Normalization — Don't Repeat Data

Common Relationship Patterns

One-to-Many

Many-to-Many

One-to-One

The Aurora Public School Schema in Full

Designing for Aurora Public School

The Three Questions

The Schema

The RLS Principle

The Five Anti-Patterns to Avoid

Exercise: Design a CA Firm Portal

On this page