Role 3: Data Architect
Designs the database structure before any code is written. The foundation everything else is built on.
The Excel Analogy
Every CA who has designed a proper multi-sheet workbook has done data architecture. This role is building on a skill you already have.
Before you build an Excel model — before you write a single formula — you design the raw data sheet.
You decide: what are the columns? What type of data goes in each column (text, number, date, yes/no)? How is this sheet related to the other sheets? Does the client name live in this sheet, or does it live in a separate sheet and get pulled in via VLOOKUP?
If you design the raw data sheet badly, every formula built on top of it breaks when real data arrives. You add a column to track a new field and now 40 VLOOKUPs reference the wrong column. You store client names as "Last, First" and then realize you need to sort alphabetically by last name — now you have to rewrite everything.
A good data model is designed once, carefully, before anything else. Then everything else is built on top of that stable foundation.
In software, this is the Data Architect. They design the database — the tables, the columns, how they connect — before a single line of application code is written.
What This Role Decides
Tables — what are the main "things" in your application?
Every meaningful noun in your app becomes a table. A school management app has: Branches, Students, Teachers, Classes, Attendance records, Fee records. Each noun gets its own table.
Columns — what do you need to know about each thing?
An Attendance record has: which student it is for, which branch and class, what date, what the status is (present / absent / late), which teacher marked it, and when it was recorded.
Relationships — how are the tables connected to each other?
An Attendance record belongs to one Student. A Student belongs to one Branch. A Student is assigned to one Class. A Class has one Teacher. These connections are called foreign keys — one table pointing to a record in another table.
Data types — what kind of value goes in each column?
A phone number is text (not a number — you never do arithmetic on it). A price is a number with decimals. A "has verified email" field is true/false. A payment date is a timestamp. Getting types wrong causes subtle bugs that are expensive to fix later.
Security rules — who can see and change what?
Can a parent see another student's fee records? No. Can a teacher see students from another branch? No — only their assigned class. Can a branch admin in Pune see data from the Mumbai branch? Absolutely not. These rules (called Row Level Security in Supabase) are part of the data architecture, not an afterthought.
Supabase dashboard — as Data Architect, this is where you design tables, define columns, and set the rules for what data is allowed.
Why This Role Comes FIRST
The Data Architect role must come before the Backend Developer role and well before the Frontend Developer role. This is not a preference. It is a structural requirement.
Here is what happens when you skip it:
You start building the student enrollment screen (Frontend). You make a form. The form has fields. You connect those fields to a database table. Three weeks later, the school says "we need to support families with two children — the second child gets a 10% fee discount." Your current fee_records table stores one amount per student with no concept of sibling relationships. Changing this requires rebuilding the fee table, the backend calculation logic, and the frontend screens that depend on it.
If you had spent one hour thinking about this in Data Architect mode before writing any code, you would have built the table to support it from the start — or consciously decided that sibling discounts are out of scope for v1.
The database is the foundation. Every other layer is built on top of it. Getting the foundation wrong means rebuilding everything.
The most common and most expensive mistake in solo development: starting to build screens before the database design is stable. It feels faster to "just start coding" — but the time you save in week 1 is paid back tenfold in week 3 when you are refactoring tables and rewriting queries. Do the data design first. Always.
Concrete Examples — Good vs Bad Design
Scenario: You are building EduTrack. You need to store a student's fee record.
| Approach | What it looks like | Problem |
|---|---|---|
| Bad | Store fee as one text field: "June 2025 tuition and transport, ₹3500, paid on 5th" | You can never filter by fee type. You can never separate tuition from transport. You can never calculate the total outstanding automatically. |
| Better | Separate columns: fee_type, amount, due_date, paid_at, status | Now you can filter by status (overdue), group by fee type, and calculate totals accurately. |
| Best | A fee_structures table linked to grades, and a fee_records table linked to students. The structure defines what is owed; the record tracks what was paid. | When fee amounts change next year, you update the structure once and all new records use the new amount. Old paid records retain their original amounts. |
Three different data designs for the same information. Each one enables completely different features — and makes others impossible.
The CA / Excel Translation
| Database concept | Excel equivalent |
|---|---|
| Table | A sheet in your workbook |
| Row | One row of data in that sheet |
| Column | A column header (field name) |
| Primary key (unique ID for each row) | The invoice number — every row has one, no two rows share it |
| Foreign key (pointer from one table to another) | A VLOOKUP — "look up this customer's name from the Customers sheet" |
| Data type (text, number, date, boolean) | Cell format (General, Number, Date, Text) |
| Row Level Security | Sheet protection + user permissions — who can see and edit which rows |
| Index | Sorting your raw data sheet so lookups are faster |
When You Are in This Role
Put on the Data Architect hat:
- Immediately after the Product Owner has defined what is being built
- Before any code is written — always
- When a new feature requires storing new information that does not fit the existing tables
- When you realize your current database design cannot support a feature the Product Owner wants
- When you are planning what columns a new table should have
- When you are deciding the relationship between two tables
Common Mistakes When You Skip This Role
Storing derived data instead of source data. You store total_fees_paid as a column — because you need to show it on the dashboard. But when a new payment comes in, you now have to remember to update two places: the fee_records row AND the total_fees_paid summary. They will eventually get out of sync. Better: store individual payment records, calculate the total with a query. The Data Architect thinks about this.
No audit trail. You do not design a created_at or updated_at column. Three months later, a parent disputes a late fee and asks "when was this fee record created?" You have no idea. These timestamps cost nothing to add — but only if you add them at design time.
Ambiguous column names. A column called status in a fee_records table. Status of what? The payment status (pending / paid / overdue)? The dispute status (none / disputed / resolved)? A Data Architect names columns clearly: payment_status, dispute_status.
Storing lists in a single column. A student can be enrolled in multiple extracurricular subjects: "chess, music, sports." You store this as one text column. Now you can never query "all students enrolled in music" without an unreliable text search. The correct design is a separate student_activities table — one row per student per activity.
No soft delete. You hard-delete a student record when they leave the school. Three months later, the accounts department asks "what fees did this student pay before leaving?" You cannot answer. It is gone. The Data Architect designs deleted_at — a column that records when something was deactivated, without removing the row or any of its related records.
How Claude Code Helps in This Role
Propose a schema for a feature:
"I am building a school management app for a school with multiple branches. Branch admins manage students, teachers mark daily attendance, parents pay fees online, and the head office sees cross-branch analytics. Design the database tables for this system. Include all columns, data types, relationships, and explain your reasoning."
Review and improve your proposed schema:
"Here is the schema I designed for the fee management module: [paste your schema]. What is missing? What will cause problems later? What would you change and why?"
Generate Supabase migration SQL:
"Generate the Supabase SQL migration to create these tables: [your schema]. Include: primary keys, foreign keys, timestamps (created_at, updated_at), and soft delete (deleted_at)."
Design RLS policies:
"For the students table, write Supabase Row Level Security policies so that: branch admins can only see students in their own branch, teachers can only see students in their assigned class, and parents can only see their own child's record."
What to watch for in Claude's output:
- Does the schema support all the features the Product Owner defined?
- Are there features you know will be needed in v2 that the schema makes impossible?
- Are naming conventions consistent? (snake_case, clear English names)
- Does every table have a primary key, created_at, and updated_at?
How to Switch Into This Role
Before designing any data structure, say:
"I am not building screens right now. I am designing the container that will hold everything this app knows. I need to get this right before anyone writes a single line of React."
Then ask these questions:
- What are all the "things" (nouns) in this application? Each one may be a table.
- What do I need to know about each thing? Each answer is a column.
- How are the things related to each other? Each relationship is a foreign key.
- Who should be allowed to see and change each thing? Each answer is an RLS rule.
- What will this application need to do in v2 that the data model should not make impossible?
Draw it out. Tables as boxes. Columns listed inside. Lines between boxes showing relationships. This is called an ERD (Entity Relationship Diagram) and it is the single most useful document you can create before writing any code.
Exercise
You are building a simple CA firm website where prospective clients can submit an inquiry form. The firm partners want to log in and manage these inquiries — mark them as contacted, add notes, assign to a team member, and track the status.
Write it out with pen and paper. Do not look at any code or use Claude yet — just think.
Then ask Claude: "Review my database design for a CA firm inquiry management system: [your design]. What did I miss? What will cause problems later?"
Compare Claude's feedback to what you designed. You will learn more from the gap than from starting with Claude's answer.
created_at column? Does every table that might need recovery have a deleted_at column? If not, add them now. These are the two most common omissions from first-time data designs.