Role 3: Data Architect

Designs the database structure before any code is written. The foundation everything else is built on.

The Excel Analogy

Every CA who has designed a proper multi-sheet workbook has done data architecture. This role is building on a skill you already have.

Before you build an Excel model — before you write a single formula — you design the raw data sheet.

You decide: what are the columns? What type of data goes in each column (text, number, date, yes/no)? How is this sheet related to the other sheets? Does the client name live in this sheet, or does it live in a separate sheet and get pulled in via VLOOKUP?

If you design the raw data sheet badly, every formula built on top of it breaks when real data arrives. You add a column to track a new field and now 40 VLOOKUPs reference the wrong column. You store client names as "Last, First" and then realize you need to sort alphabetically by last name — now you have to rewrite everything.

A good data model is designed once, carefully, before anything else. Then everything else is built on top of that stable foundation.

In software, this is the Data Architect. They design the database — the tables, the columns, how they connect — before a single line of application code is written.

Reflect — Think of a workbook you built that you later had to significantly restructure because the original sheet design didn't support what was needed. That pain — and the lesson it taught you — is exactly why the Data Architect role comes first.

What This Role Decides

Tables — what are the main "things" in your application?

Every meaningful noun in your app becomes a table. A school management app has: Branches, Students, Teachers, Classes, Attendance records, Fee records. Each noun gets its own table.

Columns — what do you need to know about each thing?

An Attendance record has: which student it is for, which branch and class, what date, what the status is (present / absent / late), which teacher marked it, and when it was recorded.

Relationships — how are the tables connected to each other?

An Attendance record belongs to one Student. A Student belongs to one Branch. A Student is assigned to one Class. A Class has one Teacher. These connections are called foreign keys — one table pointing to a record in another table.

Data types — what kind of value goes in each column?

A phone number is text (not a number — you never do arithmetic on it). A price is a number with decimals. A "has verified email" field is true/false. A payment date is a timestamp. Getting types wrong causes subtle bugs that are expensive to fix later.

Security rules — who can see and change what?

Can a parent see another student's fee records? No. Can a teacher see students from another branch? No — only their assigned class. Can a branch admin in Pune see data from the Mumbai branch? Absolutely not. These rules (called Row Level Security in Supabase) are part of the data architecture, not an afterthought.

Supabase dashboard sign-in page Supabase dashboard — as Data Architect, this is where you design tables, define columns, and set the rules for what data is allowed.

Why This Role Comes FIRST

The Data Architect role must come before the Backend Developer role and well before the Frontend Developer role. This is not a preference. It is a structural requirement.

Here is what happens when you skip it:

You start building the student enrollment screen (Frontend). You make a form. The form has fields. You connect those fields to a database table. Three weeks later, the school says "we need to support families with two children — the second child gets a 10% fee discount." Your current fee_records table stores one amount per student with no concept of sibling relationships. Changing this requires rebuilding the fee table, the backend calculation logic, and the frontend screens that depend on it.

If you had spent one hour thinking about this in Data Architect mode before writing any code, you would have built the table to support it from the start — or consciously decided that sibling discounts are out of scope for v1.

The database is the foundation. Every other layer is built on top of it. Getting the foundation wrong means rebuilding everything.

The most common and most expensive mistake in solo development: starting to build screens before the database design is stable. It feels faster to "just start coding" — but the time you save in week 1 is paid back tenfold in week 3 when you are refactoring tables and rewriting queries. Do the data design first. Always.

Concrete Examples — Good vs Bad Design

Scenario: You are building EduTrack. You need to store a student's fee record.

Approach	What it looks like	Problem
Bad	Store fee as one text field: "June 2025 tuition and transport, ₹3500, paid on 5th"	You can never filter by fee type. You can never separate tuition from transport. You can never calculate the total outstanding automatically.
Better	Separate columns: `fee_type`, `amount`, `due_date`, `paid_at`, `status`	Now you can filter by status (overdue), group by fee type, and calculate totals accurately.
Best	A `fee_structures` table linked to grades, and a `fee_records` table linked to students. The structure defines what is owed; the record tracks what was paid.	When fee amounts change next year, you update the structure once and all new records use the new amount. Old paid records retain their original amounts.

Three different data designs for the same information. Each one enables completely different features — and makes others impossible.

The CA / Excel Translation

Database concept	Excel equivalent
Table	A sheet in your workbook
Row	One row of data in that sheet
Column	A column header (field name)
Primary key (unique ID for each row)	The invoice number — every row has one, no two rows share it
Foreign key (pointer from one table to another)	A VLOOKUP — "look up this customer's name from the Customers sheet"
Data type (text, number, date, boolean)	Cell format (General, Number, Date, Text)
Row Level Security	Sheet protection + user permissions — who can see and edit which rows
Index	Sorting your raw data sheet so lookups are faster

When You Are in This Role

Put on the Data Architect hat:

Immediately after the Product Owner has defined what is being built
Before any code is written — always
When a new feature requires storing new information that does not fit the existing tables
When you realize your current database design cannot support a feature the Product Owner wants
When you are planning what columns a new table should have
When you are deciding the relationship between two tables

Common Mistakes When You Skip This Role

Storing derived data instead of source data. You store total_fees_paid as a column — because you need to show it on the dashboard. But when a new payment comes in, you now have to remember to update two places: the fee_records row AND the total_fees_paid summary. They will eventually get out of sync. Better: store individual payment records, calculate the total with a query. The Data Architect thinks about this.

No audit trail. You do not design a created_at or updated_at column. Three months later, a parent disputes a late fee and asks "when was this fee record created?" You have no idea. These timestamps cost nothing to add — but only if you add them at design time.

Ambiguous column names. A column called status in a fee_records table. Status of what? The payment status (pending / paid / overdue)? The dispute status (none / disputed / resolved)? A Data Architect names columns clearly: payment_status, dispute_status.

Storing lists in a single column. A student can be enrolled in multiple extracurricular subjects: "chess, music, sports." You store this as one text column. Now you can never query "all students enrolled in music" without an unreliable text search. The correct design is a separate student_activities table — one row per student per activity.

No soft delete. You hard-delete a student record when they leave the school. Three months later, the accounts department asks "what fees did this student pay before leaving?" You cannot answer. It is gone. The Data Architect designs deleted_at — a column that records when something was deactivated, without removing the row or any of its related records.

Reflect — In your CA work, you would never permanently destroy a financial record that might be needed for a future audit. The same principle applies to database design: soft delete (marking records as inactive while keeping the data) is the default. Hard deletion is the exception, not the rule.

How Claude Code Helps in This Role

Propose a schema for a feature:

"I am building a school management app for a school with multiple branches. Branch admins manage students, teachers mark daily attendance, parents pay fees online, and the head office sees cross-branch analytics. Design the database tables for this system. Include all columns, data types, relationships, and explain your reasoning."

Review and improve your proposed schema:

"Here is the schema I designed for the fee management module: [paste your schema]. What is missing? What will cause problems later? What would you change and why?"

Generate Supabase migration SQL:

"Generate the Supabase SQL migration to create these tables: [your schema]. Include: primary keys, foreign keys, timestamps (created_at, updated_at), and soft delete (deleted_at)."

Design RLS policies:

"For the students table, write Supabase Row Level Security policies so that: branch admins can only see students in their own branch, teachers can only see students in their assigned class, and parents can only see their own child's record."

What to watch for in Claude's output:

Does the schema support all the features the Product Owner defined?
Are there features you know will be needed in v2 that the schema makes impossible?
Are naming conventions consistent? (snake_case, clear English names)
Does every table have a primary key, created_at, and updated_at?

How to Switch Into This Role

Before designing any data structure, say:

"I am not building screens right now. I am designing the container that will hold everything this app knows. I need to get this right before anyone writes a single line of React."

Then ask these questions:

What are all the "things" (nouns) in this application? Each one may be a table.
What do I need to know about each thing? Each answer is a column.
How are the things related to each other? Each relationship is a foreign key.
Who should be allowed to see and change each thing? Each answer is an RLS rule.
What will this application need to do in v2 that the data model should not make impossible?

Draw it out. Tables as boxes. Columns listed inside. Lines between boxes showing relationships. This is called an ERD (Entity Relationship Diagram) and it is the single most useful document you can create before writing any code.

Exercise

You are building a simple CA firm website where prospective clients can submit an inquiry form. The firm partners want to log in and manage these inquiries — mark them as contacted, add notes, assign to a team member, and track the status.

What tables do you need?

What columns does each table need?

How are the tables related to each other?

Who can see what? (client who submitted the form, team members, partners)

Write it out with pen and paper. Do not look at any code or use Claude yet — just think.

Then ask Claude: "Review my database design for a CA firm inquiry management system: [your design]. What did I miss? What will cause problems later?"

Compare Claude's feedback to what you designed. You will learn more from the gap than from starting with Claude's answer.

Your Turn — Before asking Claude, check your design against two questions: Does every table have a created_at column? Does every table that might need recovery have a deleted_at column? If not, add them now. These are the two most common omissions from first-time data designs.

The Excel Analogy

Every CA who has designed a proper multi-sheet workbook has done data architecture. This role is building on a skill you already have.

Before you build an Excel model — before you write a single formula — you design the raw data sheet.

A good data model is designed once, carefully, before anything else. Then everything else is built on top of that stable foundation.

In software, this is the Data Architect. They design the database — the tables, the columns, how they connect — before a single line of application code is written.

What This Role Decides

Tables — what are the main "things" in your application?

Every meaningful noun in your app becomes a table. A school management app has: Branches, Students, Teachers, Classes, Attendance records, Fee records. Each noun gets its own table.

Columns — what do you need to know about each thing?

An Attendance record has: which student it is for, which branch and class, what date, what the status is (present / absent / late), which teacher marked it, and when it was recorded.

Relationships — how are the tables connected to each other?

Data types — what kind of value goes in each column?

Security rules — who can see and change what?

Supabase dashboard sign-in page Supabase dashboard — as Data Architect, this is where you design tables, define columns, and set the rules for what data is allowed.

Why This Role Comes FIRST

The Data Architect role must come before the Backend Developer role and well before the Frontend Developer role. This is not a preference. It is a structural requirement.

Here is what happens when you skip it:

The database is the foundation. Every other layer is built on top of it. Getting the foundation wrong means rebuilding everything.

Concrete Examples — Good vs Bad Design

Scenario: You are building EduTrack. You need to store a student's fee record.

Approach	What it looks like	Problem
Bad	Store fee as one text field: "June 2025 tuition and transport, ₹3500, paid on 5th"	You can never filter by fee type. You can never separate tuition from transport. You can never calculate the total outstanding automatically.
Better	Separate columns: `fee_type`, `amount`, `due_date`, `paid_at`, `status`	Now you can filter by status (overdue), group by fee type, and calculate totals accurately.
Best	A `fee_structures` table linked to grades, and a `fee_records` table linked to students. The structure defines what is owed; the record tracks what was paid.	When fee amounts change next year, you update the structure once and all new records use the new amount. Old paid records retain their original amounts.

Three different data designs for the same information. Each one enables completely different features — and makes others impossible.

The CA / Excel Translation

Database concept	Excel equivalent
Table	A sheet in your workbook
Row	One row of data in that sheet
Column	A column header (field name)
Primary key (unique ID for each row)	The invoice number — every row has one, no two rows share it
Foreign key (pointer from one table to another)	A VLOOKUP — "look up this customer's name from the Customers sheet"
Data type (text, number, date, boolean)	Cell format (General, Number, Date, Text)
Row Level Security	Sheet protection + user permissions — who can see and edit which rows
Index	Sorting your raw data sheet so lookups are faster

When You Are in This Role

Put on the Data Architect hat:

Immediately after the Product Owner has defined what is being built
Before any code is written — always
When a new feature requires storing new information that does not fit the existing tables
When you realize your current database design cannot support a feature the Product Owner wants
When you are planning what columns a new table should have
When you are deciding the relationship between two tables

Common Mistakes When You Skip This Role

How Claude Code Helps in This Role

Propose a schema for a feature:

"I am building a school management app for a school with multiple branches. Branch admins manage students, teachers mark daily attendance, parents pay fees online, and the head office sees cross-branch analytics. Design the database tables for this system. Include all columns, data types, relationships, and explain your reasoning."

Review and improve your proposed schema:

"Here is the schema I designed for the fee management module: [paste your schema]. What is missing? What will cause problems later? What would you change and why?"

Generate Supabase migration SQL:

"Generate the Supabase SQL migration to create these tables: [your schema]. Include: primary keys, foreign keys, timestamps (created_at, updated_at), and soft delete (deleted_at)."

Design RLS policies:

"For the students table, write Supabase Row Level Security policies so that: branch admins can only see students in their own branch, teachers can only see students in their assigned class, and parents can only see their own child's record."

What to watch for in Claude's output:

Does the schema support all the features the Product Owner defined?
Are there features you know will be needed in v2 that the schema makes impossible?
Are naming conventions consistent? (snake_case, clear English names)
Does every table have a primary key, created_at, and updated_at?

How to Switch Into This Role

Before designing any data structure, say:

"I am not building screens right now. I am designing the container that will hold everything this app knows. I need to get this right before anyone writes a single line of React."

Then ask these questions:

What are all the "things" (nouns) in this application? Each one may be a table.
What do I need to know about each thing? Each answer is a column.
How are the things related to each other? Each relationship is a foreign key.
Who should be allowed to see and change each thing? Each answer is an RLS rule.
What will this application need to do in v2 that the data model should not make impossible?

Exercise

What tables do you need?

What columns does each table need?

How are the tables related to each other?

Who can see what? (client who submitted the form, team members, partners)

Write it out with pen and paper. Do not look at any code or use Claude yet — just think.

Then ask Claude: "Review my database design for a CA firm inquiry management system: [your design]. What did I miss? What will cause problems later?"

Compare Claude's feedback to what you designed. You will learn more from the gap than from starting with Claude's answer.

Role 3: Data Architect

The Excel Analogy

What This Role Decides

Why This Role Comes FIRST

Concrete Examples — Good vs Bad Design

The CA / Excel Translation

When You Are in This Role

Common Mistakes When You Skip This Role

How Claude Code Helps in This Role

How to Switch Into This Role

Exercise

On this page

Role 3: Data Architect

The Excel Analogy

What This Role Decides

Why This Role Comes FIRST

Concrete Examples — Good vs Bad Design

The CA / Excel Translation

When You Are in This Role

Common Mistakes When You Skip This Role

How Claude Code Helps in This Role

How to Switch Into This Role

Exercise

On this page