How I Built a Contractor CVS Data Pipeline Using Flask, PostgreSQL, and Python

Created by Lubo Bali


Full pipeline flow – from data ingestion to dashboard output


This project isn’t just a demo — it’s a fully automated backend pipeline I built from scratch to simulate how a company like CVS might manage contractor data. It showcases the full data engineering lifecycle from ingestion to insight.

  • I developed a Flask API with multiple endpoints to simulate live contractor and work order data flowing into the system.

  • Data is stored in a realistic PostgreSQL schema, with records inserted and managed via Python scripts.

  • A Python automation script extracts, cleans, and exports daily snapshots of contractor data to CSV files without manual intervention.

  • These exports feed directly into a Tableau dashboard that refreshes as new snapshots arrive, keeping contractor performance views up to date.

  • The entire pipeline runs locally, handling data freshness and file management, and is structured for easy extension into scalable, cloud-based real-time systems.

This project demonstrates my ability to design, build, and maintain reliable, end-to-end data pipelines that seamlessly turn raw data into actionable business insights.


All the core logic for this pipeline — including the Flask API, data insertion script, and export automation — is available in the GitHub repository below:

🔗 View Full Code on GitHub

This includes:

  • app.py – Flask API with endpoints

  • insert_data.py – Inserts mock contractor data into PostgreSQL

  • export_to_csv.py – Exports daily contractor snapshots to CSV

The code is modular, clean, and easy to extend or integrate into any real-world backend.
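
To see how the three scripts fit together on a local machine, here is a minimal sketch of the run order. The wrapper script itself and the exact invocation are my assumptions and are not part of the repository listing above.

# run_pipeline.py – hypothetical convenience wrapper, not part of the repo
import subprocess

# The Flask API (app.py) is started separately, e.g. with `python app.py`,
# and keeps serving requests while the scripts below refresh the data.
subprocess.run(["python", "pipeline/insert_data.py"], check=True)    # load mock contractor data into PostgreSQL
subprocess.run(["python", "pipeline/export_to_csv.py"], check=True)  # export the contractors table to CSV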


Step-by-Step Pipeline Breakdown

Step 1: API Simulation with Flask

The pipeline begins with a Flask application that simulates real-world contractor activity by accepting and serving data through REST API endpoints:

  • /contractors – retrieves contractor records

  • /work_orders – (planned) handles work order submissions

  • /locations – (planned) manages service location details

These endpoints mimic how external systems (like facilities tools or contractor portals) would push or pull data from a centralized backend. This makes it ideal for simulating production-like behavior without relying on external vendors.

@app.route("/contractors", methods=["GET"])
def get_contractors():
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute("SELECT * FROM contractors LIMIT 10;")
    rows = cur.fetchall()
    cur.close()
    conn.close()
    return jsonify(rows)
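
The endpoint above calls a get_db_connection() helper that the snippet doesn’t show. Below is a minimal sketch of what such a helper could look like with psycopg2; the connection parameters are placeholders, not the project’s actual configuration.

# app.py – sketch of the connection helper assumed by the endpoint above
import psycopg2

def get_db_connection():
    # Placeholder credentials; a real setup would read these from
    # environment variables or a config file.
    return psycopg2.connect(
        host="localhost",
        dbname="contractors_db",
        user="postgres",
        password="postgres",
    )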

Step 2: PostgreSQL Database Storage

The data captured through the API is stored in a PostgreSQL database using three manually created tables:

  • contractors

  • work_orders

  • locations

Each table is designed with a realistic schema to reflect how a facilities management system like CVS might organize contractor and service data. While this project focuses on reading data through the API, the schema is fully ready for insert and update operations as well.

This database layer ensures that data is stored in a structured, queryable format — forming the backbone for downstream analysis and automation.
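
The post doesn’t include the DDL for these tables, but the columns referenced by insert_data.py and export_to_csv.py (name, avg_cost, overdue_orders, response_time) suggest a contractors table along these lines. The column types and the create_tables.py file name are my assumptions.

# pipeline/create_tables.py – hypothetical setup script, not part of the repo
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="contractors_db",
                        user="postgres", password="postgres")  # placeholder credentials
cur = conn.cursor()

# Four columns, matching the CSV header written by export_to_csv.py;
# the column types themselves are assumptions.
cur.execute("""
    CREATE TABLE IF NOT EXISTS contractors (
        name TEXT NOT NULL,
        avg_cost NUMERIC(10, 2),
        overdue_orders INTEGER,
        response_time NUMERIC(6, 2)
    )
""")

conn.commit()
cur.close()
conn.close()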

# pipeline/insert_data.py
import csv

# `cur` is a psycopg2 cursor opened earlier in the script;
# `conn` is the matching connection.
with open("pipeline/mock_data.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header row
    for row in reader:
        cur.execute(
            "INSERT INTO contractors (name, avg_cost, overdue_orders, response_time) VALUES (%s, %s, %s, %s)",
            row
        )

conn.commit()  # Persist the inserted rows

Step 3: Scheduled Data Export

To make the data usable for business intelligence tools like Tableau, I created a Python script that automatically exports fresh snapshots of contractor data into CSV format.

The script reads from the PostgreSQL contractors table and writes the result to a CSV file that’s updated on a daily basis. This simulates the type of scheduled export jobs that companies like CVS rely on for reporting and analysis.

# pipeline/export_to_csv.py
import csv

# `cur` is a psycopg2 cursor opened earlier in the script.
cur.execute("SELECT * FROM contractors")
rows = cur.fetchall()

with open("pipeline/contractors_export.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["name", "avg_cost", "overdue_orders", "response_time"])
    writer.writerows(rows)
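
The post doesn’t show how the daily run is triggered; on a local setup it could be a cron job or a Windows Task Scheduler entry. As a pure-Python alternative, a small wrapper like the one below (using the third-party schedule package, which is my assumption and not part of the repo) would refresh the CSV once a day.

# run_daily_export.py – hypothetical scheduler wrapper, not part of the repo
import subprocess
import time

import schedule  # third-party: pip install schedule

def refresh_csv():
    # Re-run the export script so Tableau picks up a fresh snapshot.
    subprocess.run(["python", "pipeline/export_to_csv.py"], check=True)

schedule.every().day.at("06:00").do(refresh_csv)  # the time of day is arbitrary

while True:
    schedule.run_pending()
    time.sleep(60)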

Step 4: Dashboard Connection

The final step in the pipeline connects the exported contractor data to a live Tableau dashboard.

The CSV file generated in Step 3 (contractors_export.csv) is used as the data source for visualizing contractor performance metrics such as:

  • Average cost per contractor

  • Response times

  • Overdue orders

This demonstrates how structured backend data can be delivered in a BI-ready format, closing the loop from ingestion to insight.
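
Tableau itself is configured through its UI rather than code, but before pointing the dashboard at the file it can be useful to sanity-check the export. The snippet below is a sketch of that check using pandas, which is my addition and not part of the repo.

# check_export.py – hypothetical sanity check for the exported CSV, not part of the repo
import pandas as pd

df = pd.read_csv("pipeline/contractors_export.csv")

# Mirror the dashboard's headline metrics.
print("Contractors exported: ", len(df))
print("Average cost:         ", round(df["avg_cost"].mean(), 2))
print("Average response time:", round(df["response_time"].mean(), 2))
print("Total overdue orders: ", int(df["overdue_orders"].sum()))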


What This Project Demonstrates


This project simulates a realistic contractor management system — built entirely from scratch — to demonstrate backend automation and data engineering fundamentals. It shows how clean, structured data can be generated, stored, and delivered automatically for analysis or dashboarding.

Here’s what it demonstrates:

  • Real-world simulation of a backend system
    Simulates incoming contractor and work order data through an API, just like real vendor systems at companies like CVS.

  • Data flow automation
    Automates the full pipeline — from ingestion to storage to daily export — using Python and PostgreSQL.

  • Integration-ready data delivery for BI tools
    Outputs clean CSVs ready for use in Tableau, Power BI, or any other analytics platform.
