Lubo Bali - Data Engineer | AI Systems Builder

Chicago, IL • (312)-358-0008 • data@lubobali.com


Data Engineer building production AI systems that learn from data. Created LuBot.ai, an AI analytics agent serving real users, with 70,000+ lines of Python, a 30+ table PostgreSQL schema, and 17 nightly batch workers. Built a 9-module Intelligence Engine for anomaly detection, driver analysis, concentration risk, correlation, forecasting, and domain-aware insights. Currently building 50 CEO-grade insight templates (HR, Sales, Finance, Marketing, Operations). Hands-on with ETL pipelines (Airflow, Spark, Flink, Kafka), zero-hallucination systems, and multi-LLM architectures (Groq, Ollama). Planning a data quality framework with automated validation checks.

 

CERTIFICATIONS / TECHNICAL PROFICIENCIES

Data & Pipelines: PostgreSQL, Snowflake, SQL, Airflow, Python, FastAPI, REST APIs

Streaming: Apache Spark, Flink, Kafka

AI/ML: AdalFlow, FAISS, Groq, Ollama, LangChain

Infrastructure: AWS, Docker, Git, Railway

Frontend: Next.js, React


WORK EXPERIENCE


LUBOT.AI    Chicago, IL (Remote)

AI/Data Engineer (Founder)    Jun 2025 – Present

·       Built production AI analytics agent serving real users at lubot.ai (70,000+ lines Python)

·       Designed 30+ table PostgreSQL schema tracking thousands of user behavior signals

·       Implemented 17 nightly batch workers for self-learning (route optimization, user profiling, data baselines)

·       Built 9-module Intelligence Engine: anomaly detection, driver analysis, correlation, forecasting

·       Engineered zero-hallucination pipeline in which every insight includes source citations

·       Built multi-LLM architecture (Groq, Ollama) with FAISS vector similarity for few-shot learning

·       Tech: Python, FastAPI, PostgreSQL, Redis, Docker, Next.js, AdalFlow, FAISS

·       Currently building McKinsey-grade Intelligence Layer with 50 domain-specific templates (HR, Sales, Finance, Marketing, Operations) for context-aware insights

REMAX    Chicago, IL

Data Engineer    Jan 2024 – Jan 2025

·       Built Python ETL pipeline to ingest residential property data from MLS APIs into MySQL database

·       Automated daily data refreshes and validation checks, reducing manual work

·       Created Tableau dashboards analyzing property inventory and pricing trends


EDUCATION

 

Data/AI Analytics Engineering Bootcamp    Online

DataExpert.io    Jan – Apr 2026

·       Advanced Snowflake optimization, data modeling, and cost management

·       dbt for analytics engineering and pipeline orchestration

·       Airflow DAGs and data quality frameworks

·       Apache Iceberg and modern data lakehouse architecture


Data Analytics Bootcamp Certificate    Online

Data Career Jumpstart    Feb – June 2025

 

Data Engineering Bootcamp    Online

DataExpert.io    June 2025 – August 2025

·       Dimensional and fact data modeling

·       Apache Spark, Flink, and Kafka for large-scale and streaming data

·       Data contracts, data quality, and documentation

·       Communication, visualization, and stakeholder impact using Tableau


Associate of Applied Science in Computer Systems – 3.5 GPA    Online

Lincoln Land College, Springfield, IL    Expected July 2026

 

PROJECTS

AIRFLOW PRODUCTION DATA QUALITY PIPELINE

Airflow, Soda, PostgreSQL, Python

https://github.com/lubobali/airflow-dq-pipeline

- Built production-grade Airflow DAG implementing the write-audit-publish pattern with Soda data quality checks

- Automated data validation across 5+ tables with custom quality rules and alerting

- Integrated PostgreSQL with continuous monitoring for schema drift and data anomalies

STREAMING PIPELINE WITH FLINK & KAFKA

Apache Flink, Kafka, Docker, Python

https://github.com/lubobali/flink-sessionization-project

- Designed real-time event processing pipeline using Apache Flink for session analytics

- Implemented Kafka producers/consumers with Flink transformations for stream processing

- Containerized entire stack with Docker for a reproducible development environment

AIRFLOW + SODA INTEGRATION

Airflow, Soda, PostgreSQL, Python

https://github.com/lubobali/airflow-soda-integration

- Integrated Soda Core with Airflow for automated data quality monitoring

- Created reusable quality check framework with email notifications on failures

- Reduced manual validation time by automating 20+ data quality rules

CVS CONTRACTOR API PIPELINE & DASHBOARD

Python, MySQL, API integration, Tableau

https://github.com/lubobali/cvs_contractor-api-pipeline-and-dashboard

- Built end-to-end ETL pipeline extracting contractor data from APIs into a MySQL database

- Automated daily data refreshes with error handling and data validation

- Created interactive Tableau dashboard for contractor performance analytics

SNOWFLAKE JOB MARKET DASHBOARD

Snowflake, dbt, Python, SQL

https://lubobali.com/arizona-job-market

- Analyzed 750k+ job postings using a Snowflake data warehouse and dbt transformations

- Optimized query performance by 60% using clustering keys and materialized views

- Generated insights on top-paying data engineering skills and market trends

EMPLOYEE SENTIMENT ANALYSIS

Python, NLP, scikit-learn, PostgreSQL

https://github.com/lubobali/employee-sentiment-analysis

- Built NLP pipeline processing employee feedback with sentiment classification

- Achieved 85% accuracy using TF-IDF vectorization and machine learning models

- Automated monthly sentiment reports with trend analysis and visualizations

SQL HEALTHCARE READMISSION

PostgreSQL, SQL, data analysis

                 https://lubobali.com/diabetes

- Analyzed 130k+ patient records to identify readmission patterns using advanced SQL

- Created complex queries with CTEs, window functions, and subqueries

- Identified high-risk patient cohorts to inform efforts to reduce readmission rates




Created by Lubo Bali


© Copyright 2025. All rights reserved.



Created by Lubo Bali

All rights reserved

© Copyright 2025. Privacy Policy