Lubo Bali - Data Engineer | AI Systems Builder
Chicago, IL • (312)-358-0008 • data@lubobali.com
Data Engineer building production AI systems that learn from data. Created LuBot.ai – an AI analytics agent serving real users with 70,000+ lines of Python, a 30+ table PostgreSQL schema, and 17 nightly batch workers. Built a 9-module Intelligence Engine for anomaly detection, driver analysis, concentration risk, correlation, forecasting, and domain-aware insights. Currently building 50 CEO-grade insight templates (HR, Sales, Finance, Marketing, Operations). Hands-on with ETL pipelines (Airflow, Spark, Flink, Kafka), zero-hallucination systems, and multi-LLM architectures (Groq, Ollama). Planning a data quality framework with automated validation checks.
CERTIFICATIONS / TECHNICAL PROFICIENCIES
Data & Pipelines: PostgreSQL, Snowflake, SQL, Airflow, Python, FastAPI, REST APIs
Streaming: Apache Spark, Flink, Kafka
AI/ML: AdalFlow, FAISS, Groq, Ollama, LangChain
Infrastructure: AWS, Docker, Git, Railway
Frontend: Next.js, React
WORK EXPERIENCE
LUBOT.AI Chicago IL (Remote)
AI/Data Engineer (Founder) Jun 2025 – Present
· Built production AI analytics agent serving real users at lubot.ai (70,000+ lines Python)
· Designed 30+ table PostgreSQL schema tracking thousands of user behavior signals
· Implemented 17 nightly batch workers for self-learning (route optimization, user profiling, data baselines)
· Built 9-module Intelligence Engine: anomaly detection, driver analysis, correlation, forecasting
· Engineered zero-hallucination pipeline - every insight includes source citations
· Multi-LLM architecture (Groq, Ollama) with FAISS vector similarity for few-shot learning
· Tech: Python, FastAPI, PostgreSQL, Redis, Docker, Next.js, AdalFlow, FAISS
· Currently building McKinsey-grade Intelligence Layer with 50 domain-specific templates (HR, Sales, Finance, Marketing, Operations) for context-aware insights
REMAX Chicago IL
Data Engineer Jan 2024 – Jan 2025
· Built Python ETL pipeline to ingest residential property data from MLS APIs into MySQL database
· Automated daily data refreshes and validation checks, reducing manual work
· Created Tableau dashboards analyzing property inventory and pricing trends
EDUCATION
Data/AI Analytics Engineering Bootcamp Online
DataExpert.io Jan – Apr 2026
· Advanced Snowflake optimization, data modeling, and cost management
· dbt for analytics engineering and pipeline orchestration
· Airflow DAGs and data quality frameworks
· Apache Iceberg and modern data lakehouse architecture
Data Analytics Bootcamp Certificate Online
Data Career Jumpstart Feb – June 2025
Data Engineering Bootcamp Online
DataExpert.io June – August 2025
· Dimensional and fact data modeling
· Apache Spark, Flink, and Kafka for large-scale and streaming data
· Data contracts, data quality, and documentation
· Communication, visualization, and stakeholder impact using Tableau
Associate of Applied Science in Computer Systems – 3.5 GPA Online
Lincoln Land College, Springfield, IL Expected July 2026
PROJECTS
AIRFLOW PRODUCTION DATA QUALITY PIPELINE
Airflow, Soda, PostgreSQL, Python
https://github.com/lubobali/airflow-dq-pipeline
- Built production-grade Airflow DAG implementing the write-audit-publish pattern with Soda data quality checks
- Automated data validation across 5+ tables with custom quality rules and alerting
- Integrated PostgreSQL with continuous monitoring for schema drift and data anomalies
STREAMING PIPELINE WITH FLINK & KAFKA
Apache Flink, Kafka, Docker, Python
https://github.com/lubobali/flink-sessionization-project
- Designed real-time event processing pipeline using Apache Flink for session analytics
- Implemented Kafka producers/consumers with Flink transformations for stream processing
- Containerized entire stack with Docker for a reproducible development environment
AIRFLOW + SODA INTEGRATION
Airflow, Soda, PostgreSQL, Python
https://github.com/lubobali/airflow-soda-integration
- Integrated Soda Core with Airflow for automated data quality monitoring
- Created reusable quality check framework with email notifications on failures
- Reduced manual validation time by automating 20+ data quality rules
CVS CONTRACTOR API PIPELINE & DASHBOARD
Python, MySQL, API integration, Tableau
https://github.com/lubobali/cvs_contractor-api-pipeline-and-dashboard
- Built end-to-end ETL pipeline extracting contractor data from APIs into a MySQL database
- Automated daily data refreshes with error handling and data validation
- Created interactive Tableau dashboard for contractor performance analytics
SNOWFLAKE JOB MARKET DASHBOARD
Snowflake, dbt, Python, SQL
https://lubobali.com/arizona-job-market
- Analyzed 750k+ job postings using a Snowflake data warehouse and dbt transformations
- Optimized query performance by 60% using clustering keys and materialized views
- Generated insights on top-paying data engineering skills and market trends
EMPLOYEE SENTIMENT ANALYSIS
Python, NLP, scikit-learn, PostgreSQL
https://github.com/lubobali/employee-sentiment-analysis
- Built NLP pipeline processing employee feedback with sentiment classification
- Achieved 85% accuracy using TF-IDF vectorization and machine learning models
- Automated monthly sentiment reports with trend analysis and visualizations
SQL HEALTHCARE READMISSION
PostgreSQL, SQL, data analysis
- Analyzed 130k+ patient records to identify readmission patterns using advanced SQL
- Created complex queries with CTEs, window functions, and subqueries
- Identified high-risk patient cohorts to target readmission reduction