Ben Summers
Verified Expert in Engineering
Data Engineer and Machine Learning Developer
With a PhD in pure maths, Ben would describe himself as an academic at heart, which means he is deeply passionate about his work. Since finishing his PhD in 2012, he has worked professionally as a back-end and data engineer for a large global company and a small startup. Since 2015, he has been obsessed with machine learning, especially neural networks, and enjoys applying these techniques to solve real-world problems. Ben has been freelancing via Toptal since 2019.
Preferred Environment
Linux, Git, PyCharm, Jupyter, Python, Python 3, Generative Pre-trained Transformers (GPT), Docker, GitHub
The most amazing...
...project I've done is my Ph.D. thesis—writing didn't come naturally and it posed a real challenge.
Work Experience
AI and Data Consultant
Sonbol Consulting AB
- Developed a proof of concept in 3D machine learning using PyTorch and PyTorch3D.
- Developed a Shopify app for generating product descriptions using GPT.
- Created a sourdough monitor using modern computer vision techniques.
- Built out a data warehouse for a client using Fivetran and data build tool (dbt).
Senior/Mid-senior Data Engineer
Birchbox
- Created pipelines to populate the data warehouse (Redshift) from various sources using Fivetran, including custom connectors in AWS Lambda with Terraform.
- Built out data warehouse (Redshift) with dbt for defining transformations.
- Created reverse ETL pipelines from Redshift into Braze using dbt and Airflow.
- Migrated data from a legacy Magento store into a new Shopify store using Python scripts.
Airflow Engineer for a Data Management Platform
Idelic (via Toptal)
- Ported existing ETL jobs from a legacy Celery-based system to run on Airflow (Astronomer-hosted). Sources included S3, REST APIs, and SOAP APIs.
- Guided the team to employ Apache Airflow best practices/conventions.
- Deepened existing experience with PyCharm, Python, Apache Airflow, and Git.
3D Graphics Machine Learning Engineer
Toptal Client
- Designed and implemented a 3D reconstruction pipeline.
- Constructed a dataset for a high-quality 3D reconstruction.
- Reviewed literature to select the best approach for the client's requirements.
- Used Azure virtual machines to train machine learning models with Weights and Biases for experiment tracking.
Research Programmer
USC ISI (via Toptal)
- Improved a cross-lingual query summarization system, which helped the team win the evaluation period despite being in second place before the summarization stage.
- Sped up experiment runs by switching embedding lookups to an approximate k-nearest neighbors search with the Annoy library, after identifying the bottleneck with py-spy.
- Increased iteration speed and reliability by enforcing design decisions with tests and structuring code.
Data Scientist
Instabridge
- Migrated a data system from AWS to Google Cloud.
- Developed models to identify moving WiFi hotspots, e.g., hotspots on trains or mobile devices.
- Built models to estimate locations of WiFi hotspots from scans and connections by Android devices.
- Wrote and deployed data models with dbt (data build tool).
- Produced various ad-hoc analyses for stakeholders.
- Deployed Snowplow event pipelines on the Google Cloud Platform (GCP) with Cloud Pub/Sub, Dataflow, BigQuery, and Google Compute Engine.
Back-end Developer
Instabridge
- Designed and implemented the back-end architecture utilizing Heroku, AWS, and GCP.
- Implemented data pipelines in Spark running on EMR scheduled with Airflow.
- Applied machine learning to core data problems, such as estimating the locations and quality of WiFi hotspots, classifying hotspots as moving or stationary and as public or private, and matching hotspots to venues.
- Implemented near real-time data pipelines using AWS Kinesis, Lambda functions, and DynamoDB.
Solutions Engineer
Cadence Design Systems
- Developed internal productivity/process web applications for one of the two leading electronic design automation companies.
- Improved my ability to work effectively in teams.
- Developed communication skills.
- Evaluated and continuously ranked priorities based on the business value.
Associate Tutor
University of East Anglia
- Successfully communicated difficult concepts to a range of students.
- Marked coursework of undergraduate mathematics students.
- Helped undergraduate mathematics students with coursework problems.
Experience
Web-based Server Monitor and Admin Tool for Medal of Honor
Fivetran Custom Connectors for a Subscription Box Service
Shopify App for AI-generated Product Descriptions
http://www.sonbol.se
Skills
Languages
Python, SQL, Python 3, JavaScript, HTML, PHP, Haskell, Scala, Java, Stored Procedure, R
Libraries/APIs
LSTM, PyTorch, TensorFlow, Fast.ai, Spark ML, FFmpeg, Keras, PySpark, Scikit-learn, Natural Language Toolkit (NLTK), ZeroMQ, Pandas, NumPy, OpenCV, Requests, Beautiful Soup, Node.js
Tools
BigQuery, Amazon Elastic MapReduce (EMR), Spark SQL, Apache Airflow, Cron, Microsoft Excel, Jupyter, PyCharm, Git, Perforce, Gensim, Doccano, RabbitMQ, Google Compute Engine (GCE), Terraform, Cloud Dataflow, Google Cloud Composer, GitHub, OpenAI Gym, Amazon Athena, Looker, AWS Glue
Paradigms
ETL, Data Science, Database Design, Functional Programming, Object-oriented Programming (OOP), Business Intelligence (BI), Serverless Architecture, Agile, Search Engine Optimization (SEO), DevOps
Platforms
Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), Heroku, AWS Lambda, Oracle, Blackboard, Arduino, Anaconda, Azure, Docker, NVIDIA CUDA, Databricks, AWS IoT, Shopify
Storage
Amazon S3 (AWS S3), JSON, Databases, Database Structure, Database Transactions, PostgreSQL, NoSQL, Data Pipelines, Data Integration, Redis, Data Lakes, Redshift, MySQL, MongoDB, PL/SQL
Other
EMR, Convolutional Neural Networks (CNN), Linear Algebra, Google BigQuery, Neural Networks, Deep Learning, Artificial Intelligence (AI), Machine Learning, Data Engineering, Deep Neural Networks, CSV, Cloud Storage, Data Analysis, APIs, Data Aggregation, Pipelines, Back-end, Data Analytics, AI Programming, Transactions, Data Architecture, Data, EDA, Exploratory Data Analysis, Technical Architecture, ETL Tools, Architecture, Research, API Design, Machine Learning Automation, Supervised Machine Learning, Natural Language Processing (NLP), Probability Theory, Stream Processing, IP Networks, Image Recognition, Statistics, Deep Reinforcement Learning, Computer Vision, Audio, Audio Processing, Digital Signal Processing, Data Modeling, Data Warehousing, Data Warehouse Design, Data Visualization, Data Build Tool (dbt), Infrastructure, Data Reporting, ELT, GPT, Generative Pre-trained Transformers (GPT), Web Development, Language Models, Software Architecture, CI/CD Pipelines, Security, Modeling, Dashboards, Computer Vision Algorithms, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Serverless, Big Data, Amazon API Gateway, Reinforcement Learning, Amazon Kinesis, Microsoft 365, Pen & Paper, Generative Adversarial Networks (GANs), PyTorch3D, Google Data Studio, Lambda Functions, Fivetran, Lean, ChatGPT, OpenAI GPT-3 API, OpenAI GPT-4 API, Machine Learning Operations (MLOps), LangChain, Monitoring, FastAPI, Web Scraping, Generative Pre-trained Transformer 3 (GPT-3), Open Neural Network Exchange (ONNX), Benchmarking
Frameworks
Apache Spark, Spark, Flask, Django, Ruby on Rails (RoR), Selenium, Hadoop, Next.js
Education
B2 CEFR in Greek Language and Culture
University of Ioannina - Ioannina, Greece
Ph.D. in Mathematics
University of East Anglia - Norwich, UK
Master's Degree in Mathematics
University of East Anglia - Norwich, UK