
Broadening Participation in Data Mining
BPDM 2017 August 12 – 13, Halifax, Nova Scotia – Canada
The Broadening Participation in Data Mining Workshop (BPDM 2017) will be held on August 12 and 13, 2017. It will be hosted with the ACM SIGKDD 2017 Conference on Knowledge Discovery and Data Mining at the World Trade and Convention Centre in Halifax, Nova Scotia – Canada. This year’s program will span two days and include keynotes, panels, technical tutorials, and mentoring activities.
The event will be held at the World Trade and Convention Centre in Halifax, Nova Scotia – Canada, from August 12, 2017 to August 13, 2017. General attendance is open and encouraged. Registration for non-scholarship recipients will be announced in August.
Important Dates
- Scholarship Applications: April 28, 2017 to August 13, 2017
- Scholarship recipients notification: To be announced.
Day 1 – Professional Development (August 12, 2017)
World Trade and Convention Centre, 1800 Argyle Street
Halifax, Nova Scotia NS B3J 2V9
08:00 – 08:30 AM (PDT) (15:00 GMT) |
Welcome (Chairs): Sarah M Brown and Christan Grant (BPDM General Chairs) |
8:30 – 10:00 AM (PDT) (15:30 GMT) |
Keynote: The Human Components of Machine Learning. Jenn Wortman Vaughan – Microsoft Research, New York City. Abstract: Machine learning is usually viewed as an automated process. Data is fed to a learning algorithm that outputs a trained model which then makes predictions. In practice, however, it is common for every step of this process to rely on humans in the loop. Bio: Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City. She studies algorithmic economics, machine learning, and social computing, often in the context of prediction markets, crowdsourcing, and other human-in-the-loop systems. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. |
10:00 – 10:30 AM (PDT) (17:00 GMT) |
Break |
10:30 – 12:30 PM (PDT) (17:30 GMT) |
Mentoring Session: Panel and group mentoring, with students grouped by seniority. Mentors will introduce themselves and will be prompted with questions to facilitate discussion. Both general research-focused and career-focused mentoring will be offered. |
12:30 – 1:30 PM (PDT) (19:30 GMT) |
Lunch |
1:30 – 3:00 PM (PDT) (20:30 GMT) |
Elevator Pitches: To be announced |
3:00 – 3:30 PM (PDT) (22:00 GMT) |
Break |
3:30 – 5:00 PM (PDT) (22:30 GMT) |
Mentoring Session: Small groups matched by topic. Participants are asked to prepare a research statement to introduce themselves to the group (including the mentor). Mentors are asked to focus on research feedback. |
5:00 – 7:30 PM (PDT) (00:00 GMT -next day-) |
BPDM Meet & Greet |
Day 2 – Technical Learning and Application (August 13, 2017)
World Trade and Convention Centre, 1800 Argyle Street
Halifax, Nova Scotia NS B3J 2V9
08:00 – 09:30 AM (PDT) (15:00 GMT) |
Keynote: Engineering Mindset for Research and Career Development. Tao Xie – University of Illinois Urbana Champaign. Abstract: Developing a successful career, such as producing a high-impact research portfolio or agenda, is a common goal for researchers. Gaining the skill set to accomplish such a goal is also very important for personal development. Bio: Tao Xie is an Associate Professor and Willett Faculty Scholar in the Department of Computer Science at the University of Illinois at Urbana-Champaign, USA. He worked as a visiting researcher at Microsoft Research. His research interests are in software engineering, focusing on software testing, program analysis, … |
09:30 – 10:00 AM (PDT) (16:30 GMT) |
Poster Set-up |
10:00 – 12:00 PM (PDT) (17:00 GMT) |
“Ethics and Fairness” speakers from FATML: The past few years have seen growing recognition that machine learning raises novel challenges for ensuring non-discrimination, due process, and understandability in decision-making. In particular, policymakers, regulators, and advocates have expressed fears about the potentially discriminatory impact of machine learning, with many calling for further technical research into the dangers of inadvertently encoding bias into automated decisions. At the same time, there is increasing alarm that the complexity of machine learning may reduce the justification for consequential decisions to “the algorithm made me do it.” Researchers explore how to characterize and address these issues with computationally rigorous methods. |
12:00 – 1:30 PM (PDT) (19:00 GMT) |
Lunch |
1:30 – 3:00 PM (PDT) (20:30 GMT) |
In the Lab – Part 1 |
3:00 – 3:30 PM (PDT) (22:00 GMT) |
Break (optional) |
3:30 – 5:00 PM (PDT) (22:30 GMT) |
In the Lab – Part 2 |
5:00 – 6:00 PM (PDT) (00:00 GMT -next day-) |
Wrap-Up/Close Out |
Dr. Brandeis Marshall – General Co-Chair Emeritus

Dr. Brandeis Marshall is an Associate Professor of Computer Science at Spelman College. She earned M.S. and Ph.D. degrees in Computer Science from Rensselaer Polytechnic Institute. Her research lies in the areas of information retrieval, knowledge management, data mining and social media. Her research mission is aimed at effective assessment and summarization of data in order to create valuable knowledge. In particular, her work can be applied to business intelligence in the areas of labeled data analysis, unlabeled data analysis and fixed-length data analysis. Dr. Marshall has served as co-PI of the Information Security Research and Education Collaborative (NSF DUE #1344369). She has also served as PI of the Broadening Participation in Data Mining Program (NSF IIS #1232397 with Caio Soares) and has helped support and organize similar broadening participation ventures in other areas such as High-Performance Computing.
Dr. Caio Soares – General Co-Chair Emeritus

Dr. Caio Soares is Founder and General Co-Chair of the Broadening Participation in Data Mining Program. He is also a Senior Data Scientist at Zynga in San Francisco, CA and was previously a Senior Research Engineer at Robert Bosch Research and Technology Center. He holds Ph.D. and M.S. degrees in Computer Science and Software Engineering from Auburn University, Auburn, AL. He has also earned B.S. degrees in Computer Science and Mathematics from Berry College, Mt. Berry, GA. Dr. Soares is a SREB Fellow and a recipient of the Google Hispanic Scholarship. His research interests are in Data Mining, Machine Learning, Big Data, and Distributed & Parallel computing, focusing on real-world and human-centered domains.
Sarah M Brown – General Chair

Dr. Sarah M Brown is a Chancellor’s Postdoctoral Fellow in the Department of Electrical Engineering and Computer Science. Dr. Brown received her BS in Electrical Engineering with a minor in Biomedical Engineering, magna cum laude, in May 2011, her MS in Electrical and Computer Engineering in January 2014, and her PhD in Electrical Engineering in December 2016, all from Northeastern University. Her graduate studies were supported by a Draper Laboratory Fellowship and a National Science Foundation Graduate Research Fellowship. Dr. Brown’s research interests are in the design and analysis of machine learning methods for scientific research, to date focusing on psychology neuroscience applications. Outside of the lab, Sarah is a passionate advocate for underrepresented STEM engagement at all levels. She currently serves as treasurer for Women In Machine Learning and previously served as finance and sponsorship chair as a co-organizer for the WiML Workshop. As a student, Sarah served in various roles in the National Society of Black Engineers at both the local and national levels.
Christan Grant – General Chair

Dr. Christan Grant is an Assistant Professor at the University of Oklahoma. He completed his degrees in computer science and engineering at the University of Florida. His research interests involve the union of databases and text analytics. This includes natural language processing, relational databases, data mining, and probabilistic knowledge-base-assisted question answering systems. He is also building systems that allow humans to work more fluently with big data algorithms. As a professor at the University of Oklahoma, in the Data Science and Analytics program, he is involved in several funded projects with the USDA, FAA, NSF, Mellon Foundation and Robert Wood Johnson Foundation.
Dr. Omar U. Florez – Adviser Mentoring

Dr. Omar U. Florez is a Research Scientist in the Personalized Computing group at Intel Labs (Santa Clara, CA). He received his PhD in Computer Science at Utah State University and is also a recipient of an Innovation Award on Large-Scale Analytics by IBM Research. His research interests cover statistical machine learning, recommender systems, and deep learning. He has published 20+ research publications and his prior work experience includes IBM Research (2010, 2011). Dr. Florez is also co-founder of Southamericans in Computing and is a board member of the Intel Latino Network (ILN).
Heriberto Acosta – Social Media Coordinator

I’ve been volunteering with the BPDM since 2014, managing the social media accounts and photography duties during the workshop. I am currently a PhD student at Nova Southeastern University. My research focuses on information system security, specifically the intersection of privacy, security, and usability beliefs. I also have a Master’s degree in Data Mining from the Polytechnic University of Puerto Rico. I’ve had experience in the use of data science for health analytics. During my Master’s degree, I created a data mining program to find correlations in risk factors of patients with Alzheimer’s Disease. In addition to my PhD research, I am also working with Dr. Patricia Ordoñez from the University of Puerto Rico on the creation of a data streaming engine for a health bioinformatics visualization program. I currently work as a civilian contractor for the Army National Guard (Puerto Rico) as an IT Site Administrator for their Distributed Learning Program.
Orlando Ferrer – Social Media Coordinator

Orlando Ferrer is a Software Engineer with a passion for projects in data mining and machine learning. He has a B.S. in Computer Engineering from the University of Puerto Rico – Mayaguez, and a M.E. in Software Engineering from the Polytechnic University of Puerto Rico. For the past several years he has been working in the industry, consulting for aerospace companies such as Pratt & Whitney, Lockheed Martin, Cessna, Bell Helicopter, KLM, and others. In addition to data science, he has an interest in programming languages, web technologies, backend design, and automation.
Rudy Godoy – Web Committee Chair

Rudy began his career in technology at the early age of 18. He worked for GMD, a Peruvian consulting firm, where he operated mainframe hardware and technologies such as DEC’s VAX and Alpha, OpenVMS and DEC Unix. He then joined TIM Peru’s Marketing Department as a Wireless Value Added Technical Consultant. He was in charge of product management and successfully launched the first SMS-based services in Peru. He later founded a software consulting firm, acting as CEO and Product Manager, and delivered software products for top financial and consulting companies in Peru. Prior to his current endeavours, he worked for TIBCO Jaspersoft’s Business Development unit, leading technical efforts to enable Jaspersoft integrations with the latest Big Data technologies. He also successfully certified the product line with third-party vendors such as Cloudera, Databricks, Hortonworks and MongoDB. He attended the Computer Science program at the Universidad Católica San Pablo in Peru. His research interests include Machine Learning, Data Mining, Big Data, Cryptography and Programming.
Ivan Brugere – Mentoring Coordinator

Ivan is a final-year Ph.D. student and ESP-IGERT Fellow in Security and Privacy at the University of Illinois at Chicago. He earned his M.S. at the University of Minnesota focusing on large-scale spatiotemporal data mining with applications in ecology and climate science. His current research focuses on measuring and evaluating graph models inferred from the data of individual entities for a particular task or data science question. His work is in applications to data-driven science, including ecology, bioinformatics, as well as web applications such as relational user modeling and recommendation. Ivan is a 2014 Google Lime Scholar for students with disabilities, and helped organize the Broadening Participation in Data Mining workshop in its 2014 and 2016 iterations.
Jacqueline Fairley – Fundraising Chair

Jacqueline Fairley is a research engineer in the Sensors and Electromagnetic Applications Laboratory at GTRI. Her research interests include development, testing, evaluation, and real-time implementation of radar signal processing algorithms and architectures. Dr. Fairley holds a B.S. in electrical engineering from the University of Missouri-Columbia, as well as an M.S. and Ph.D. in electrical and computer engineering from the Georgia Institute of Technology.
Caitlin Kuhlman – Chair Tutorials Committee

Ms. Kuhlman is a Ph.D. student in the Department of Computer Science at Worcester Polytechnic Institute working in the Database Systems Research Group. Her research is in the area of scalable machine learning and data mining using distributed systems. She is also focused on using public data to solve social problems. She is the lead researcher and developer on the Massachusetts Technology, Talent, and Economic Reporting System, an online tool to measure the economic competitiveness of US states, and in 2016 she was a member of the inaugural class of Data Science for Social Good Fellows at IBM Research.
Annie En-Shiun Lee – Co-Mentoring Chair

En-Shiun Annie Lee holds a PhD from the Centre of Pattern Analysis and Machine Intelligence in the department of System Design Engineering at the University of Waterloo. Dr. Lee has experience applying techniques from pattern recognition, data mining, and machine learning to solve relevant problems, such as health data analysis, computational advertising, and sentiment analysis. She is interested in learning how patterns appear in text and in nature, especially what those patterns mean with respect to big data. Dr. Lee has over 8 years of experience as a researcher, and over 4 years of experience in various sectors of the industry, including software, energy, services, as well as marketing and media. Dr. Lee has published in bioinformatics, knowledge discovery, and artificial intelligence, and has filed a patent in biosequence analysis. She is a recipient of the Natural Sciences and Engineering Research Council Post Graduate Scholarship, the Ontario Graduate Scholarship, and Mitacs Accelerate Internship. Dr. Lee has served as a co-organizer for the WiML Workshop co-hosted with NIPS in Vancouver, BC, Canada, 2010.
Reihaneh Rabbany – Mentoring Committee

Reihaneh Rabbany is a Postdoctoral Fellow at the Auton Lab, Carnegie Mellon University. She researches data mining and machine learning techniques for analyzing real-world attributed graphs. Previously, she was a member of Alberta Innovates Center for Machine Learning. She has a Ph.D. and M.Sc. in Computing Science from the University of Alberta, Canada.
Alexander Rodriguez – Fundraising Committee

Alexander is a Master’s student in Data Science and Analytics and a graduate research assistant at the interdisciplinary Community Resilience CORE Research Lab at the University of Oklahoma (OU), which is part of the broader NIST-funded Center of Excellence on Community Resilience. His current research involves the application of machine learning techniques to assist decision making in natural disaster mitigation. He completed his B.Sc. in Mechatronics Engineering at Universidad Nacional de Ingenieria, Peru. During his undergraduate studies, he worked on several predictive analytics projects. In his senior year, he was awarded a scholarship to spend one year as an exchange student at OU. After graduating, he spent one year in the Maintenance Department of an LNG plant, where he experimented with the application of prognostics to some automated devices. He then held a position in Business Intelligence for a short period before his Master’s studies.
Mariya Vasileva – Social Media Committee

Mariya Vasileva is currently a Ph.D. student in the Artificial Intelligence Group at the Department of Computer Science, University of Illinois at Urbana-Champaign. Her current research focuses on machine learning and artificial intelligence, deep learning and optimization, and statistical learning theory, primarily in application to computer vision problems. Her most recent projects have involved employing novel developments in deep learning methods for scene understanding and generative image modelling. Mariya is concurrently completing her M.S. degree in Applied Mathematics with a focus on optimization and algorithms at the University of Illinois at Urbana-Champaign. She earned her B.S. degrees in Mechanical Engineering and Business Economics from the California Institute of Technology in Pasadena, California, after which she spent a year working as an engineer at Schlumberger Technology Corporation in Houston, Texas, before proceeding to pursue her graduate studies.
Pablo Fonseca – Social Media Committee

Pablo Fonseca is a PhD student in Machine Learning at the University of Campinas in Brazil. He earned his M.S. degree in Computer Science from the University of Campinas in 2015 and his B.S. in Informatics Engineering from the Pontifical Catholic University of Peru in 2010. His research interests lie in Machine Learning and Image Processing.
Jose Lugo-Martinez – Mentoring Committee

Dr. Jose Lugo-Martinez is a Post-doctoral Fellow in the Precision Health Initiative at Indiana University. His research interests include machine learning, data and text mining, computational biology and structural bioinformatics. He received his Ph.D. in Computer Science with a minor in Bioinformatics from Indiana University under the supervision of Predrag Radivojac. Prior to that, Dr. Lugo-Martinez received dual B.S. degrees in Computer Science and Mathematics from the University of Puerto Rico-Rio Piedras and an M.S. degree in Computer Science from the University of California-San Diego. His research is focused on the development of robust kernel methods for learning and mining on noisy and complex graph and hypergraph data. In particular, he develops computational approaches towards understanding protein function and how disruption of protein function leads to disease. He is a member of the International Society for Computational Biology and a reviewer for several scientific journals and conferences.
TBA