Monthly Archives: February 2016

Matsoukas in Data Science tea, 2:30 Tues. March 1

Who: Spyros Matsoukas
What: tea, refreshments, presentations and conversations about topics in data science
Where: Computer Science Building, Room 150
When: 2:30 Tuesday March 1


Abstract: We will introduce Amazon Echo and Alexa, the virtual personal assistant that powers Echo, and focus on the challenges our team is facing when developing machine learning solutions for wake word detection, automatic speech recognition, natural language understanding, question answering, dialog management, and speech synthesis.

Bio: Spyros Matsoukas is a Principal Scientist at Amazon.com, developing spoken language understanding technology for voice-enabled products such as Amazon Echo. From 1998 to 2013 he worked at BBN Technologies, Cambridge MA, conducting research in acoustic modeling for ASR, speaker diarization, statistical machine translation, speaker identification, and language identification. 

Woodbury in Linguistics, 3:30 Friday Feb. 26th

Tony Woodbury of the Linguistics Department at the University of Texas at Austin will give a talk at 3:30 PM Friday, February 26, 2016, in ILC N400. The title of his talk is:

The ‘genius’ of the language: discovering pervasive plan and unique design in linguistic description

A video of the lecture is available here: https://lsa2015.uchicago.edu/events/hale-lecture-anthony-woodbury-reception

West in Data Science 1 p.m. Thurs. Feb. 18

Thursday, February 18, 2016
1:00 – 2:00 p.m.
Computer Science Building, Room 151
Faculty Host:  Brendan O’Connor
Human Behavior in Networks
Abstract:  Humans as well as information are organized in networks. Interacting with these networks is part of our daily lives: we talk to friends in our social network; we find information by navigating the Web; and we form opinions by listening to others and to the media. Thus, understanding, predicting, and enhancing human behavior in networks poses important research problems for computer and data science with practical applications of high impact.

Navigation constitutes one fundamental human behavior: in order to make use of the information and resources around us, we constantly explore, disentangle, and navigate networks such as the Web. Studying navigation patterns lets us understand better how humans reason about complex networks and lets us build more human-friendly information systems. As an example, I will present an algorithm for improving website hyperlink structure by mining raw web server logs. The resulting system is being deployed on Wikipedia’s full server logs at terabyte scale, producing links that are clicked 10 times as frequently as the average link added by Wikipedia editors.
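As a toy illustration of the kind of log mining described above (the function name and the simple path-counting heuristic are hypothetical sketches, not the deployed Wikipedia system), one could count source-target pairs that users reach only indirectly in their navigation sessions and propose the most frequent unlinked pairs as new hyperlinks:

```python
from collections import Counter

def suggest_links(sessions, existing_links, top_k=3):
    """Count indirect navigation paths (source ... target) in user
    sessions and propose the most frequent pairs that are not yet
    directly linked. A toy sketch of log-based link suggestion."""
    pair_counts = Counter()
    for path in sessions:
        for i, src in enumerate(path):
            for dst in path[i + 2:]:  # pages reached only after 2+ clicks
                if (src, dst) not in existing_links:
                    pair_counts[(src, dst)] += 1
    return [pair for pair, _ in pair_counts.most_common(top_k)]

sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "D"],
]
existing = {("A", "B"), ("B", "C"), ("A", "D")}
print(suggest_links(sessions, existing))  # [('A', 'C')]
```

Users twice reached C from A via B, and no direct A-to-C link exists, so that pair is proposed; the real system additionally has to cope with terabyte-scale logs and noisy sessions.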

Communication and coordination through natural language is another prominent human network behavior. Studying the interplay of network structure and language has the potential to benefit both sociolinguistics and natural language processing. Intriguing opportunities and challenges have arisen recently with the advent of online social media, which produce large amounts of both network and natural-language data. As an example, I will discuss my work on person-to-person sentiment analysis in networks, which combines the sociological theory of structural balance with techniques from natural language processing, resulting in a sentiment prediction model that clearly outperforms both text-only and network-only versions.

I will conclude the talk by sketching interesting future directions for computational approaches to studying human behavior in networks.

Bio:  Robert West is a sixth-year Ph.D. candidate in Computer Science in the InfoLab at Stanford University, advised by Jure Leskovec. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. Previously, he obtained a Master’s degree from McGill University in 2010 and a Diplom degree from Technische Universität München in 2007.
A reception will be held at 12:40 in the atrium, outside the presentation room.

Veloso in MLFL 10:30 Friday Feb. 19

Robot Autonomy: Data Collection and Interaction with Humans

who: Manuela Veloso, CMU

when: 10:30am Friday, 2/19

where: cs151
breakfast: Atkins Farm
generous sponsor: Yahoo!

***In general, MLFL will be Thursday 12pm this semester. However, this week it is Friday 10:30am with breakfast***

Abstract:

We research autonomous mobile robots with a seamless integration of perception, cognition, and action. In this talk, I will briefly introduce our CoBot service robots, which consistently move in our buildings to fulfill user requests. I will then introduce the CoBot robots as novel mobile collectors of vital data about our buildings, and present their data representation, their active data-gathering algorithm, and CoBot’s particular use of the gathered WiFi data. I will further present a detailed overview of multiple contributions in human-robot interaction, and detail the use of and planning for language-based complex commands. I will conclude with some philosophical and technical points on my view of the future of autonomous robots in our environments.

Bio:

Manuela Maria Veloso is the Herbert A. Simon Professor in Computer Science and Robotics at Carnegie Mellon University. She was the President of AAAI (Association for the Advancement of Artificial Intelligence) until 2014, and the co-founder and a Past President of the RoboCup Federation. She is a fellow of AAAI, IEEE, and AAAS. She founded and directs the CORAL research laboratory, for the study of autonomous agents that Collaborate, Observe, Reason, Act, and Learn, http://www.cs.cmu.edu/~coral. Professor Veloso and her students have worked with a variety of autonomous robots, including mobile service robots and soccer robots. The CoBot service robots have autonomously navigated for more than 1,000km in multi-floor office buildings. See http://www.cs.cmu.edu/~mmv for further information, including publications.

McCaffrey in Cognitive Brown Bag, noon Weds. Feb. 17

Tony McCaffrey of UMass Amherst will be presenting in the Cognitive Brown Bag Wednesday at noon in Tobin 521B. All are welcome. Combining cognitive psychology and machine intelligence, Tony’s research carefully articulates human cognitive obstacles to innovation and devises counter techniques that are implemented in software. He then articulates computer limits to innovation that humans can help counteract. The result is a human-machine synergy that has the promise of being significantly more innovative than either partner on their own. Funded by the National Science Foundation, Tony’s company is implementing the software and the human-computer interface so both partners speak the same problem-solving “language” as they collaborate. This talk will focus on the cognitive psychology results but will place them in the larger context of human-machine synergy in problem-solving.

Bowman in CS Thursday 2/11 at 4 pm

Sam Bowman of Stanford University, a candidate for a position in Computer Science, will present Thursday Feb 11 at 4pm in CS Rm 151 – the title and abstract follow.

MODELING NATURAL LANGUAGE SEMANTICS WITH LEARNED REPRESENTATIONS

Abstract:  The last few years have seen many striking successes from artificial neural network models on hard natural language processing tasks. These models replace complex hand-engineered systems for extracting and representing the meanings of sentences with learned functions that construct and use their own internal vector-based representations. Though these learned representations are effective in many domains, they aren’t interpretable in familiar terms and their ability to capture the full range of meanings expressible in natural language is not yet well understood.
In this talk, I argue that neural network models are capable of learning to represent and reason with the meanings of sentences. First, I use entailment experiments over artificial languages to show that existing models can learn to reason logically over clean language-like data. I then introduce a large new corpus of entailments in English and use experiments on that corpus to show that these abilities extend to natural language as well. Finally, I briefly present ongoing work on a new model that uses the semantic principle of compositionality to more efficiently and effectively learn to understand natural language.

Karloff in CS Data Science, Tues. Feb. 9 at 1 pm

Variable Selection is Hard

Who: Howard Karloff
When: Feb 9, 1-2pm
Where: Computer Science Building, room 150/151
 

Abstract: Consider the task of a machine-learning system faced with voluminous data on m individuals.  There may be p=10^6 features describing each individual.  How can the algorithm find a small set of features that “best” describes the individuals?  People usually seek small feature sets both because models with small feature sets are understandable and because simple models usually generalize better.

We study the simple case of linear regression, in which a user has an m x p matrix B and a vector y, and seeks a p-vector x *with as few nonzeroes as possible* such that Bx is approximately equal to y, and we call it SPARSE REGRESSION.  There are numerous algorithms in the statistical literature for SPARSE REGRESSION, such as Forward Selection, Backward Elimination, LASSO, and Ridge Regression.
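To make the problem concrete, here is a minimal NumPy sketch of Forward Selection, one of the heuristics named above (the function name and the tiny orthogonal design are illustrative, not from the talk):

```python
import numpy as np

def forward_selection(B, y, k):
    """Greedy forward selection for sparse regression: repeatedly add
    the column of B that most reduces the least-squares residual
    ||Bx - y||, stopping after k columns have been chosen."""
    selected = []
    for _ in range(k):
        best_j, best_res = None, float("inf")
        for j in range(B.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            x, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
            res = float(np.linalg.norm(B[:, cols] @ x - y))
            if res < best_res:
                best_j, best_res = j, res
        selected.append(best_j)
    return sorted(selected)

# Orthogonal toy design: y depends only on columns 1 and 3.
B = np.eye(4)
y = np.array([0.0, 5.0, 0.0, 2.0])
print(forward_selection(B, y, 2))  # [1, 3]
```

The greedy heuristic succeeds on this easy orthogonal design; the hardness result below concerns worst-case inputs, where no polynomial-time method can guarantee good performance.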

We give a general hardness proof that (subject to a complexity assumption) no polynomial-time algorithm can give good performance (in the worst case) for SPARSE REGRESSION, even if it is allowed to include more variables than necessary, and even if it need only find an x such that Bx is relatively far from y.

This is joint work with Dean Foster and Justin Thaler and was done when all coauthors were at Yahoo Labs.

Bio: After receiving a PhD from UC Berkeley, Howard Karloff taught at the University of Chicago and Georgia Tech before leaving Georgia Tech to join AT&T Labs–Research in 1999. He left AT&T Labs in 2013 to join Yahoo Labs in New York, where he stayed until February 2015. Now he does data science for Goldman Sachs in New York.

A fellow of the ACM, he has served on the program committees of numerous conferences and chaired the 1998 SODA program committee. He is the author of numerous journal and conference articles and of the Birkhauser book “Linear Programming.” His interests include data science, machine learning, algorithms, and optimization.

See this and other events at the CDS website.

Liberty in CS MLFL, Thurs. Feb. 4 at noon

Online Data Mining: PCA and K-Means

who: Edo Liberty, Yahoo! Labs

when: 12:00pm Thursday, 2/4

where: cs151
pizza: Antonio’s
generous sponsor: Yahoo!

Abstract

Algorithms for data mining, unsupervised machine learning, and scientific computing were traditionally designed to minimize running time in the batch setting (random access to memory). In recent years, a significant amount of research has been devoted to producing scalable algorithms for the same problems. A scalable solution assumes some limitation on data access and/or the compute model; well-known models include MapReduce, message passing, local computation, pass-efficient, streaming, and others. In this talk we argue for the need to consider the online model in data mining tasks. In the online setting, the algorithm receives data points one by one and must make some decision immediately, without examining the rest of the input. The quality of the algorithm’s decisions is compared to the best possible in hindsight. Note that no stochasticity assumption is made about the input. While practitioners are well aware of the need for such algorithms, this setting was mostly overlooked by the academic community. Here, we will review new results on online k-means clustering and online principal component analysis (PCA).
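To give a flavor of the online setting, here is a minimal one-pass k-means sketch (a standard running-mean heuristic, not the algorithm from the talk): each arriving point is assigned to its nearest center immediately, and that center is nudged toward the point.

```python
import numpy as np

def online_kmeans(points, k):
    """One-pass online k-means: the first k points seed the centers;
    each later point is assigned to its nearest center, which is then
    moved toward the point with step size 1/count (a running mean)."""
    centers, counts, assignments = [], [], []
    for x in points:
        if len(centers) < k:
            centers.append(np.asarray(x, dtype=float).copy())
            counts.append(1)
            assignments.append(len(centers) - 1)
            continue
        j = int(np.argmin([np.linalg.norm(x - c) for c in centers]))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]
        assignments.append(j)
    return centers, assignments

pts = np.array([[0.0, 0.0], [10.0, 10.0], [0.2, 0.0], [9.8, 10.0]])
centers, assign = online_kmeans(pts, 2)
print(assign)  # [0, 1, 0, 1]
```

Each assignment is final the moment the point arrives, matching the online model’s requirement that decisions cannot be revised in hindsight.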

Bio

Edo Liberty is a research director at Yahoo Labs, where he leads the Scalable Machine Learning group. He received his BSc in Computer Science and Physics from Tel Aviv University and his PhD in Computer Science from Yale. After a postdoctoral position in Yale’s Applied Mathematics department, he co-founded a New York-based startup. Since 2009 he has been with Yahoo Labs. His research focuses on the theory and practice of large-scale data mining.