“Data science isn’t new. What we’re doing with it is.”
On episode 160 of DisrupTV, host Vala Afshar and guest host Dion Hinchcliffe interviewed Anushka Anand, Senior Product Manager at Tableau Software; Xiao-Li Meng, Author and Data Scientist at Harvard; and Jon Reed, Co-Founder of Diginomica, about the growing possibilities on the data science frontier. Here are a few takeaways from the episode:
Artificial Intelligence and Machine Learning Will Transform the Data Analytics Space
“Analysts spend 80% of their time preparing data and only 20% analyzing it,” according to Anushka Anand. Today, the most challenging part of data analytics is data cleaning. Anand said machine learning could take on much of that cleaning and free people to perform more meaningful analysis, which makes it a major opportunity. Rather than replacing the analyst, it will complement the way analysts work with data.
Machine learning and artificial intelligence can also help build trust with customers who use smart products, provided the systems can be explained. Customers are often skeptical of automated features, such as recommendation systems, especially where individual privacy is concerned. Being able to explain how the algorithm behind these features works is key to building customer trust and to the product's success.
2019 Conference Focus on Data Quality in an Enterprise Context and Other Hot Topics
With fall conference season just around the corner, there are several enterprise tech trends to watch. Jon Reed suggested approaching them by turning them into a fun, interactive game. Vendors are expected to hype 5G, cloud, and DevOps. When vendors promote 5G, ask, “What are the use cases?” When vendors promote the cloud, ask, “What about cloud security?” Reed reminded us that with each of these trends, it is important to look at each industry specifically and to remember that every trend has a dark side we must discuss. Be sure to check out his full interview for hilarious and educational takeaways.
Data Science Has Innovative Uses in Producing Governmental Data
Xiao-Li Meng joined the show to share his work on governmental data and the unexpected law of large populations. His primary teaching method aims to show how quickly one can learn a great deal about a topic through statistical analysis. Meng explained the power of representative sampling: standard statistical techniques assume the data are well mixed, meaning the sample behaves like a random draw from the population. The real problem is that naturally occurring data, especially self-reported polls, do not provide random samples.
Meng’s work shows how small a large dataset can effectively be, and he explained this in the context of the 2016 elections, where the problems were compounded because the data were not well mixed. It’s much easier to measure the quantity of data than its quality, and quality depends on which questions you ask. Data quality is crucial, and this is where AI and data cleaning come in to aid the process.
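To make the idea of data that is “not well mixed” concrete, here is a minimal Python sketch of our own; it is not Meng’s method or code, and every number in it is hypothetical. It assumes a population of 1,000,000 people split evenly on a yes/no question, and assumes that “yes” people opt in to a poll at 5.5% while “no” people opt in at 4.5%. It then compares a 1,000-person simple random sample with the roughly 50,000 people who opt in.

```python
import random


def simulate(population_size=1_000_000, true_share=0.5,
             p_respond_yes=0.055, p_respond_no=0.045,
             random_n=1_000, seed=2016):
    """Compare a small simple random sample with a large self-selected one.

    All parameters are hypothetical illustration values: "yes" people are
    assumed to opt in slightly more often than "no" people, so whether a
    person responds is correlated with their answer.
    """
    rng = random.Random(seed)

    # Build a population in which exactly true_share of people answer "yes".
    n_yes = int(population_size * true_share)
    population = [True] * n_yes + [False] * (population_size - n_yes)

    # Simple random sample: every person is equally likely to be chosen.
    srs = rng.sample(population, random_n)
    srs_estimate = sum(srs) / random_n

    # Self-selected sample: each person opts in with a probability that
    # depends on their own answer, so the responders are not well mixed.
    responders = [
        answer for answer in population
        if rng.random() < (p_respond_yes if answer else p_respond_no)
    ]
    self_selected_estimate = sum(responders) / len(responders)

    return srs_estimate, random_n, self_selected_estimate, len(responders)


if __name__ == "__main__":
    srs_est, srs_n, big_est, big_n = simulate()
    print(f"random sample: n={srs_n:>7,}  estimate={srs_est:.3f}")
    print(f"self-selected: n={big_n:>7,}  estimate={big_est:.3f}")
    print("true share:    0.500")
```

Under these assumptions the opt-in sample is about 50 times larger, yet its estimate centers near 0.55 instead of the true 0.50, because who responds is tied to the answer. The small random sample typically lands within a few percentage points of 0.50. Collecting more self-selected responses shrinks the noise but not that bias, which is one way to see how a big dataset can behave like a small one.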
As we move further into the data science space, countless new discoveries and challenges arise. The solutions aren’t certain, but what is certain is that enterprises must use data science to stay relevant.
DisrupTV is a weekly Web series with hosts R “Ray” Wang and Vala Afshar. The show airs live at 11:00 a.m. PT / 2:00 p.m. ET every Friday.