Blog 1: Introduction and Module 0

Hi Everyone, 

My name is Julia and I’m very happy to be in the course with all of you. From my brief scanning of your posts I can tell that there is so much I can learn from you and your diverse backgrounds.

This course is my last in the Master of MIS program. I have really enjoyed the MIS program overall and am happy to be closing it out with a Business Intelligence course. I also received my bachelor’s from the U of A, majoring in MIS and Marketing. 

I am a Product Manager in the SaaS Human Capital Management (HCM) space. While I don’t work with Big Data on a regular basis, I am very interested in learning more about how it can be used in my role to inform product decisions and better serve customer needs. Our customers generate a lot of data within our system that we want to put to use in future projects in the realm of machine learning, predictive analytics and natural workspaces. 

Big Data is a really interesting topic to me, as I am drawn in by the nearly endless possibilities and benefits. At the same time, I’m very wary of the implications in terms of privacy and data monopolies, and I prefer to keep a lower profile myself. At times I feel conflicted about the evolving role Big Data plays in our lives. 

I think that the first week’s materials did a good job of level setting on not only what Business Intelligence is but why it matters, the trends and challenges, and how it’s changing the job market. The lecture and readings brought up many points that are top of mind for me in regard to BI. 

One of the most interesting aspects to me is the shift from causation to correlation (Cukier and Mayer-Schoenberger, 2013). As I mentioned, I’m a bit skeptical of Big Data, and I believe this is one of its main risks. I feel that we have collectively accepted that we are OK with black-box predictive models and with not knowing for sure what is driving a prediction. While many positive uses of such models have been highlighted, particularly in healthcare, I think there are many underlying risk factors at play. Models used in predictive policing and the court systems have often been found to be biased against minorities. Biased data that is fed in leads to a biased algorithm and to predictions that may go unchecked for years. I was recently researching this topic out of personal interest and found a great presentation on it by Cathy O’Neil (O’Neil, 2017). 


If you are really interested, here is a longer version that she presented at Google: https://www.youtube.com/watch?v=TQHs8SA1qpk 

As a proposed solution to this problem, researchers at Wharton want to prevent any protected classes from being used in predictive models in the justice system (Johndrow, 2019). I think that when we apply a critical lens to the ways in which we use Big Data in our models we achieve the best results. I’m excited to learn more about how I can use Big Data in my role while always being mindful of the impact and the pitfalls. 

References: 

Cukier, K.N. and Mayer-Schoenberger, V. (2013). ‘The Rise of Big Data: And How It’s Changing the Way We Think About the World’, Foreign Affairs. Available at: D2L Course Materials. 

Johndrow, J. (2019). ‘Removing Human Bias from Predictive Modeling’, Knowledge@Wharton. Available at: https://knowledge.wharton.upenn.edu/article/removing-bias-from-predictive-modeling/. Accessed: September 6, 2020. 

O’Neil, C. (2016). ‘Weapons of Math Destruction’, Talks at Google. Available at: https://www.youtube.com/watch?v=TQHs8SA1qpk. Accessed: September 6, 2020. 

O’Neil, C. (2017). ‘The Era of Blind Faith in Big Data Must End’, TED Talks. Available at: https://www.ted.com/talks/cathy_o_neil_the_era_of_blind_faith_in_big_data_must_end/transcript?language=en. Accessed: September 6, 2020.

Comments

  1. Hi Kathy, We are a diverse group indeed...I'm also looking forward to learning from you and others!

    I concur with you...there is a need for decision makers to also look beyond the data to drive decision-making. Algorithms are designed by people, so it is of paramount importance to understand their data sources and other factors. To Cathy O'Neil's point on an algorithmic audit, "Algorithms must be interrogated...and they must tell us the truth every time..." O'Neil suggested:
    1. A data integrity check
    2. Thinking about the definition of success
    3. Considering accuracy...no algorithm is perfect; for whom does the model fail, etc.
    4. The long-term effects of algorithms, including feedback loops.

    Thanks.

  2. Hi Julia,

    Thank you for reminding us to be skeptical of Big Data; I think that is especially important in the current climate of US race relations. Your post made me think: How has data helped alleviate social problems like racial inequity? Has it even helped at all? Is it causing more problems?

    For instance, let's think about positive stereotypes, such as the ideas that Asian Americans are good at math and African Americans are good at sports. There is data that supports these ideas, and as such they become positive stereotypes. However, is the data doing a good thing by putting people in these categories? Even positive stereotypes are damaging to marginalized populations, so to think in such a way means we are not doing our due diligence to understand how data works in the bigger picture. As one of our readings noted, humans are still needed in data work for our innovation, creativity, and judgement.

    Thanks for sharing!

    Best,
    Dustin Natte

  3. Hi Everyone,

    Thank you for the comments to my post. I can see that many of us share the same concerns about privacy and our rights around the ways our data is used. Dustin makes a good point about positive stereotypes that have been created by categorizing us, often by protected classes. While this on the surface is not necessarily bad, I think any type of segmenting in this way can be harmful.

    I'm skeptical by nature, and my feeling is that data tends to further segregate us and widen existing gaps. Looking into this further, I found an interesting article from Sabina Leonelli, a Professor of Philosophy and History of Science at the University of Exeter. In her research she has found several primary areas of concern today, including how data is shared and dispersed and whether it is credible. She also finds that "The vast majority of large research databases display “tractable” data produced by rich, English-speaking groups, with very little representation from less visible and more vulnerable groups" (Leonelli, 2018).

    If this is the case with big data, then the analysis we conduct on it will be very skewed. While Leonelli's research isn't specific to Big Data, I think it's easy to see that certain groups are more likely to generate the data discussed in this course via smartphone use and social networks, and some groups are therefore underrepresented. I think this is definitely food for thought as we progress through this course.

    Leonelli, S. (2018). Without urgent action big data may widen inequality. LSE Business Review. Available at: https://blogs.lse.ac.uk/businessreview/2018/02/24/without-urgent-action-big-data-may-widen-inequality/. Accessed September 13, 2020.

  4. Hi Julia,

    I share your concern with black-box decision making. I think we're already very deep into a culture of blindly following data, and because of that we're already facing multiple challenges. I was watching a political TV show the other day about local jury duty and was shocked to learn that, due to a data error, an entire city was excluded from jury duty for multiple years. We have a long way to go before we can unquestioningly rely on systems that are often created and rushed out by for-profit corporations.

  5. Hi Julia, T.R., Dustin and Todd,

    Thanks for sharing web sources on this topic. Cathy O'Neil is also featured in the Netflix documentary "The Social Dilemma," in case you haven’t seen it yet (https://www.netflix.com/title/81254224).

    You may know the common saying, “Your model is only as good as your data” (with variations in the "model" part). Going from there, I’d add that sampling design affects the quality of the dataset. I think there are still human communities that are only partially represented in some business sectors or in the social media / datasphere (especially in geographical regions or social sectors where IT is less accessible to people). In those cases, sampling may still be a necessary strategy for modeling the characteristics of different groups at the population level.

    When group data is used and the number of data units is proportional to the size of the group in the population, or when data units are collected by universal random sampling, small groups may end up underrepresented. A classifier may then not have enough information about a small group to reliably learn its characteristics, the variation within the group, and its differences from other groups. As a consequence, the classifier might merge different classes and/or fail to identify the characteristics of the smaller group.

    One idea could be to collect group data in ways that prevent any group from contributing less data than the others, e.g. using stratified sampling with equal allocation of observations per group (the same number of data units is taken from each group, irrespective of the group's size). Even if a massive volume of data were available, overrepresented groups would be “thinned” to make room for information about underrepresented groups. The following books are old but still useful for general background on sampling:

    Des Raj (1968). Sampling Theory. McGraw-Hill Book Company, New York.

    Des Raj (1972). The Design of Sample Surveys. McGraw-Hill Book Company, New York.
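    The equal-allocation idea above can be sketched in a few lines of Python. This is a minimal illustration, not from any particular sampling library; the function name and the toy dataset are my own:

```python
import random
from collections import defaultdict

def stratified_equal_sample(records, group_key, n_per_group, seed=0):
    """Draw the same number of records from every group (equal
    allocation), regardless of each group's share of the raw data."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for r in records:
        by_group[r[group_key]].append(r)
    sample = []
    for members in by_group.values():
        k = min(n_per_group, len(members))  # can't take more than exist
        sample.extend(rng.sample(members, k))
    return sample

# Toy dataset: group A outnumbers group B 90 to 10 in the raw data
data = ([{"group": "A", "x": i} for i in range(90)]
        + [{"group": "B", "x": i} for i in range(10)])

balanced = stratified_equal_sample(data, "group", n_per_group=10)
# Each group now contributes 10 records, so B is no longer swamped by A
```

    With proportional or simple random sampling, group B would contribute roughly 10% of any sample; with equal allocation it contributes half, so a classifier sees as much of B as of A.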


