I’m trying to come up with some interesting large data sets for extra-credit work in predictive analytics, especially data sets that involve interaction effects and/or call for dummy variables. Can anyone out there point me in the right direction?
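For readers wondering what “interactions” and “dummy variables” look like in practice, here is a minimal Python sketch. The region/promo values are made up purely for illustration. A categorical column with k levels becomes k - 1 zero/one dummy columns, with one level held out as the reference, and an interaction term is simply the element-wise product of two columns.

```python
from typing import Dict, List, Sequence

def dummy_code(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """One 0/1 column per non-reference level (k levels -> k - 1 dummies)."""
    levels = sorted(set(values) - {reference})
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}

def interaction(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Element-wise product of two columns: the usual interaction term."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical toy data, invented only to show the encoding.
region = ["north", "south", "west", "south", "north"]
promo = [1, 0, 1, 1, 0]

# "north" is the reference level, so it gets no column of its own.
dummies = dummy_code(region, reference="north")
design = {"promo": promo, **dummies}
# Let the promo effect differ by region via promo x region interactions.
for name, col in dummies.items():
    design[f"promo_x_{name}"] = interaction(promo, col)

for name, col in design.items():
    print(name, col)
```

Leaving out the reference level avoids the “dummy variable trap” (perfect collinearity with the intercept) when these columns are fed into a regression.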
It is a bit disheartening to know that the predictive analytics/big data world is moving ahead so quickly just as I am trying to learn how to harness it. But it is also nice to find so many open source data analysis tools (the user-friendly kind) becoming commoditized on the Web of Things. Two came to my attention today. First, a text analysis tool from the data folks at Stanford, http://www.etcml.com/, that among other things can analyze your (or anyone’s) Twitter feed for positive or negative “feeling.” Second, NodeXL, a neat network analysis and mapping add-in for Excel (http://nodexl.codeplex.com/), which I’m going to use to map some learning communities I’m interested in. Check out their look at politics on Twitter on their website.
I respectfully suggest you “play” with them. 🙂
Breaking Bad regressions: http://graphtv.kevinformatics.com/tt0903747
I work very hard to get my quantitative MBA students to understand the relevance of the statistical underpinnings of data analytics. Since many of the students are very early in their MBA programs and some have not had basic statistics, my discussions of how big data analytics will become very important to most businesses often bounce off their overworked, Teflon minds. How, then, do you get the average business person to buy into big data?
“Business people tend to have a high-level understanding of big data, but it’s not always easy for them to get a handle on how to take such huge amounts of information and leverage it to their advantage, according to Boyd Davis, vice president and general manager of Intel’s Datacenter Software Division. But what a lot of people do understand is sports, and the way statistics can be used in developing insights into how a game is going to go.”
Teaming up with big data solutions company Kaggle, Intel hopes that at least some of the millions of rabid March Madness fans will turn to a new approach to filling out their brackets. The March Machine Learning Mania competition is open to all comers and runs through March 19th. Currently there are 150 teams entered in the first stage of the competition, which uses the results of the last five years’ tournaments to build predictive models. The first stage began in January and is optional (though, as any data scientist knows, important). The prize money ($15k) goes to the winner of the second stage, in which predictive analytics are used to forecast the outcome of this year’s tournament.
“It is very important for us to use sports and to show them that with the capabilities of … technologies and tools, they can do things in a different way,” Davis said. “We’re using sports as the platform to get people learning about the capabilities of big data.” Kaggle’s (Will) Cukierski said using sports makes sense. “People love to play with sports data, and are usually willing to put up with statistics if it deals with sports.”
Now why didn’t I think of that?
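The first stage of the competition amounts to fitting a model on past tournaments and checking how well it predicts games it has not seen. As a hedged illustration (not Kaggle’s data format or any entrant’s method), here is a minimal Python baseline that estimates how often the better seed wins each seed matchup. The game records are invented, and using seed matchup as the only feature is my own assumption for the sake of a short example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (better_seed, worse_seed, better_seed_won).
Game = Tuple[int, int, bool]

def fit_seed_model(games: Iterable[Game], prior: float = 1.0) -> Dict[Tuple[int, int], float]:
    """Estimate P(better seed wins) for each seed matchup seen in past games."""
    wins: Dict[Tuple[int, int], float] = defaultdict(float)
    total: Dict[Tuple[int, int], float] = defaultdict(float)
    for better, worse, better_won in games:
        key = (better, worse)
        total[key] += 1
        wins[key] += 1 if better_won else 0
    # Add-one (Laplace) smoothing pulls small samples toward 0.5.
    return {k: (wins[k] + prior) / (total[k] + 2 * prior) for k in total}

def predict(model: Dict[Tuple[int, int], float], better: int, worse: int) -> float:
    """Fall back to a coin flip for matchups never seen in the training years."""
    return model.get((better, worse), 0.5)

# Invented games, for illustration only (not real tournament results).
past_games = [
    (1, 16, True), (1, 16, True), (1, 16, True),
    (5, 12, True), (5, 12, False), (5, 12, False),
    (8, 9, True), (8, 9, False),
]
model = fit_seed_model(past_games)
print(predict(model, 1, 16))  # (3 + 1) / (3 + 2) = 0.8
print(predict(model, 5, 12))  # (1 + 1) / (3 + 2) = 0.4
print(predict(model, 2, 15))  # unseen matchup -> 0.5
```

A serious entry would bring in more features (team ratings, scoring margins, and so on) and probably a proper model such as logistic regression, but a simple frequency table like this gives a baseline to beat.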
I have enrolled in a MOOC on Coursera, Duke University’s Data Analysis and Statistical Inference. I’m having to learn a programming language that is new to me, R, which is frequently used in data analysis. I had forgotten what learning a coding language was like. It’s fun, and it is helping me build some new synapses, of which I am in dire need 🙂
I clicked one too many times and did it: I linked myself to “gangsta,” and I don’t know how to unlink. It all started innocently enough. I was reading my Twitter feed when I noticed that one of my friends had posted an interesting tweet about the Sherman flap (think Seattle Seahawks vs. 49ers last Sunday). The tweet pointed out that Sherman was not a stereotypical “dumb” football player; rather, he was very intelligent, having graduated from Stanford, which is no slouch of a college. She said he was articulate and could speak “gangsta” as well as “Stanford.” For some reason, that got me thinking about the current sagging-pants style adopted by many teens and young men.
I knew there was a name for that style, but I couldn’t place it, and I wondered whether “gangsta” meant a new, less sagging style would be adopted. So I did my usual Google search for “gangsta” & “belts” (mistake 1: not setting my Google privacy options properly), and a link to Amazon popped up at the top. Before my brain could engage, I clicked on the Amazon link and found myself looking at a “Gangsta-style” belt. Then it dawned 🙂 on me that Amazon had opened my account, given that “Hello Dawn” was right there on the screen. So now I expect to be showered with “gangsta” ads wherever I go on the web. Does anyone know how to scrub my digital footprint? 🙁