I recently predicted grocery sales for a Kaggle competition. In this competition, we had to use data from six tables to predict how many units of different items would sell on future dates. The competition presented several challenges, including merging multiple tables, working with a data frame that was larger than RAM, and handling categorical variables with many classes. This is part 1, where I discuss how I dealt with the large data frame. In part 2, I will discuss how I handled the categorical variables with h2o. I will update this post with a link when part 2 is available.
If you have a Twitter feed like mine (i.e., nerdy), you can hardly go a day without seeing some mention of "deep learning." In fact, a quick glance at Google Trends shows that searches for deep learning have been rising over the past five years. I included "linear regression" as a point of comparison. (You'll note the famous "people search for this more when school is in session" trend associated with linear regression.)
Everyone knows that MATLAB is terrible, and I never want to use it again once I get out of this rattrap. But to do serious data work in the real world, you need a combination of Python and SQL. On the third hand, I couldn't just throw my grad school life away and break free (I tried that, and it didn't work).
We are a group of Psychology and Neuroscience graduate students from UC Davis who are interested in data science, user experience, and local beer. Our shared goal is to help each other prepare for a life (i.e., a job) outside of academia or, perhaps, to take a more modern approach to a life inside it. You can read the latest blog post to the left, find older posts in the archive, or check out some of our projects.