Skip to main content

Posts

Showing posts from 2013

An ambitious experiment in Data Science takes off: a biased, Open Source view from Berkeley

Today, during a White House OSTP event combining government, academia and industry, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation announced a $37.8M funding commitment to build new data science environments. This caps a year's worth of hard work for us at Berkeley, and even more for the Moore and Sloan teams, led by Vicki Chandler , Chris Mentzel and Josh Greenberg : they ran a very thorough selection process to choose three universities to participate in this effort. The Berkeley team was led by Saul Perlmutter , and we are now thrilled to join forces with teams at the University of Washington and NYU, respectively led by Ed Lazowska and Yann LeCun . We have worked very hard on this in private, so it's great to finally be able to publicly discuss what this ambitious effort is all about. Most of the UC Berkeley BIDS team, from left to right: Josh Bloom, Cathryn Carson, Jas Sekhon, Saul Perlmutter, Erik Mitchell, Kimmen Sjölander, Jim Sethia...

In Memoriam, John D. Hunter III: 1968-2012

I just returned from the SciPy 2013 conference, whose organizers kindly invited me to deliver a keynote . For me this was a particularly difficult, yet meaningful edition of SciPy, my favorite conference. It was only a year ago that John Hunter, creator of matplotlib , had delivered his keynote shortly before being diagnosed with terminal colon cancer, from which he passed away on August 28, 2012 (if you haven't seen his talk, I strongly recommend it for its insights into scientific open source work). On October 1st 2012, a memorial service was held at the University of Chicago's Rockefeller Chapel, the location of his PhD graduation. On that occasion I read a brief eulogy, but for obvious reasons only a few members from the SciPy community were able to attend. At this year's SciPy conference, Michael Droetboom (the new project leader for matplotlib) organized the first edition of the John Hunter Excellence in Plotting Contest , and before the awards ceremony I read ...

Exploring Open Data with Pandas and IPython at the Berkeley I School

"Working with Open Data", a course by Raymond Yee This will be a guest post, authored by Raymond Yee from the UC Berkeley School of Information (or I School, as it is known around here). This spring, Raymond has been teaching a course titled "Working with Open Data" , where students learn how to work with openly available data sets with Python. Raymond has been using IPython and the notebook since the start of the course, as well as hosting lots of materials directly using github. He kindly invited me to lecture in his course a few weeks ago, and I gave his students an overview of the IPython project as well as our vision of reproducible research and of building narratives that are anchored in code and data that are always available for inspection, discussion and further modification. Towards the end of the course, his students had to develop a final project, organizing themselves in groups of 2-4 and producing a final working system that would use open...

"Literate computing" and computational reproducibility: IPython in the age of data-driven journalism

As "software eats the world" and we become awash in the flood of quantitative information denoted by the "Big Data" buzzword, it's clear that informed debate in society will increasingly depend on our ability to communicate information that is based on data. And for this communication to be a truly effective dialog , it is necessary that the arguments made based on data can be deconstructed, analyzed, rebutted or expanded by others. Since these arguments in practice often rely critically on the execution of code (whether an Excel spreadsheet or a proper program), it means that we really need tools to effectively communicate narratives that combine code, data and the interpretation of the results. I will point out here two recent examples, taken from events in the news this week, where IPython has helped this kind of discussion, in the hopes that it can motivate a more informed style of debate where all the moving parts of a quantitative argument are avail...