Posts

An ambitious experiment in Data Science takes off: a biased, Open Source view from Berkeley

Today, during a White House OSTP event combining government, academia and industry, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation announced a $37.8M funding commitment to build new data science environments. This caps a year's worth of hard work for us at Berkeley, and even more for the Moore and Sloan teams, led by Vicki Chandler, Chris Mentzel and Josh Greenberg: they ran a very thorough selection process to choose three universities to participate in this effort. The Berkeley team was led by Saul Perlmutter, and we are now thrilled to join forces with teams at the University of Washington and NYU, respectively led by Ed Lazowska and Yann LeCun. We have worked very hard on this in private, so it's great to finally be able to publicly discuss what this ambitious effort is all about. Most of the UC Berkeley BIDS team, from left to right: Josh Bloom, Cathryn Carson, Jas Sekhon, Saul Perlmutter, Erik Mitchell, Kimmen Sjölander, Jim Sethia...
Recent posts

In Memoriam, John D. Hunter III: 1968-2012

I just returned from the SciPy 2013 conference, whose organizers kindly invited me to deliver a keynote. For me this was a particularly difficult, yet meaningful edition of SciPy, my favorite conference. It was only a year ago that John Hunter, creator of matplotlib, had delivered his keynote shortly before being diagnosed with terminal colon cancer, from which he passed away on August 28, 2012 (if you haven't seen his talk, I strongly recommend it for its insights into scientific open source work). On October 1st 2012, a memorial service was held at the University of Chicago's Rockefeller Chapel, the location of his PhD graduation. On that occasion I read a brief eulogy, but for obvious reasons only a few members from the SciPy community were able to attend. At this year's SciPy conference, Michael Droettboom (the new project leader for matplotlib) organized the first edition of the John Hunter Excellence in Plotting Contest, and before the awards ceremony I read ...

Exploring Open Data with Pandas and IPython at the Berkeley I School

"Working with Open Data", a course by Raymond Yee. This will be a guest post, authored by Raymond Yee from the UC Berkeley School of Information (or I School, as it is known around here). This spring, Raymond has been teaching a course titled "Working with Open Data", where students learn how to work with openly available data sets with Python. Raymond has been using IPython and the notebook since the start of the course, as well as hosting lots of materials directly on GitHub. He kindly invited me to lecture in his course a few weeks ago, and I gave his students an overview of the IPython project as well as our vision of reproducible research and of building narratives that are anchored in code and data that are always available for inspection, discussion and further modification. Towards the end of the course, his students had to develop a final project, organizing themselves in groups of 2-4 and producing a final working system that would use open...

"Literate computing" and computational reproducibility: IPython in the age of data-driven journalism

As "software eats the world" and we become awash in the flood of quantitative information denoted by the "Big Data" buzzword, it's clear that informed debate in society will increasingly depend on our ability to communicate information that is based on data. And for this communication to be a truly effective dialog, it is necessary that the arguments made based on data can be deconstructed, analyzed, rebutted or expanded by others. Since these arguments in practice often rely critically on the execution of code (whether an Excel spreadsheet or a proper program), it means that we really need tools to effectively communicate narratives that combine code, data and the interpretation of the results. I will point out here two recent examples, taken from events in the news this week, where IPython has helped this kind of discussion, in the hopes that it can motivate a more informed style of debate where all the moving parts of a quantitative argument are avail...

Back from PyCon Canada 2012

I just got back a few days ago from the 2012 edition of PyCon Canada, which was a great success. I wanted to thank the team who invited me for a fantastic experience: Diana Clarke, who as conference chair did an incredible job, Greg Wilson from Software Carpentry, with whom I had a chance to interact a lot (he already has a long list of ideas for the IPython notebook in teaching contexts that we're discussing), Mike DiBernardo and the rest of the PyConCa team. They ran a conference with a great vibe and tons of opportunity for engaging discussion. Thanks to Greg I also had a chance to give a couple of more academically-oriented talks at U. Toronto facilities, both at the Sunnybrook hospital and their SciNet HPC center, where we had some great discussions. I look forward to future collaborations with some of the folks there. The PyConCa team kindly invited me to deliver the closing keynote for the conference, and I tried to provide a presentation on the part of the Python world that I...

Help save open space in the Bay Area by protecting Knowland Park from development

Vote NO on new Tax Measure A1. Update: there is now evidence that Zoo officials have actually violated election laws in their zeal to promote measure A1. I normally only blog about technical topics, but the destruction of a beautiful piece of open space in the Bay Area is imminent, and I want to at least do a little bit to help prevent this disaster. In short: there's a tax measure on the November ballot, Measure A1, that would impose a parcel tax on all residences and businesses in Alameda County to fund the Oakland Zoo for the next 25 years. The way the short text on the ballot is worded makes it appear as something geared towards animal care for a cash-strapped Zoo. The sad reality is that the full text of the measure allows the Zoo to use these funds for a very controversial expansion plan that includes a 34,000 sq. ft. visitor center, gift shop and restaurant serviced by a ski gondola atop one of the last pristine remaining ridges in Knowland Park, ...

Blogging with the IPython notebook

Update (May 2014): Please note that these instructions are outdated. While it is still possible (and in fact easier) to blog with the Notebook, the exact process has changed now that IPython has an official conversion framework. However, Blogger isn't the ideal platform for that (though it can be made to work). If you are interested in using the Notebook as a tool for technical blogging, I recommend looking at Jake VanderPlas' Pelican support or Damián Avila's support in Nikola. Update: made full GitHub repo for blog-as-notebooks, and updated instructions on how to more easily configure everything and use the newest nbconvert for a more streamlined workflow. Since the notebook was introduced with IPython 0.12, it has proved to be very popular, and we are seeing great adoption of the tool and the underlying file format in research and education. One persistent question we've had since the beginning (even prior to its official release) was whether it would...