Austin trip: IPython at TACC and DataArray summit at Enthought

TACC Software Days

I recently had the chance to speak at the Texas Advanced Computing Center (TACC) at UT Austin, during their Fifth Annual Scientific Software Day, thanks to a kind invitation by Sergey Fomel and Victor Eijkhout.  Since the audience wasn't specifically composed of Python users, I gave a general introduction to Python's role in scientific computing, but then spent most of my time presenting some of our recent work on IPython, which extends the basic interactive computing model in what I think are interesting directions: multiple client models and new parallel computing interfaces. Sergey had asked me to provide a somewhat personal account, so the presentation is fairly biased towards my own path (and therefore IPython) through the scipy effort.  Since the focus of TACC is high-performance computing, I hope some of our new functionality on the IPython front will be useful to such users.

There were some very interesting presentations about the FLAME linear algebra library. I did my best to convince Robert van de Geijn of the interest there would be in the scientific Python community in exposing FLAME to Python, possibly as an alternative backend for the linear algebra machinery in scipy.  Since FLAME relies on a fair amount of code generation, I think the dynamic properties of Python would make it a great fit for the FLAME paradigm.  We had some interesting discussions on this; we'll see where it develops...

Datarray summit at Enthought

In conjunction with this visit, we had been trying to organize a meeting at Enthought to make some progress on the datarray effort that was started after last year's scipy conference.  I'd like to thank Sergey for kindly allowing my UT-funded visit to extend into the datarray summit.  Enthought brought a large contingent of their in-house team (developers and interns like Mark Wiebe of numpy fame), and invited Wes McKinney (pandas) and Matthew Brett (nipy) to participate (as well as others who couldn't make it).  In all we had roughly 15 people, with Travis Oliphant, Eric Jones, Robert Kern, Peter Wang and Corran Webster --all very experienced numpy users/developers-- participating for most of the meeting, which was both a lot of fun and very productive.  The datarray effort has continued to progress, but fairly slowly, mostly due to my being timesliced into oblivion.  But I remain convinced it's a really important piece of the puzzle if scientific Python is to push hard into high-level data analysis, and I'm thrilled that Enthought is putting serious resources into this.

We spent a lot of time on the first day going over use cases coming from many different fields, and then dove into the API design questions, using the current code in the datarray repository as a starting point.  It has become very clear that one key piece of functionality we can't ignore is allowing axis labels to be integers (without any assumption of ordering, continuity or monotonicity). This proves to be surprisingly tricky to accommodate, because the assumption that integers are indices ranging from 0 to n-1 along any dimension goes very deep in all slicing operations, so allowing integers with different semantics requires a fair amount of API gymnastics.
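To make the ambiguity concrete, here is a small, purely illustrative sketch (this is *not* the datarray API we discussed; the class and the `pix`/`lix` accessor names are hypothetical). Once integer labels carry no positional meaning, a bare `a[0]` becomes ambiguous, and the two kinds of lookup have to be separated explicitly:

```python
import numpy as np

class LabeledAxis:
    """Toy 1-D array with integer labels that carry no positional meaning.

    Illustrative sketch only -- not the datarray design; the accessor
    names below are made up for this example.
    """
    def __init__(self, data, labels):
        self.data = np.asarray(data)
        self.labels = list(labels)

    def pix(self, i):
        """Positional access: the usual 0..n-1 indexing."""
        return self.data[i]

    def lix(self, label):
        """Label-based access: find the label, whatever integer it is."""
        return self.data[self.labels.index(label)]

a = LabeledAxis([10.0, 20.0, 30.0], labels=[7, 0, 3])
# A plain a[0] would be ambiguous: position 0 (10.0) or label 0 (20.0)?
print(a.pix(0))  # 10.0 -- first element by position
print(a.lix(0))  # 20.0 -- the element labeled 0
```

With ordinary numpy arrays both readings coincide, which is exactly why the positional assumption is baked so deeply into existing slicing code.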

Not everything is fully settled (and I missed some of the details because I could not stay until the very end), but it's clear that we will have integer labels, and that the main entry point for all axis slicing will be a .axes attribute. This will likely provide the basic named axis attribute-based access, as well as dictionary-style access to computed axes.  We will also probably have one method ('slice' was the running name when I left) for slicing with more elaborate semantics, so that we can resolve the ambiguities of integer labeling without resorting to obscure magic overloads (numpy already has enough of those with things like ogrid[10:20:3j], where that "complex slice step" is always fun to explain to newcomers).
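For readers who haven't run into the `ogrid` construct mentioned above: numpy overloads an *imaginary* slice step to mean "this many points, endpoint inclusive" (equivalent to `np.linspace`), while a real step keeps the usual half-open range semantics. It works, but it's exactly the kind of magic overload we'd like to avoid multiplying:

```python
import numpy as np

# An imaginary step (3j) means "3 points, endpoint included",
# i.e. the same values as np.linspace(10, 20, 3).
g = np.ogrid[10:20:3j]
print(g)   # [10. 15. 20.]

# A real step keeps the ordinary half-open arange semantics.
h = np.ogrid[10:20:5]
print(h)   # [10 15]
```

The same convention applies to `np.mgrid`; the only difference between the two is the shape of the grids they return for multi-dimensional slices.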

We just completed an episode of the inSCIght podcast that contains some more discussion about this with Travis and Wes.  I really hope we won't lose the momentum we seem to have picked up and that over the next few months this will start taking shape into production code.  I know we need it, badly.


Comments

Andy R. Terrel said…
Glad to hear you are on van de Geijn's case, I've been telling him this for a couple of years now. The real problem is licensing (libflame is LGPL) and making sure van de Geijn doesn't have to support the software.

I would say wrapping up the current libflame with cython would be no trouble at all. Ignition has an interface for defining the operations and generating the algorithms, but not the low-level implementation.
Fernando Perez said…
Hey @Andy, I wouldn't go as far as saying I'm really on his case, but I certainly think it's the right way to go. LGPL licensing isn't that big of a deal; there's a fair amount of LGPL dependencies in the scipy stack (WX, Qt), so I don't think that would worry anyone too much.

Maybe you can take that project up in your copious spare time ;)
Gaël said…
Hi Fernando,

I am really glad that you are still putting some energy into DataArrays. I think, as you do, that they are a very important piece of the puzzle.
chuck said…
Besides DataArrays, I think better low-level support for masked arrays would be very useful. There was some demo code posted, oh, maybe 2 years ago, that added support for them at the ufunc level. I was sorry that it didn't go any further than that.
Unknown said…
@Fernando, nice blog post.

Btw, I don't use Wx or Qt, just a terminal and Chrome/Firefox. And it seems to have (or will have very soon) everything I need for scientific computing.
