Playing with PyPy
Posted by: Bob Ippolito on
so. The concept behind it is fascinating; it's a Python interpreter written
in (a subset of) Python. It's actually a lot more than that because the
language front-ends (e.g. Python) are quite separate from the backends
(e.g. C, JVM, CLI, Python). This makes it a unique platform for language
research because coding Python is typically easier than C, and so much
of the work is already done for you.</p>
<p>It's clear that PyPy is very useful for academic research, but it's also
quickly becoming a practical target for developing and deploying Python
code. At the <a class="reference external" href="http://speed.pypy.org/">PyPy Speed Center</a> you can see that it's already several
times faster than CPython, and has the potential to fix most of the more
fundamental flaws of the CPython VM.</p>
<p>What's awesome right now:</p>
<ul class="simple">
<li>PyPy has a modern garbage collector, not ref counting</li>
<li>PyPy's JIT can run string mangling and numerics code
very quickly, which removes the need for most C extensions</li>
<li>PyPy is already fast, and is getting faster all the time</li>
</ul>
<p>How it can be more awesome (just my opinion, I don't speak for the PyPy
team and their implementation goals):</p>
<ul class="simple">
<li>PyPy has an alpha quality <a class="reference external" href="http://pypy.org/compat.html">cpyext</a> that will allow you to use CPython
extensions (requires a recompile), and when that's polished it will be very
easy for CPython users to migrate en masse, even though they may have
complicated dependencies such as NumPy, SciPy, PIL, etc.</li>
<li>PyPy has the potential to eventually remove the GIL, and/or
have multiple VMs in the same OS process</li>
<li>PyPy could add M:N threading and concurrency constructs to the
language (some stackless support already exists, but is not
currently compatible with the JIT and doesn't take advantage
of multiple cores)</li>
<li>PyPy could simultaneously support Python 2.x and 3.x code in the
same process, making it practical to actually make the transition
(note: this is a crazy idea that would be terribly difficult)</li>
</ul>
<div class="section" id="playing-with-pypy">
<h2>Playing with PyPy</h2>
<p>I've been working on helping the PyPy team with some real world benchmarks
for JSON, and helping sort out Mac OS X issues. I've also been tuning a
branch of <a class="reference external" href="http://simplejson.github.com/simplejson/">simplejson</a> to run efficiently on PyPy. I'll write more about
this in a follow up post, but here's how you can get started.</p>
<p>If you're a library author or an advanced user you should be experimenting
with PyPy right now. In these instructions we'll install PyPy in <tt class="docutils literal">~/opt</tt>
and create a virtualenv for it in <tt class="docutils literal">~/virtualenv</tt>.</p>
<div class="section" id="prerequisites">
<h3>Prerequisites</h3>
<p>Install <a class="reference external" href="http://mercurial.selenic.com/">Mercurial</a> 1.7 or newer.</p>
<p>Install Xcode 3.2.x (gcc-4.0 is currently required for building PyPy).</p>
<p>Download virtualenv 1.5.2 (or later):</p>
<pre class="literal-block">
mkdir -p ~/src
(cd ~/src; curl -s http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.5.2.tar.gz | tar zxf -)
</pre>
<p>Clone jitviewer and pypy:</p>
<pre class="literal-block">
# Make a ~/src to store all of these clones
mkdir -p ~/src
# Get the PyPy jitviewer application (install to ~/src/jitviewer)
(cd ~/src; hg clone https://bitbucket.org/pypy/jitviewer)
# Clone PyPy source, this will take a while
(cd ~/src; hg clone https://bitbucket.org/pypy/pypy)
</pre>
</div>
<div class="section" id="installing-from-binary">
<h3>Installing from binary</h3>
<p>Follow the link on <a class="reference external" href="http://pypy.org/download.html">http://pypy.org/download.html</a> to download one of the
Mac OS X binaries (either 32-bit or 64-bit), the current version at this
time is 1.4.1.</p>
<p>This will install PyPy to <tt class="docutils literal">~/opt</tt> and create a virtualenv:</p>
<pre class="literal-block">
# Unpack PyPy and "install" to ``~/opt``
mkdir -p ~/opt
cd ~/opt
tar jxvf ~/Downloads/pypy-1.4.1-osx64.tar.bz2
# Install virtualenv to this PyPy build, and create a new virtualenv.
# I also like to create a symlink ``~/virtualenv/pypy-env`` to the
# version I am currently working with::
PKG=pypy-1.4.1-osx64
PYPY=~/opt/$PKG/bin/pypy
# install virtualenv
(cd ~/src/virtualenv-1.5.2; $PYPY setup.py install)
# create virtualenv
mkdir -p ~/virtualenv
rm -rf ~/virtualenv/$PKG
~/opt/$PKG/bin/virtualenv --distribute ~/virtualenv/$PKG
# update symlink
(cd ~/virtualenv; rm -f pypy-env; ln -s $PKG pypy-env)
# install jitviewer
(source ~/virtualenv/pypy-env/bin/activate; \
pip install flask pygments simplejson; \
cd ~/src/jitviewer; pypy setup.py develop )
</pre>
<p>When you want to use PyPy, just activate the virtualenv:</p>
<pre class="literal-block">
source ~/virtualenv/pypy-env/bin/activate
# now you can use PyPy! both "python" and "pypy" will work
</pre>
</div>
<div class="section" id="tuning-for-pypy-1-4-1">
<h3>Tuning for PyPy 1.4.1</h3>
<p>On Mac OS X, PyPy 1.4.1 (and current default) does not choose optimal tuning
values for the GC. You will get ~30% better performance by setting this
environment variable:</p>
<pre class="literal-block">
export PYPY_GC_NURSERY=1M
</pre>
<p>Note that 1M is a machine specific value, so if your Mac isn't the same model
as mine there might be a better default for you. The value is very likely to
be dependent on the amount of L2/L3 cache and how
many physical cores you have, and you can get those values from sysctl:</p>
<pre class="literal-block">
$ sysctl hw.l3cachesize hw.l2cachesize hw.physicalcpu
hw.l3cachesize: 4194304
hw.l2cachesize: 262144
hw.physicalcpu: 2
</pre>
<p>From the pypy source directory, with a pypy virtualenv activated, you can run
this script to see what a good value might be (lowest time is best):</p>
<pre class="literal-block">
#!/bin/bash
for ((procs=1; procs <= 4 ; procs++)); do
for ram in 128K 256K 512K 768K 1M 2M 3M 4M; do
echo "export PYPY_GC_NURSERY=$ram # procs=$procs"
export PYPY_GC_NURSERY=$ram
for ((p=1; p <= $procs; p++)); do
(cd pypy/translator/goal; pypy gcbench.py | grep 'Completed in') &
done
wait
done
done
</pre>
<p>The PyPy team is very interested in knowing what the sysctl values are for your
machine and the output of the GC benchmark, so if you get this far please send it
along to me or the PyPy mailing list! Having output from many different models of
Mac will help us come up with a better algorithm for choosing sane defaults.</p>
</div>
<div class="section" id="building-pypy-from-source">
<h3>Building PyPy from source</h3>
<p>Make sure to install a binary first. Since translating PyPy is CPU bound,
this runs a lot faster if you use PyPy.</p>
<p>These commands will build PyPy, create a release based on the hg revision,
update <tt class="docutils literal"><span class="pre">~/virtualenv/pypy-env</span></tt>, etc.:</p>
<pre class="literal-block">
# Translate PyPy (expect this a while, at least an hour for me)
(cd pypy/translator/goal; pypy translate.py -Ojit)
# Build the release
BRANCH=$(hg branch)
PKG=pypy-$(hg branches|grep "^$BRANCH " | cut -d: -f2)-osx64
mkdir -p ~/opt
(cd pypy/tool/release; /usr/bin/python package.py ../../.. $PKG)
rm -rf ~/opt/$PKG
mv $TMPDIR/usession-$BRANCH-$USER/build/$PKG ~/opt/$PKG
# install virtualenv
PYPY=~/opt/$PKG/bin/pypy
(cd ~/src/virtualenv-1.5.2; $PYPY setup.py install)
# create virtualenv
mkdir -p ~/virtualenv
rm -rf ~/virtualenv/$PKG
~/opt/$PKG/bin/virtualenv --distribute ~/virtualenv/$PKG
# make default
(cd ~/virtualenv; rm -f pypy-env; ln -s $PKG pypy-env)
# install jitviewer
(source ~/virtualenv/pypy-env/bin/activate; \
pip install flask pygments simplejson; \
cd ~/src/jitviewer; pypy setup.py develop )
</pre>
</div>
<div class="section" id="running-jitviewer">
<h3>Running jitviewer</h3>
<p>jitviewer is an awesome web app for reading PyPy logs, it will help you
optimize your code for PyPy (once you have a basic understanding of the
output, which is beyond the scope of this post).</p>
<p>Run your code with JIT logging turned on:</p>
<pre class="literal-block">
# log to pypy-jit.log
PYPYLOG=jit-log-opt,jit-backend-counts:pypy-jit.log pypy benchmark.py
# start the jitviewer server with pypy-jit.log
PYTHONPATH=~/src/pypy jitviewer.py pypy-jit.log
</pre>
<p>After jitviewer is started, open a web browser to <a class="reference external" href="http://127.0.0.1:5000/">http://127.0.0.1:5000/</a></p>
</div>
</div>
Bob Ippolito's complete blog can be found at: http://bob.pythonmac.org/
Why Attend the NFJS Tour?
- » Cutting-Edge Technologies
- » Agile Practices
- » Peer Exchange
Current Topics:
- Languages on the JVM: Scala, Groovy, Clojure
- Enterprise Java
- Core Java, Java 8
- Agility
- Testing: Geb, Spock, Easyb
- REST
- NoSQL: MongoDB, Cassandra
- Hadoop
- Spring 4
- Cloud
- Automation Tools: Gradle, Git, Jenkins, Sonar
- HTML5, CSS3, AngularJS, jQuery, Usability
- Mobile Apps - iPhone and Android
- More...