Small Python container for Docker

Intention

I was running Docker on a minimal installation of Ubuntu Server LTS (22.04.1 at that time) and needed a minimal container with Python 3 and PyMySQL in order to access a MySQL database. So I created a fresh Dockerfile in an otherwise empty directory named alpinepython. As usual, I started from Alpine Linux:

FROM alpine:latest
RUN apk add --no-cache python3 py-pymysql
ENTRYPOINT ["/usr/bin/python3"]

From inside the directory alpinepython, I built the image:

$ sudo docker build -t alpinepython:1 .

docker images revealed that the resulting image is quite large for my taste:

$ sudo docker images
REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
alpinepython    1         011acaf505a2   3 hours ago      53.6MB
alpine          latest    9c6f07244728   2 months ago     5.54MB

I created a container based on that image, and examined its contents:

$ sudo docker create --name test alpinepython:1
$ sudo docker export test | tar tvf -
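
To see where the space goes without reading the whole tar listing, the export can be summed up per top-level directory. The following is only a sketch: the function name summarize and the demo archive are my own illustration, not part of Docker.

```python
import io
import tarfile
from collections import Counter


def summarize(stream):
    """Return the total size of regular files per top-level directory."""
    sizes = Counter()
    # Mode "r|*" reads a non-seekable stream (such as a pipe) sequentially.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            if member.isfile():
                name = member.name.removeprefix("./")
                sizes[name.split("/", 1)[0]] += member.size
    return sizes


# Demo with a tiny in-memory archive standing in for "docker export".
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w") as tar:
    for name, data in [("bin/busybox", b"x" * 800), ("usr/bin/python3", b"y" * 20)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
demo.seek(0)

for top, size in summarize(demo).most_common():
    print(f"{size:10d} bytes  {top}")
```

For the real container, the output of sudo docker export test would be piped into a script that calls summarize(sys.stdin.buffer) instead of using the demo archive.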

Besides Python, that container consisted of several shared libraries, some BusyBox command line tools, and the Alpine Linux package keeper (apk). I had no need for any of these tools in order to run a Python script in a container. Having some experience with cross-compiling Linux and with building Linux systems without shared libraries back in the old days, I followed the official how-to for building a statically linked Python, and was thus able to create an image which contains only the Python runtime and PyMySQL.

Building a minimal Python image

  1. Install a compiler and some development tools on your local (Ubuntu) Linux host:
    $ sudo apt install build-essential
    
  2. As we are linking Python statically, we need the development packages of some libraries, because they ship the *.a static archives and not only the *.so shared libraries. libffi-dev and uuid-dev are included because Modules/Setup.local (step 6) links against libffi.a and libuuid:
    $ sudo apt install zlib1g-dev libssl-dev libb2-dev libbz2-dev liblzma-dev libgdbm-dev libsqlite3-dev libffi-dev uuid-dev
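
Before starting a long build, it may be worth checking that the static archives are really present. This is a minimal sketch; the helper name find_static_archives and the search directories are assumptions for a 64-bit x86 Ubuntu host, and the library names are taken from the packages above and from the -l options in Modules/Setup.local.

```python
from pathlib import Path


def find_static_archives(names, search_dirs):
    """Map each library name to the first lib<name>.a found, or None."""
    found = {}
    for name in names:
        found[name] = None
        for directory in search_dirs:
            candidate = Path(directory) / f"lib{name}.a"
            if candidate.is_file():
                found[name] = candidate
                break
    return found


# Assumed locations; adjust for other distributions or architectures.
SEARCH_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/lib", "/usr/local/lib"]
LIBRARIES = ["z", "ssl", "crypto", "bz2", "lzma", "gdbm", "sqlite3", "ffi", "uuid"]

for name, path in find_static_archives(LIBRARIES, SEARCH_DIRS).items():
    print(f"lib{name}.a: {path or 'MISSING'}")
```

Any line reporting MISSING hints at a development package that still has to be installed.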
    
  3. Create two subdirectories inside the new folder scratchpython. We will build Python and PyMySQL in src, while img becomes the root directory of our image:
    $ mkdir -p scratchpython/{img,src}
    $ cd scratchpython/src
    
  4. Fetch all required source packages and save them to the src directory:
    $ curl -O https://www.python.org/ftp/python/3.11.0/Python-3.11.0.tar.xz
    $ curl -O https://files.pythonhosted.org/packages/c5/41/247814d8b7a044717164c74080725a6c8f3d2b5fc82b34bd825b617df663/setuptools-65.5.0.tar.gz
    $ curl -O https://files.pythonhosted.org/packages/60/ea/33b8430115d9b617b713959b21dfd5db1df77425e38efea08d121e83b712/PyMySQL-1.0.2.tar.gz
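
As the tarballs end up inside an image, I would compare their checksums with the ones published on python.org and PyPI. The sketch below only computes the SHA-256 digests; the helper name sha256_of is hypothetical, and the expected values are deliberately not listed here, so they have to be looked up on the download pages.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# File names as downloaded in step 4; run this from the src directory.
for name in ["Python-3.11.0.tar.xz", "setuptools-65.5.0.tar.gz", "PyMySQL-1.0.2.tar.gz"]:
    if Path(name).is_file():
        print(sha256_of(name), name)
    else:
        print("(not downloaded yet)", name)
```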
    
  5. Extract the Python source code and change to that directory:
    $ tar xf Python-3.11.0.tar.xz
    $ cd Python-3.11.0
    
  6. We have to build the modules of the Python standard library as static (built-in) modules. Based on the configuration file Modules/Setup.stdlib, we create Modules/Setup.local, leaving out nis, ossaudiodev, and all testing modules; _dbm and readline stay commented out as well:
    $ echo '*static*
    
    # Modules that should always be present (POSIX and Windows):
    array arraymodule.c
    _asyncio _asynciomodule.c
    _bisect _bisectmodule.c
    _contextvars _contextvarsmodule.c
    _csv _csv.c
    _heapq _heapqmodule.c
    _json _json.c
    _lsprof _lsprof.c rotatingtree.c
    _opcode _opcode.c
    _pickle _pickle.c
    _queue _queuemodule.c
    _random _randommodule.c
    _struct _struct.c
    _typing _typingmodule.c
    _xxsubinterpreters _xxsubinterpretersmodule.c
    _zoneinfo _zoneinfo.c
    
    # needs libm
    audioop audioop.c
    math mathmodule.c
    cmath cmathmodule.c
    _statistics _statisticsmodule.c
    
    # needs libm and on some platforms librt
    _datetime _datetimemodule.c
    
    # _decimal uses libmpdec
    # either static libmpdec.a from Modules/_decimal/libmpdec or libmpdec.so
    # with ./configure --with-system-libmpdec
    _decimal _decimal/_decimal.c
    
    # compression libs and binascii (optional CRC32 from zlib)
    # bindings need -lbz2, -lz, or -llzma, respectively
    binascii binascii.c
    _bz2 _bz2module.c
    _lzma _lzmamodule.c
    zlib zlibmodule.c
    
    # dbm/gdbm
    # dbm needs either libndbm, libgdbm_compat, or libdb 5.x
    #@MODULE__DBM_TRUE@_dbm _dbmmodule.c
    # gdbm module needs -lgdbm
    _gdbm _gdbmmodule.c
    
    # needs -lreadline or -leditline, sometimes termcap, termlib, or tinfo
    #@MODULE_READLINE_TRUE@readline readline.c
    
    # hashing builtins, can be disabled with --without-builtin-hashlib-hashes
    _md5 md5module.c
    _sha1 sha1module.c
    _sha256 sha256module.c
    _sha512 sha512module.c
    _sha3 _sha3/sha3module.c
    _blake2 _blake2/blake2module.c _blake2/blake2b_impl.c _blake2/blake2s_impl.c
    
    ############################################################################
    # XML and text
    
    # pyexpat module uses libexpat
    # either static libexpat.a from Modules/expat or libexpat.so with
    # ./configure --with-system-expat
    pyexpat pyexpat.c
    
    # _elementtree libexpat via CAPI hook in pyexpat.
    _elementtree _elementtree.c
    
    _codecs_cn cjkcodecs/_codecs_cn.c
    _codecs_hk cjkcodecs/_codecs_hk.c
    _codecs_iso2022 cjkcodecs/_codecs_iso2022.c
    _codecs_jp cjkcodecs/_codecs_jp.c
    _codecs_kr cjkcodecs/_codecs_kr.c
    _codecs_tw cjkcodecs/_codecs_tw.c
    _multibytecodec cjkcodecs/multibytecodec.c
    unicodedata unicodedata.c
    
    ############################################################################
    # Modules with some UNIX dependencies
    #
    
    # needs -lcrypt on some systems
    _crypt _cryptmodule.c
    fcntl fcntlmodule.c
    grp grpmodule.c
    mmap mmapmodule.c
    # FreeBSD: nis/yp APIs are in libc
    # Linux: glibc has deprecated SUN RPC, APIs are in libnsl and libtirpc (bpo-32521)
    #nis nismodule.c
    # needs sys/soundcard.h or linux/soundcard.h (Linux, FreeBSD)
    #ossaudiodev ossaudiodev.c
    _posixsubprocess _posixsubprocess.c
    resource resource.c
    select selectmodule.c
    _socket socketmodule.c
    # AIX has shadow passwords, but does not provide getspent API
    spwd spwdmodule.c
    syslog syslogmodule.c
    termios termios.c
    
    # multiprocessing
    _posixshmem _multiprocessing/posixshmem.c
    _multiprocessing _multiprocessing/multiprocessing.c _multiprocessing/semaphore.c
    
    
    ############################################################################
    # Modules with third party dependencies
    #
    
    # needs -lffi and -ldl
    _ctypes _ctypes/_ctypes.c _ctypes/callbacks.c _ctypes/callproc.c _ctypes/stgdict.c _ctypes/cfield.c \
    	-l:libffi.a
    
    _sqlite3 _sqlite/blob.c _sqlite/connection.c _sqlite/cursor.c _sqlite/microprotocols.c _sqlite/module.c _sqlite/prepare_protocol.c _sqlite/row.c _sqlite/statement.c _sqlite/util.c \
    	-l:libsqlite3.a
    
    _ssl _ssl.c $(OPENSSL_INCLUDES) $(OPENSSL_LDFLAGS) \
    	-l:libssl.a -Wl,--exclude-libs,libssl.a \
    	-l:libcrypto.a -Wl,--exclude-libs,libcrypto.a
    _hashlib _hashopenssl.c $(OPENSSL_INCLUDES) $(OPENSSL_LDFLAGS) \
    	-l:libcrypto.a -Wl,--exclude-libs,libcrypto.a
    
    # Linux: -luuid, BSD/AIX: libc's uuid_create()
    _uuid _uuidmodule.c -luuid' > Modules/Setup.local
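
To keep track of which modules the file above asks for, the module names can be extracted from it. This is a rough sketch under the assumption that every entry starts with the module name and that continuation lines end with a backslash, as in the listing above; module_names is a hypothetical helper, not part of the Python build system.

```python
def module_names(setup_text):
    """Return the module names declared in Setup-style text."""
    names = []
    logical = ""
    for raw in setup_text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Continuation line: keep collecting the logical entry.
            logical += line[:-1] + " "
            continue
        entry, logical = (logical + line).strip(), ""
        if not entry or entry.startswith("#"):
            continue
        first = entry.split()[0]
        # Skip markers such as *static* and variable assignments.
        if first.startswith("*") or "=" in first:
            continue
        names.append(first)
    return names


SAMPLE = """*static*
# needs libm
math mathmodule.c
_ctypes _ctypes/_ctypes.c _ctypes/callbacks.c \\
    -l:libffi.a
_uuid _uuidmodule.c -luuid
"""

print(module_names(SAMPLE))
```

After the build, the resulting list can be compared with sys.builtin_module_names of the new interpreter to spot modules that silently failed to build.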
    
  7. Next, we configure the source tree. You may omit the --enable-optimizations switch; it turns on profile-guided optimization, which makes the build take considerably longer:
    $ ./configure LDFLAGS="-static" --prefix=/ --enable-optimizations --enable-ipv6 --disable-shared
    
  8. Building Python takes some time. If you happen to have a multicore CPU, try to use e.g. four cores to speed things up:
    $ make -j4 LDFLAGS="-static" LINKFORSHARED=" "
    
  9. Now install Python to the directory which becomes the root of our image:
    $ make -j4 LDFLAGS="-static" LINKFORSHARED=" " DESTDIR=../../img install
    
  10. Verify that the Python executable is statically linked. ldd should report that it is not a dynamic executable:
    $ ldd ../../img/bin/python3.11
    	not a dynamic executable
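
The same check can be approximated without ldd by looking for a PT_INTERP program header, which is how an ELF executable names its dynamic loader. The following is a sketch based on the ELF header layout; the function names are my own, and a missing PT_INTERP entry is only a strong hint, not a proof, that the binary is fully static.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def program_header_types(path):
    """Return the set of p_type values in an ELF file's program headers."""
    with open(path, "rb") as fh:
        ident = fh.read(16)
        if ident[:4] != b"\x7fELF":
            raise ValueError(f"{path} is not an ELF file")
        is_64bit = ident[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 = little endian
        # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
        # e_flags, e_ehsize, e_phentsize, e_phnum
        layout = endian + ("HHIQQQIHHH" if is_64bit else "HHIIIIIHHH")
        fields = struct.unpack(layout, fh.read(struct.calcsize(layout)))
        phoff, phentsize, phnum = fields[4], fields[8], fields[9]
        types = set()
        for index in range(phnum):
            # p_type is the first 32-bit field of every program header.
            fh.seek(phoff + index * phentsize)
            (p_type,) = struct.unpack(endian + "I", fh.read(4))
            types.add(p_type)
    return types


def requests_dynamic_loader(path):
    """Return True if the ELF file asks for a dynamic loader (PT_INTERP)."""
    return PT_INTERP in program_header_types(path)
```

Calling requests_dynamic_loader("../../img/bin/python3.11") from the source directory should then return False for the statically linked interpreter.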
    
  11. You will need Setuptools in order to build PyMySQL. Thus, extract the tarball and change to the newly created directory:
    $ cd ..
    $ tar xf setuptools-65.5.0.tar.gz
    $ cd setuptools-65.5.0
    
  12. We can already use our Python interpreter for building Setuptools. Please note that we do not have to install it:
    $ ../../img/bin/python3 setup.py build
    
  13. Extract, build, and install PyMySQL to the root of the image:
    $ cd ..
    $ tar xf PyMySQL-1.0.2.tar.gz
    $ cd PyMySQL-1.0.2
    $ PYTHONPATH=../setuptools-65.5.0 ../../img/bin/python3 setup.py build
    $ PYTHONPATH=../setuptools-65.5.0 ../../img/bin/python3 setup.py install --prefix ../../img
    
    The PYTHONPATH environment variable lists additional directories where Python looks for modules, which is a neat trick to avoid cluttering the image with Setuptools.
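
The effect of PYTHONPATH can be observed directly in sys.path. This small sketch starts a child interpreter with the variable set and prints whether the directory shows up; the helper name sys_path_with and the temporary directory are only for illustration.

```python
import os
import subprocess
import sys
import tempfile


def sys_path_with(extra_dir):
    """Run a child interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


with tempfile.TemporaryDirectory() as extra:
    print(extra in sys_path_with(extra))
```

The directory only affects that one process, so nothing from Setuptools is copied into img.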
  14. Remove unnecessary files which should not be copied to the image:
    $ cd ../../img
    $ rm -rf bin/{2to3,idle3,pydoc3,python3-config,python3.11-config} include share lib/{libpython3.11.a,pkgconfig} lib/python3.11/{unittest,tkinter,test,lib2to3,idlelib,ensurepip,distutils,config-3.11-x86_64-linux-gnu}
    $ find . -type d -name __pycache__ -prune -exec rm -rf "{}" +
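
The cache cleanup can also be written in Python, which makes it easy to see what was removed. This is a sketch of the same idea, not a replacement for the commands above; remove_pycache is a hypothetical helper, and the demo runs on a throwaway directory.

```python
import os
import shutil
import tempfile


def remove_pycache(root):
    """Delete every __pycache__ directory below root and return their paths."""
    removed = []
    for current, dirs, _files in os.walk(root):
        if "__pycache__" in dirs:
            target = os.path.join(current, "__pycache__")
            shutil.rmtree(target)
            # Do not descend into the directory that was just deleted.
            dirs.remove("__pycache__")
            removed.append(target)
    return removed


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "lib", "json", "__pycache__"))
    os.makedirs(os.path.join(root, "lib", "__pycache__"))
    for path in sorted(remove_pycache(root)):
        print("removed", os.path.relpath(path, root))
```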
    
  15. Strip all symbols from the Python executable:
    $ strip bin/python3.11
    
  16. By default, our Python interpreter may be started inside the container as either /bin/python3.11 or /bin/python3, the latter one being a symlink. In addition, I like to start Python by either /bin/python, /usr/bin/python, /usr/bin/python3, or /usr/bin/python3.11. Thus, we need two more symlinks:
    $ ln -s python3.11 bin/python
    $ ln -s . usr
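
Why two symlinks are enough for six paths is easiest to see by resolving them. The sketch below rebuilds the layout in a temporary directory (the empty file stands in for the real interpreter) and resolves every name; build_layout and resolve are illustrative helpers only.

```python
import os
import tempfile


def build_layout(root):
    """Recreate the bin/ and usr entries of the image root."""
    os.makedirs(os.path.join(root, "bin"))
    open(os.path.join(root, "bin", "python3.11"), "w").close()
    os.symlink("python3.11", os.path.join(root, "bin", "python3"))  # from make install
    os.symlink("python3.11", os.path.join(root, "bin", "python"))   # added in this step
    os.symlink(".", os.path.join(root, "usr"))                      # added in this step


def resolve(root, path):
    """Resolve an in-image path and return it relative to the image root."""
    real_root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(real_root, path.lstrip("/")))
    return os.path.relpath(target, real_root)


PATHS = ["/bin/python", "/bin/python3", "/bin/python3.11",
         "/usr/bin/python", "/usr/bin/python3", "/usr/bin/python3.11"]

with tempfile.TemporaryDirectory() as root:
    build_layout(root)
    for path in PATHS:
        print(path, "->", resolve(root, path))
```

All six names end up at bin/python3.11, because usr points back to the root directory itself.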
    
  17. For datetime.datetime.now(), email.utils.localtime(), and such to display the correct local date and time, copy /etc/localtime from the host system to the container image:
    $ mkdir etc
    $ cp /etc/localtime etc/
    
  18. Our Dockerfile is very simple. We will place it directly in the scratchpython directory:
    $ cd ..
    $ echo 'FROM scratch
    ADD img/ /
    USER 2342
    ENTRYPOINT ["/bin/python"]' > Dockerfile
    
    For security reasons, even inside a container, Python will be started as the unprivileged user ID 2342.
  19. Building the image is straightforward. As it does not rely on any pre-built image, we will name it scratchpython. Its tag follows the Python version:
    $ sudo docker build -t scratchpython:3.11.0 .
    
  20. Optionally, tag that version as latest:
    $ sudo docker tag scratchpython:3.11.0 scratchpython:latest
    
  21. Our image is about 52% smaller than Alpine with Python installed (25.8MB versus 53.6MB):
    $ sudo docker images
    REPOSITORY      TAG       IMAGE ID       CREATED        SIZE
    scratchpython   3.11.0    ef07ca829e6c   2 hours ago    25.8MB
    scratchpython   latest    ef07ca829e6c   2 hours ago    25.8MB
    alpinepython    1         011acaf505a2   4 hours ago    53.6MB
    alpine          latest    9c6f07244728   2 months ago   5.54MB
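
The saving can be recomputed from the SIZE column. This is a small sketch that assumes the sizes use decimal units (1 MB = 10^6 bytes); parse_size and saving_percent are hypothetical helpers.

```python
# Assumed decimal units for the SIZE column of "docker images".
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a size such as '25.8MB' to a number of bytes."""
    # Try longer suffixes first, because every unit ends with "B".
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unknown size format: {text!r}")


def saving_percent(small, large):
    """Return by how many percent small is smaller than large."""
    return 100 * (1 - parse_size(small) / parse_size(large))


print(f"{saving_percent('25.8MB', '53.6MB'):.1f}% smaller")
```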
    
  22. But does it work?
    $ sudo docker run --name test2 -ti scratchpython
    Python 3.11.0 (main, Oct 26 2022, 19:48:40) [GCC 11.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    
  23. Let's poke around inside the container:
    >>> import os
    >>> os.uname()
    posix.uname_result(sysname='Linux', nodename='056489cb825f', release='5.15.0-52-generic', version='#58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022', machine='x86_64')
    >>> os.listdir(".")
    ['bin', 'usr', 'lib', 'sys', 'dev', 'etc', '.dockerenv', 'proc']
    >>> os.getcwd()
    '/'
    >>> for dir in [dir for dir in os.listdir("/proc") if dir.isnumeric()]:
    ...     with open("/proc/%s/cmdline" % dir) as fh:
    ...         print("pid %s: %s" % (dir, " ".join(fh.readlines())))
    ... 
    pid 1: /bin/python
    >>> 
    >>> import socket
    >>> socket.if_nameindex()
    [(1, 'lo'), (20, 'eth0')]
    >>> with open("/proc/net/fib_trie") as fh:
    ...     for line in fh.readlines():
    ...         print(line, end="")
    ... 
    Main:
      +-- 0.0.0.0/0 3 0 5
         |-- 0.0.0.0
            /0 universe UNICAST
         +-- 127.0.0.0/8 2 0 2
            +-- 127.0.0.0/31 1 0 0
               |-- 127.0.0.0
                  /8 host LOCAL
               |-- 127.0.0.1
                  /32 host LOCAL
            |-- 127.255.255.255
               /32 link BROADCAST
         +-- 172.17.0.0/16 2 0 2
            +-- 172.17.0.0/30 2 0 2
               |-- 172.17.0.0
                  /16 link UNICAST
               |-- 172.17.0.2
                  /32 host LOCAL
            |-- 172.17.255.255
               /32 link BROADCAST
    Local:
      +-- 0.0.0.0/0 3 0 5
         |-- 0.0.0.0
            /0 universe UNICAST
         +-- 127.0.0.0/8 2 0 2
            +-- 127.0.0.0/31 1 0 0
               |-- 127.0.0.0
                  /8 host LOCAL
               |-- 127.0.0.1
                  /32 host LOCAL
            |-- 127.255.255.255
               /32 link BROADCAST
         +-- 172.17.0.0/16 2 0 2
            +-- 172.17.0.0/30 2 0 2
               |-- 172.17.0.0
                  /16 link UNICAST
               |-- 172.17.0.2
                  /32 host LOCAL
            |-- 172.17.255.255
               /32 link BROADCAST
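
As the container has no ip or ifconfig command, the addresses of the container can be picked out of the fib_trie text. The following sketch assumes the layout shown above, where an address line starting with "|--" is followed by a "/32 host LOCAL" line; local_addresses is a hypothetical helper.

```python
def local_addresses(fib_trie_text):
    """Return the addresses marked '/32 host LOCAL', without duplicates."""
    addresses = []
    previous = None
    for raw in fib_trie_text.splitlines():
        line = raw.strip()
        if line.startswith("|--"):
            previous = line.split()[1]
        elif line == "/32 host LOCAL" and previous is not None:
            # The Main and Local tables list the same entries twice.
            if previous not in addresses:
                addresses.append(previous)
    return addresses


# Shortened excerpt of the output shown above.
SAMPLE = """Main:
  +-- 0.0.0.0/0 3 0 5
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
     +-- 172.17.0.0/16 2 0 2
        +-- 172.17.0.0/30 2 0 2
           |-- 172.17.0.0
              /16 link UNICAST
           |-- 172.17.0.2
              /32 host LOCAL
        |-- 172.17.255.255
           /32 link BROADCAST
"""

print(local_addresses(SAMPLE))
```

Inside the container, the same function can be fed with the contents of /proc/net/fib_trie.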
    
  24. Exit the only process inside the container, and thus stop the container:
    >>> import sys
    >>> sys.exit("Good night")
    Good night
    
  25. Remove containers used for testing:
    $ sudo docker rm test test2
    test
    test2
    $