Greetings - I would like to create a Singularity container that includes Intel Parallel Studio, but I'm having problems. I can start from a Docker container that has the Intel compiler and Intel MPI inside. I also included a hello_world_mpi application, compiled inside the container. The Docker container works fine:
$ sudo docker run --rm -it jedi-intel19-impi-hello:latest
root@1dfdbccc1110:/# mpirun -np 4 hello_world_mpi
Hello from rank 1 of 4 running on 1dfdbccc1110.
Hello from rank 2 of 4 running on 1dfdbccc1110
Hello from rank 0 of 4 running on 1dfdbccc1110
Hello from rank 3 of 4 running on 1dfdbccc1110
Then I built a Singularity container from the Docker container:
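The build step was roughly the following; this is a sketch assuming Singularity 3.x and the local Docker image name shown above, and the output filename is just an example:

$ sudo singularity build jedi-intel19-impi-hello.sif docker-daemon://jedi-intel19-impi-hello:latest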
When you do anything with Singularity and MPI, you need to have the exact same version of MPI installed on the host as you do in the container, because the environment is seamless from host to container. This is different from Docker, which is completely isolated. See the Singularity documentation on MPI for more details.
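For cross-node runs the typical pattern is to launch from the host, roughly like this (the image name and binary path here are just placeholders):

$ mpirun -np 4 singularity exec hello.sif /path/to/hello_world_mpi

Since the host's mpirun is doing the launching in that case, the host and container MPI versions have to match.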
Thanks @vsoch for your response. I am aware of these strategies, but I believe they are primarily concerned with running across multiple nodes. That is indeed what I tried initially - see my related post on the Sylabs GitHub site. But then I realized that MPI inside the container did not even work, which strikes me as a bigger problem.
We regularly generate GNU/OpenMPI and Clang/MPICH Singularity containers that users can run on a single node and that are entirely independent of any MPI on the host. In fact, they are often run on laptops or virtual machines that do not have any MPI implementation on the host at all. These containers work fine if you run mpirun inside the container, as I demonstrated in my post. Furthermore, we have GNU/OpenMPI, Clang/MPICH, Intel 17, and Intel 19 Charliecloud containers for which this approach also works. The only place where that hello world program fails is in the Intel Singularity containers.
Understood, thank you for that clarification. So if you aren't concerned about scaling this, then what you should try first is to reproduce the docker run with Singularity. To do that, you need to completely isolate the container from the host. For example, in addition to -e you should use --containall to prevent binds from the host:
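Something along these lines should be a fair test of the isolated case (the image name is just an example):

$ singularity exec -e --containall jedi-intel19-impi-hello.sif mpirun -np 4 hello_world_mpi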
I’m not sure why you had the “mpirun” at the end of the shell command? I have the container recipe you provided on GitHub, but it’s not reproducible because I don’t have the files on my host. Are you able to provide these so that I can see if I can reproduce tomorrow? Could you also share the Docker container somewhere for me to pull as well, and (even better) link to the Dockerfile so I can confirm they are “the same”?
Thanks again @vsoch for the suggestions. That mpirun at the end of the shell command was a typo. I tried your suggestion of --containall and a few other things (see the GitHub issue for details), but I am still seeing the problem. I cannot share the files or the containers because they contain proprietary Intel software. However, I can try to do a multi-stage build that leaves out the proprietary components. I’ll try to do that today.
@vsoch - Well, it turns out that the effort to build the multi-stage container solved the problem! I did a multi-stage Docker build that only includes the Intel MPI runtime libraries in the second stage. Then I created a Singularity image from that. Here is the Dockerfile:
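This is a sketch of the structure rather than the exact file; the base image names, the source file name, and the /opt/intel paths are illustrative and depend on the Parallel Studio version and install prefix:

# Stage 1: development image with the full Intel Parallel Studio install
# (image name is illustrative)
FROM jedi-intel19-impi-dev:latest AS build

# Compile the test program with the Intel MPI compiler wrapper
COPY hello_world_mpi.c /root/
RUN mpiicc -o /root/hello_world_mpi /root/hello_world_mpi.c

# Stage 2: runtime-only image; copy in just the Intel MPI runtime and the binary
FROM ubuntu:18.04

# Source paths below are illustrative; they vary with the Parallel Studio version
COPY --from=build /opt/intel/compilers_and_libraries/linux/mpi /opt/intel/compilers_and_libraries/linux/mpi
COPY --from=build /root/hello_world_mpi /usr/local/bin/hello_world_mpi

# Make mpirun and the MPI shared libraries visible at runtime
ENV I_MPI_ROOT=/opt/intel/compilers_and_libraries/linux/mpi
ENV PATH=/opt/intel/compilers_and_libraries/linux/mpi/intel64/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries/linux/mpi/intel64/lib/release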
Woot! It’s so great when it works out like that! If you ever run into similar issues again and want another party to test with, ping me directly on GitHub (also @vsoch) and provide the Singularity recipe, and I can at least try to reproduce your error and play around. Happy MPI-ing!