
ImportError: No module named py4j.java_gateway

This article explains how to solve the error ImportError: No module named py4j.java_gateway.

First, some background on the py4j module. Spark is written in Scala, but because of its wide industry adoption a Python API, PySpark, was also provided. PySpark uses Py4J to let Python code communicate with Spark's JVM.

Py4J is a required module for running PySpark applications. It ships with Spark as the archive $SPARK_HOME/python/lib/py4j-*-src.zip.

After installing Spark, add the Py4J archive to the PYTHONPATH environment variable so that PySpark applications can run. If it is not on PYTHONPATH, Python raises ImportError: No module named py4j.java_gateway.
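To confirm that this is the cause, here is a minimal sketch of a shell function that simply tries the import. The function name check_py4j is hypothetical, and it assumes the interpreter is the one named in PYSPARK_PYTHON, falling back to python3 if that variable is unset.

```shell
# Hypothetical helper: report whether py4j.java_gateway can be imported
# by the interpreter PySpark would use (PYSPARK_PYTHON, else python3).
check_py4j() {
  py=${PYSPARK_PYTHON:-python3}
  if "$py" -c 'import py4j.java_gateway' 2>/dev/null; then
    echo "py4j.java_gateway is importable with $py"
  else
    # ${PYTHONPATH-} avoids an error under 'set -u' when PYTHONPATH is unset.
    echo "py4j.java_gateway NOT importable with $py (PYTHONPATH=${PYTHONPATH-})" >&2
    return 1
  fi
}
```

Run check_py4j before and after changing PYTHONPATH; a non-zero exit status means the module still cannot be found.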

On Linux or macOS, set the following variables (adjust the paths to your Spark installation):

export SPARK_HOME=/Users/your_name/apps/spark-3.0.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH

Add these lines to your ~/.bashrc file and reload it with source ~/.bashrc.
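If you would rather not edit .bashrc by hand, the sketch below appends the same two export lines only once. The helper name add_spark_env and the marker comment are hypothetical, and the Spark path is the example path from above, so adjust it before use.

```shell
# Hypothetical helper: append the Spark/py4j exports to an rc file
# (default ~/.bashrc) once; a marker comment prevents duplicates.
add_spark_env() {
  rc=${1:-"$HOME/.bashrc"}
  marker='# PySpark: SPARK_HOME and py4j on PYTHONPATH'
  if grep -qF "$marker" "$rc" 2>/dev/null; then
    echo "$rc already contains the Spark exports"
    return 0
  fi
  # Single quotes keep $SPARK_HOME and $PYTHONPATH unexpanded here, so they
  # are evaluated each time the rc file is sourced.
  printf '%s\n' "$marker" \
    'export SPARK_HOME=/Users/your_name/apps/spark-3.0.0-bin-hadoop2.7' \
    'export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH' \
    >> "$rc"
}
```

Typical use: add_spark_env && source ~/.bashrc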

The py4j version bundled with Spark changes depending on the Spark/PySpark version you use, so instead of hard-coding the file name, use a wildcard as in the line below. To find where PySpark is installed, run pip show pyspark and check the Location field.

export PYTHONPATH=${SPARK_HOME}/python/:$(echo ${SPARK_HOME}/python/lib/py4j-*-src.zip):${PYTHONPATH}
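One caveat with $(echo ...): if no archive matches, the shell leaves the pattern as literal text, and if several match, they are joined with spaces; either way PYTHONPATH ends up pointing at a path that does not exist. The following is a minimal POSIX shell sketch that fails loudly instead. The function name set_pyspark_path is hypothetical, and it assumes the same $SPARK_HOME/python/lib layout described above.

```shell
# Hypothetical helper: put $SPARK_HOME/python and the bundled py4j archive
# on PYTHONPATH, failing if no archive is found.
set_pyspark_path() {
  spark_home=${1:-$SPARK_HOME}
  py4j_zip=""
  for f in "$spark_home"/python/lib/py4j-*-src.zip; do
    # An unmatched glob stays as the literal pattern, so test existence.
    # If several archives exist, the last one in glob order wins.
    if [ -e "$f" ]; then
      py4j_zip=$f
    fi
  done
  if [ -z "$py4j_zip" ]; then
    echo "no py4j-*-src.zip found under $spark_home/python/lib" >&2
    return 1
  fi
  # Prepend the Spark entries; keep any existing PYTHONPATH without
  # leaving a dangling ':' when it was empty.
  export PYTHONPATH="$spark_home/python:$py4j_zip${PYTHONPATH:+:$PYTHONPATH}"
}
```

For example, set_pyspark_path || echo "check SPARK_HOME" reports the problem instead of silently exporting a broken path.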

If you are using Windows, first set SPARK_HOME:

set SPARK_HOME=C:\apps\opt\spark-3.0.0-bin-hadoop2.7

Then set PYTHONPATH (note that set only affects the current Command Prompt session):

set PYTHONPATH=%SPARK_HOME%/python;%SPARK_HOME%/python/lib/py4j-0.10.9-src.zip;%PYTHONPATH%

Hope this solves your issue.

See also this similar issue:
ModuleNotFoundError: No module named ‘py4j’