To solve the Spark error AttributeError: Can't get attribute 'new_block', follow the methods below. The full traceback typically looks like this:
File "/mnt/yarn/usercache/hadoop/appcache/application_1627867699893_0001/container_1627867699893_0001_01_000009/pyspark.zip/pyspark/broadcast.py", line 129, in load
    return pickle.load(file)
AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/mnt/miniconda/lib/python3.9/site-packages/pandas/core/internals/blocks.py'>
How to solve the Spark AttributeError: Can't get attribute 'new_block'?
The pandas version used to dump the pickle (the dump version, most likely 1.3.x) is incompatible with the pandas version used to load it (the load version, most likely 1.2.x). Pandas 1.3 reorganized pandas.core.internals.blocks and introduced the new_block function, so a 1.3.x pickle references an attribute that simply does not exist in 1.2.x. There are two ways to fix this. Either upgrade pandas in the loading environment to 1.3.x and then load the pickle, or keep the loading environment on 1.2.x, downgrade pandas on the dumping side to 1.2.x, and dump a fresh pickle; your 1.2.x loader can then read the new file.
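As a minimal sketch of the second fix, the snippet below (using a hypothetical file name data.pkl) shows how to check which pandas version is active and re-dump the object with it, so that dump and load happen under the same version:

```python
import pickle

import pandas as pd

# The pickle is only portable across environments whose pandas
# internals are compatible; 1.2.x and 1.3.x differ in
# pandas.core.internals.blocks, which is what the error points at.
print("active pandas version:", pd.__version__)

# Re-dump with the environment's own pandas ("data.pkl" is a
# hypothetical file name for illustration).
df = pd.DataFrame({"a": [1, 2, 3]})
with open("data.pkl", "wb") as f:
    pickle.dump(df, f)

# Loading in the same (or a compatible) pandas version now works.
with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.equals(df))
```

Running this in a single environment always succeeds; the error in question only appears when the dumping and loading environments disagree on the pandas version.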
Note that the root cause has nothing to do with PySpark itself; PySpark merely surfaces the pandas incompatibility when it unpickles broadcast data on the executors.
Hopefully one of the solutions above works for you.