I need to read an Excel file into a PySpark DataFrame. The reader should support both xls and xlsx file extensions from a local filesystem or URL, with an option to read a single sheet or a list of sheets. Reading a Parquet file from the path works fine:

    srcParquetDf = spark.read.parquet(srcPathForParquet)

but reading the Excel file from the same path throws "No such file or directory".

You can use pandas to read the .xlsx file and then convert that to a Spark DataFrame. (Reading the file with Spark directly would be better practice, since once pandas does the read, the benefit of Spark no longer exists.) That would look like this:

    from pyspark.sql import SparkSession
    import pandas

    spark = SparkSession.builder.appName("test").getOrCreate()
    # pandas.read_excel has no inferSchema parameter; createDataFrame infers the types
    pdf = pandas.read_excel('excelfile.xlsx', sheet_name='sheetname')
    df = spark.createDataFrame(pdf)

The io parameter of pandas.read_excel accepts a str, file descriptor, pathlib.Path, ExcelFile or xlrd.Book, and the string could be a URL, so this route already covers both file extensions and remote files.
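If you wanted the inferSchema behaviour the original snippet was reaching for, you can instead control the types yourself by passing an explicit schema to createDataFrame. A minimal sketch; the column names and types here are hypothetical and must match your sheet:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType
    import pandas

    spark = SparkSession.builder.appName("test").getOrCreate()
    pdf = pandas.read_excel('excelfile.xlsx', sheet_name='sheetname')

    # hypothetical columns for illustration only
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("value", DoubleType(), True),
    ])
    df = spark.createDataFrame(pdf, schema=schema)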
To keep the read inside Spark, you can read it from Excel directly instead. On your Databricks cluster, install the following 2 libraries: the spark-excel package (com.crealytics) from Maven and xlrd from PyPI. Then, you will be able to read your Excel file as follows:

    import pyspark.pandas as ps

    # the file path was left blank in the original; point it at your workbook
    # (read_excel has no inferSchema argument, so it is dropped here)
    spark_df = ps.read_excel('', sheet_name='sheet1').to_spark()

Outside Databricks, you can run the same code sample, just adding the class needed to the configuration of your SparkSession; see the sketch below. The direct reader is driven by options, starting with the flags required for reading the Excel file:

    # flags required for reading the excel
    isHeaderOn = "true"
    isInferSchemaOn = "false"
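Completing that fragment, here is a minimal sketch of the direct read. It assumes the com.crealytics spark-excel data source; the Maven coordinates, file path, and sheet address below are illustrative and must match your Spark/Scala build and your workbook:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("excel-read")
             # illustrative coordinates; pick the artifact for your Spark/Scala version
             .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:0.13.5")
             .getOrCreate())

    # flags required for reading the excel
    isHeaderOn = "true"
    isInferSchemaOn = "false"

    df = (spark.read.format("com.crealytics.spark.excel")
          .option("header", isHeaderOn)
          .option("inferSchema", isInferSchemaOn)
          .option("dataAddress", "'sheet1'!A1")  # sheet name and start cell
          .load("excelfile.xlsx"))               # illustrative path

    df.show()

On Databricks the same call works without the spark.jars.packages line once the library is installed on the cluster.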