In different environments we as IT professionals have to approach matters in various ways. Data comes in many forms, shapes, and sizes. One of the issues I recently had was how to insert data from a .xlsx file into a MySQL database with Python.
I will dive into it and share the code, as well as dissect it line by line.
Some of the things mentioned in this post I learned from the Zerto To Mastery academy here. If You decide to sign-up, I will earn a commission.
The Problem
A person or a company has a large .xlsx file and it’s constantly being updated.
There’s yet no application with a proper database in the company for this specific task. But the data is crucially important and it’s there to be analyzed or used.
This means we have to find a solution. And we better do it quickly!
We already have a bigger database, or maybe even a cluster where most of the data is located. And when all the data is safely and securely in one place – it would be structurally better. As well as it would make it easier to import it to BI tools like Power BI.
Or do we need to insert the data from the .xlsx into an application that will use the data? And that application already has a database.
Or even something else entirely! The main point here is – we need to take data from a .xlsx file and enter it into a MySQL database.
The solution – Python code
Here’s the full script:
import os
import pandas as pd
import mysql.connector
# Get the absolute path to your file, best if the file is in the same folder as the script
dir_path = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(dir_path, 'yourfilename.xlsx')
# Load the Excel file into a pandas DataFrame
df = pd.read_excel(file_path)
# Connect to the MySQL database
cnx = mysql.connector.connect(user='youruserhere', password='yourpasswordhere',
host='yourserverhere', database='yourdatabasehere')
cursor = cnx.cursor()
# Insert the data into the MySQL database
for row in df.itertuples():
cursor.execute('INSERT INTO yourtablehere (workerid, id, p_nr, p_name, start_time, finish_time, total, name) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)',
(row.workerid, row.id, row.p_nr, row.p_name, row.start_time, row.finish_time,
row.total, row.name))
# Commit the changes and close the connection
cnx.commit()
cursor.close()
cnx.close()
The requirements before proceeding
You need to have Python installed on Your machine. I had version 3.11.2 but this can change dramatically when You are reading it in the future.
Usually – the latest version will work.
You also need the pandas module. You can install with
pip install pandas
from the command line. For example in Visual Studio Code.
And we also need the mysql.connector. Again, You can use
pip install mysql-connector-python
That’s it. We are now ready to import data from .xlsx to a MySQL database with Python!
The Python code explained to import data from .xlsx to a MySQL database
First, the code imports the necessary modules. The os module provides a way to interact with the file system, while pandas and mysql.connector are used to read the Excel file and connect to the MySQL database, respectively.
import os
import pandas as pd
import mysql.connector
Next, the code gets the absolute path to the Excel file using the os module. It first gets the directory path of the current file using os.path.dirname() and then joins it with the filename ‘yourfilenamehere.xlsx’ using os.path.join(). The resulting file_path variable contains the absolute path to the Excel file.
dir_path = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(dir_path, 'yourfilenamehere.xlsx')
Reading the file and establishing a connection to the database itself
After getting the file path, the code reads the Excel file into a pandas DataFrame using pd.read_excel(). This creates a DataFrame object that can be manipulated in memory.
df = pd.read_excel(file_path)
After reading the Excel file, the code establishes a connection to the MySQL database using mysql.connector.connect(). The user, password, host, and database parameters are provided to establish the connection. Once the connection is established, a cursor object is created to execute SQL queries.
cnx = mysql.connector.connect(user='youruserhere', password='yourpasswordhere', host='yourserverhere', database='databasenamehere')
cursor = cnx.cursor()
Working with the data
The code then iterates through each row in the DataFrame using a for loop and executes an SQL INSERT statement for each row using cursor.execute(). The INSERT statement specifies the table and column names, followed by the values to be inserted. The values to be inserted are passed in as a tuple containing the row values.
for row in df.itertuples():
cursor.execute('INSERT INTO yourtablenamehere (workerid, id, p_nr, p_name, start_time, finish_time, total, name) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)',
(row.workerid, row.id, row.p_nr, row.p_name, row.start_time, row.finish_time,
row.total, row.name))
Finally, the code commits the changes to the MySQL database using cnx.commit() and closes the cursor and database connection using cursor.close() and cnx.close(), respectively.
cnx.commit()
cursor.close()
cnx.close()
Ian Altosaar
Some of the things mentioned in this post I learned from the Zerto To Mastery academy here. If You decide to sign-up, I will earn a commission.