Python Automation: openpyxl

Table of Contents

Before You Begin
Reading Excel Spreadsheets With openpyxl
Writing Excel Spreadsheets With openpyxl
Conclusion

This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Editing Excel Spreadsheets in Python With openpyxl

Excel spreadsheets are one of those things you might have to deal with at some point. Whether it’s because your boss loves them or because marketing needs them, you might have to learn how to work with spreadsheets, and that’s when knowing openpyxl comes in handy!

Spreadsheets are a very intuitive and user-friendly way to manipulate large datasets without any prior technical background. That’s why they’re still so commonly used today.

In this article, you’ll learn how to use openpyxl to:

Manipulate Excel spreadsheets with confidence
Extract information from spreadsheets
Create simple or more complex spreadsheets, including adding styles, charts, and so on

This article is written for intermediate developers who have a pretty good knowledge of Python data structures, such as dicts and lists, but also feel comfortable around OOP and more intermediate level topics.

Download Dataset: Click here to download the dataset for the openpyxl exercise you’ll be following in this tutorial.

Before You Begin

If you ever get asked to extract some data from a database or log file into an Excel spreadsheet, or if you often have to convert an Excel spreadsheet into some more usable programmatic form, then this tutorial is perfect for you. Let’s jump into the openpyxl caravan!

Practical Use Cases

First things first, when would you need to use a package like openpyxl in a real-world scenario? You’ll see a few examples below, but really, there are hundreds of possible scenarios where this knowledge could come in handy.

Importing New Products Into a Database

You are responsible for tech in an online store company, and your boss doesn’t want to pay for a cool and expensive CMS system.

Every time they want to add new products to the online store, they come to you with an Excel spreadsheet with a few hundred rows and, for each of them, you have the product name, description, price, and so forth.

Now, to import the data, you’ll have to iterate over each spreadsheet row and add each product to the online store.

Exporting Database Data Into a Spreadsheet

Say you have a Database table where you record all your users’ information, including name, phone number, email address, and so forth.

Now, the Marketing team wants to contact all users to give them some discounted offer or promotion. However, they don’t have access to the Database, or they don’t know how to use SQL to extract that information easily.

What can you do to help? Well, you can make a quick script using openpyxl that iterates over every single User record and puts all the essential information into an Excel spreadsheet.

That’s gonna earn you an extra slice of cake at your company’s next birthday party!

Appending Information to an Existing Spreadsheet

You may also have to open a spreadsheet, read the information in it and, according to some business logic, append more data to it.

For example, using the online store scenario again, say you get an Excel spreadsheet with a list of users and you need to append to each row the total amount they’ve spent in your store.

This data is in the Database and, in order to do this, you have to read the spreadsheet, iterate through each row, fetch the total amount spent from the Database and then write back to the spreadsheet.

Not a problem for openpyxl!

Learning Some Basic Excel Terminology

Here’s a quick list of basic terms you’ll see when you’re working with Excel spreadsheets:

Term	Explanation
Spreadsheet or Workbook	A Spreadsheet is the main file you are creating or working with.
Worksheet or Sheet	A Sheet is used to split different kinds of content within the same spreadsheet. A Spreadsheet can have one or more Sheets.
Column	A Column is a vertical line, and it’s represented by an uppercase letter: _A_.
Row	A Row is a horizontal line, and it’s represented by a number: _1_.
Cell	A Cell is a combination of Column and Row, represented by both an uppercase letter and a number: _A1_.

Getting Started With openpyxl

Now that you’re aware of the benefits of a tool like openpyxl, let’s get down to it and start by installing the package. For this tutorial, you should use Python 3.7 and openpyxl 2.6.2. To install the package, you can do the following:

`$ pip install openpyxl`

After you install the package, you should be able to create a super simple spreadsheet with the following code:

`from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

sheet["A1"] = "hello"
sheet["B1"] = "world!"

workbook.save(filename="hello_world.xlsx")`

The code above should create a file called hello_world.xlsx in the folder you are using to run the code. If you open that file with Excel you should see something like this:

Woohoo, your first spreadsheet created!

Reading Excel Spreadsheets With openpyxl

Let’s start with the most essential thing one can do with a spreadsheet: read it.

You’ll go from a straightforward approach to reading a spreadsheet to more complex examples where you read the data and convert it into more useful Python structures.

Dataset for This Tutorial

Before you dive deep into some code examples, you should download this sample dataset and store it somewhere as sample.xlsx:

Download Dataset: Click here to download the dataset for the openpyxl exercise you’ll be following in this tutorial.

This is one of the datasets you’ll be using throughout this tutorial, and it’s a spreadsheet with a sample of real data from Amazon’s online product reviews. This dataset is only a tiny fraction of what Amazon provides, but for testing purposes, it’s more than enough.

A Simple Approach to Reading an Excel Spreadsheet

Finally, let’s start reading some spreadsheets! To begin with, open our sample spreadsheet:

`>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename="sample.xlsx")
>>> workbook.sheetnames
['Sheet 1']

>>> sheet = workbook.active
>>> sheet
<Worksheet "Sheet 1">

>>> sheet.title
'Sheet 1'`

In the code above, you first open the spreadsheet sample.xlsx using load_workbook(), and then you can use workbook.sheetnames to see all the sheets you have available to work with. After that, workbook.active selects the first available sheet and, in this case, you can see that it selects Sheet 1 automatically. Using these methods is the default way of opening a spreadsheet, and you’ll see it many times during this tutorial.

Now, after opening a spreadsheet, you can easily retrieve data from it like this:

`>>> sheet["A1"]
<Cell 'Sheet 1'.A1>

>>> sheet["A1"].value
'marketplace'

>>> sheet["F10"].value
"G-Shock Men's Grey Sport Watch"`

To return the actual value of a cell, you need to access .value. Otherwise, you’ll get the Cell object itself. You can also use the method .cell() to retrieve a cell using row and column indexes. Remember to add .value to get the actual value and not a Cell object:

`>>> sheet.cell(row=10, column=6)
<Cell 'Sheet 1'.F10>

>>> sheet.cell(row=10, column=6).value
"G-Shock Men's Grey Sport Watch"`

You can see that the results returned are the same, no matter which way you decide to go with. However, in this tutorial, you’ll be mostly using the first approach: ["A1"].

Note: Even though in Python you’re used to a zero-indexed notation, with spreadsheets you’ll always use a one-indexed notation where the first row or column always has index 1.
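
One place where the one-indexed convention shows up is in converting between column letters and column numbers. The sketch below uses a throwaway in-memory workbook rather than sample.xlsx (an assumption made to keep it self-contained), together with the helpers in openpyxl.utils; the cell value is a placeholder:

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter

# Throwaway workbook, used only for illustration
workbook = Workbook()
sheet = workbook.active

# Row 10, column 6 is the same cell as "F10" (both are one-indexed)
cell = sheet.cell(row=10, column=6, value="placeholder")
same_cell = sheet["F10"]

coordinate = cell.coordinate           # "F10"
letter = get_column_letter(6)          # "F"
index = column_index_from_string("F")  # 6
```

The helpers are handy when a loop counter has to be turned into a coordinate string such as "F10".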

The above shows you the quickest way to open a spreadsheet. However, you can pass additional parameters to change the way a spreadsheet is loaded.

Additional Reading Options

There are a few arguments you can pass to load_workbook() that change the way a spreadsheet is loaded. The most important ones are the following two Booleans:

read_only loads a spreadsheet in read-only mode, allowing you to open very large Excel files.
data_only ignores loading formulas and instead loads only the resulting values.
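
Here is a minimal sketch of how these two flags behave, assuming a small throwaway workbook written to a temporary directory (the file name options.xlsx and its contents are made up for illustration). One caveat worth knowing: openpyxl does not evaluate formulas itself, so data_only=True returns whatever result Excel cached in the file, and None when no cached result exists:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small file containing one formula (hypothetical data)
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = 2
sheet["A2"] = 3
sheet["A3"] = "=SUM(A1:A2)"
path = os.path.join(tempfile.mkdtemp(), "options.xlsx")
workbook.save(filename=path)

# Default load: the formula text is returned as written
default_value = load_workbook(filename=path).active["A3"].value

# data_only=True: the cached result is returned instead. This file was
# never opened and saved by Excel, so there is no cached result yet.
data_only_value = load_workbook(filename=path, data_only=True).active["A3"].value

# read_only=True: rows are streamed lazily; close the workbook when done
read_only_workbook = load_workbook(filename=path, read_only=True)
first_row = next(read_only_workbook.active.iter_rows(values_only=True))
read_only_workbook.close()

print(default_value)    # =SUM(A1:A2)
print(data_only_value)  # None
print(first_row)        # (2,)
```

Workbooks opened with read_only=True keep the underlying file open, which is why the sketch calls close() once it has finished reading.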

Importing Data From a Spreadsheet

Now that you’ve learned the basics about loading a spreadsheet, it’s about time you get to the fun part: the iteration and actual usage of the values within the spreadsheet.

This section is where you’ll learn all the different ways you can iterate through the data, but also how to convert that data into something usable and, more importantly, how to do it in a Pythonic way.

Iterating Through the Data

There are a few different ways you can iterate through the data depending on your needs.

You can slice the data with a combination of columns and rows:

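`>>> sheet["A1:C2"]
((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>),
 (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>))`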

You can get ranges of rows or columns:

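`>>> # Get all cells from column A
>>> sheet["A"]
(<Cell 'Sheet 1'.A1>,
 <Cell 'Sheet 1'.A2>,
 ...)

>>> # Get all cells for a range of columns
>>> sheet["A:B"]
((<Cell 'Sheet 1'.A1>,
  <Cell 'Sheet 1'.A2>,
  ...),
 (<Cell 'Sheet 1'.B1>,
  <Cell 'Sheet 1'.B2>,
  ...))

>>> # Get all cells from row 5
>>> sheet[5]
(<Cell 'Sheet 1'.A5>,
 <Cell 'Sheet 1'.B5>,
 ...)

>>> # Get all cells for a range of rows
>>> sheet[5:6]
((<Cell 'Sheet 1'.A5>,
  <Cell 'Sheet 1'.B5>,
  ...),
 (<Cell 'Sheet 1'.A6>,
  <Cell 'Sheet 1'.B6>,
  ...))`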

You’ll notice that all of the above examples return a tuple. If you want to refresh your memory on how to handle tuples in Python, check out the article on Lists and Tuples in Python.
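
Because these lookups return plain (sometimes nested) tuples, ordinary Python tools such as comprehensions work on them directly. A small sketch, assuming an in-memory workbook with two rows modelled on sample.xlsx rather than the real file:

```python
from openpyxl import Workbook

# In-memory stand-in for the first two rows of sample.xlsx
workbook = Workbook()
sheet = workbook.active
sheet.append(["marketplace", "customer_id", "review_id"])
sheet.append(["US", 3653882, "R3O9SGZBVQBV76"])

# A rectangular slice is a tuple of rows, each row a tuple of cells
block = sheet["A1:B2"]
coordinates = [[cell.coordinate for cell in row] for row in block]

# A single column or a single row is a flat tuple of cells
column_a = [cell.value for cell in sheet["A"]]
row_2 = [cell.value for cell in sheet[2]]

print(coordinates)  # [['A1', 'B1'], ['A2', 'B2']]
print(column_a)     # ['marketplace', 'US']
print(row_2)        # ['US', 3653882, 'R3O9SGZBVQBV76']
```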

There are also multiple ways of using normal Python generators to go through the data. The main methods you can use to achieve this are:

.iter_rows()
.iter_cols()

Both methods can receive the following arguments:

min_row
max_row
min_col
max_col

These arguments are used to set boundaries for the iteration:

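`>>> for row in sheet.iter_rows(min_row=1,
...                            max_row=2,
...                            min_col=1,
...                            max_col=3):
...     print(row)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>)
(<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)

>>> for column in sheet.iter_cols(min_row=1,
...                               max_row=2,
...                               min_col=1,
...                               max_col=3):
...     print(column)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>)
(<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>)
(<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>)`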

You’ll notice that in the first example, when iterating through the rows using .iter_rows(), you get one tuple per selected row. When using .iter_cols() and iterating through columns, you’ll get one tuple per column instead.

One additional argument you can pass to both methods is the Boolean values_only. When it’s set to True, the values of the cell are returned, instead of the Cell object:

`>>> for value in sheet.iter_rows(min_row=1,
...                              max_row=2,
...                              min_col=1,
...                              max_col=3,
...                              values_only=True):
...     print(value)
('marketplace', 'customer_id', 'review_id')
('US', 3653882, 'R3O9SGZBVQBV76')`
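
To see the row-wise and column-wise views side by side, here is a sketch on a tiny in-memory workbook (the three rows are invented for illustration, not taken from sample.xlsx):

```python
from openpyxl import Workbook

# Invented data: one header row and two data rows
workbook = Workbook()
sheet = workbook.active
sheet.append(["id", "name"])
sheet.append([1, "watch"])
sheet.append([2, "strap"])

# Same cells, grouped per row and then per column
by_row = list(sheet.iter_rows(values_only=True))
by_column = list(sheet.iter_cols(values_only=True))

print(by_row)     # [('id', 'name'), (1, 'watch'), (2, 'strap')]
print(by_column)  # [('id', 1, 2), ('name', 'watch', 'strap')]
```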

If you want to iterate through the whole dataset, then you can also use the attributes .rows or .columns directly, which are shortcuts to using .iter_rows() and .iter_cols() without any arguments:

`>>> for row in sheet.rows:
...     print(row)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>, ...)
...`

These shortcuts are very useful when you’re iterating through the whole dataset.
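
A related shortcut is the .values attribute, which yields plain values row by row, much like iter_rows(values_only=True). A sketch on an in-memory workbook, assuming two rows modelled on the sample data:

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet.append(["marketplace", "customer_id"])
sheet.append(["US", 3653882])

# .rows and .columns yield tuples of Cell objects
row_lengths = [len(row) for row in sheet.rows]
column_lengths = [len(column) for column in sheet.columns]

# .values yields one tuple of plain values per row
all_values = list(sheet.values)

print(all_values)  # [('marketplace', 'customer_id'), ('US', 3653882)]
```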

Manipulate Data Using Python’s Default Data Structures

Now that you know the basics of iterating through the data in a workbook, let’s look at smart ways of converting that data into Python structures.

As you saw earlier, the result from all iterations comes in the form of tuples. However, since a tuple is nothing more than an immutable list, you can easily access its data and transform it into other structures.
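
For instance, one common transformation is pairing each row with the header to get one dictionary per row. A sketch, assuming an in-memory workbook; the product IDs appear in sample.xlsx, but the titles are placeholders:

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet.append(["product_id", "product_title"])
sheet.append(["B00FALQ1ZC", "placeholder title 1"])
sheet.append(["B00D3RGO20", "placeholder title 2"])

rows = sheet.iter_rows(values_only=True)
header = next(rows)  # the first tuple holds the column names
records = [dict(zip(header, row)) for row in rows]

print(records[0])
# {'product_id': 'B00FALQ1ZC', 'product_title': 'placeholder title 1'}
```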

For example, say you want to extract product information from the sample.xlsx spreadsheet and into a dictionary where each key is a product ID.

A straightforward way to do this is to iterate over all the rows, pick the columns you know are related to product information, and then store that in a dictionary. Let’s code this out!

First of all, have a look at the headers and see what information you care most about:

`>>> for value in sheet.iter_rows(min_row=1,
...                              max_row=1,
...                              values_only=True):
...     print(value)
('marketplace', 'customer_id', 'review_id', 'product_id', ...)`

This code prints a tuple with all the column names you have in the spreadsheet. To start, grab the columns with names:

product_id
product_parent
product_title
product_category

Lucky for you, the columns you need are all next to each other so you can use min_col and max_col to easily get the data you want:

`>>> for value in sheet.iter_rows(min_row=2,
...                              min_col=4,
...                              max_col=7,
...                              values_only=True):
...     print(value)
('B00FALQ1ZC', 937001370, 'Invicta Women\'s 15150 "Angel" 18k Yellow...)
('B00D3RGO20', 484010722, "Kenneth Cole New York Women's KC4944...)
...`

Nice! Now that you know how to get all the important product information you need, let’s put that data into a dictionary:

`import json
from openpyxl import load_workbook

workbook = load_workbook(filename="sample.xlsx")
sheet = workbook.active

products = {}

# Using the values_only because you want to return the cells' values
for row in sheet.iter_rows(min_row=2,
                           min_col=4,
                           max_col=7,
                           values_only=True):
    product_id = row[0]
    product = {
        "parent": row[1],
        "title": row[2],
        "category": row[3]
    }
    products[product_id] = product

# Using json here to be able to format the output for displaying later
print(json.dumps(products))`

The code above prints JSON similar to this:

`{
  "B00FALQ1ZC": {
    "parent": 937001370,
    "title": "Invicta Women's 15150 ...",
    "category": "Watches"
  },
  "B00D3RGO20": {
    "parent": 484010722,
    "title": "Kenneth Cole New York ...",
    "category": "Watches"
  }
}`

Here you can see that the output is trimmed to 2 products only, but if you run the script as it is, then you should get 98 products.

Convert Data Into Python Classes

To finalize the reading section of this tutorial, let’s dive into Python classes and see how you could improve on the example above and better structure the data.

For this, you’ll be using the new Python Data Classes that are available from Python 3.7. If you’re using an older version of Python, then you can use the default Classes instead.

So, first things first, let’s look at the data you have and decide what you want to store and how you want to store it.

As you saw right at the start, this data comes from Amazon, and it’s a list of product reviews. You can check the list of all the columns and their meaning on Amazon.

There are two significant elements you can extract from the data available:

Products
Reviews

A Product has:

ID
Title
Parent
Category

The Review has a few more fields:

ID
Customer ID
Stars
Headline
Body
Date

You can ignore a few of the review fields to make things a bit simpler.

So, a straightforward implementation of these two classes could be written in a separate file classes.py:

`import datetime
from dataclasses import dataclass

@dataclass
class Product:
    id: str
    parent: int
    title: str
    category: str

@dataclass
class Review:
    id: str
    customer_id: int
    stars: int
    headline: str
    body: str
    date: datetime.datetime`

After defining your data classes, you need to convert the data from the spreadsheet into these new structures.

Before doing the conversion, it’s worth looking at our header again and creating a mapping between columns and the fields you need:

`>>> for value in sheet.iter_rows(min_row=1,
...                              max_row=1,
...                              values_only=True):
...     print(value)
('marketplace', 'customer_id', 'review_id', 'product_id', ...)

>>> # Or an alternative
>>> for cell in sheet[1]:
...     print(cell.value)
marketplace
customer_id
review_id
product_id
product_parent
...`

Let’s create a file mapping.py where you have a list of all the field names and their column location (zero-indexed) on the spreadsheet:

`# Product fields
PRODUCT_ID = 3
PRODUCT_PARENT = 4
PRODUCT_TITLE = 5
PRODUCT_CATEGORY = 6

# Review fields
REVIEW_ID = 2
REVIEW_CUSTOMER = 1
REVIEW_STARS = 7
REVIEW_HEADLINE = 12
REVIEW_BODY = 13
REVIEW_DATE = 14`

You don’t necessarily have to do the mapping above. It’s more for readability when parsing the row data, so you don’t end up with a lot of magic numbers lying around.
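
If the column order might change, an alternative to hard-coded positions is to derive them from the header row at run time. This is only a sketch of that idea and not part of the tutorial’s mapping.py; column_index is a hypothetical name, and the two rows are an in-memory stand-in for sample.xlsx:

```python
from openpyxl import Workbook

# In-memory stand-in for the first columns of sample.xlsx
workbook = Workbook()
sheet = workbook.active
sheet.append(["marketplace", "customer_id", "review_id", "product_id"])
sheet.append(["US", 3653882, "R3O9SGZBVQBV76", "B00FALQ1ZC"])

# Map each column name to its zero-indexed position in a row tuple
header = next(sheet.iter_rows(min_row=1, max_row=1, values_only=True))
column_index = {name: position for position, name in enumerate(header)}

first_row = next(sheet.iter_rows(min_row=2, values_only=True))
product_id = first_row[column_index["product_id"]]

print(column_index["product_id"])  # 3
print(product_id)                  # B00FALQ1ZC
```

The trade-off is that a renamed header now raises a KeyError instead of silently reading the wrong column.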

Finally, let’s look at the code needed to parse the spreadsheet data into a list of product and review objects:

`from datetime import datetime
from openpyxl import load_workbook
from classes import Product, Review
from mapping import PRODUCT_ID, PRODUCT_PARENT, PRODUCT_TITLE, \
    PRODUCT_CATEGORY, REVIEW_DATE, REVIEW_ID, REVIEW_CUSTOMER, \
    REVIEW_STARS, REVIEW_HEADLINE, REVIEW_BODY

# Using the read_only method since you're not gonna be editing the spreadsheet
workbook = load_workbook(filename="sample.xlsx", read_only=True)
sheet = workbook.active

products = []
reviews = []

# Using the values_only because you just want to return the cell value
for row in sheet.iter_rows(min_row=2, values_only=True):
    product = Product(id=row[PRODUCT_ID],
                      parent=row[PRODUCT_PARENT],
                      title=row[PRODUCT_TITLE],
                      category=row[PRODUCT_CATEGORY])
    products.append(product)

    # You need to parse the date from the spreadsheet into a datetime format
    spread_date = row[REVIEW_DATE]
    parsed_date = datetime.strptime(spread_date, "%Y-%m-%d")

    review = Review(id=row[REVIEW_ID],
                    customer_id=row[REVIEW_CUSTOMER],
                    stars=row[REVIEW_STARS],
                    headline=row[REVIEW_HEADLINE],
                    body=row[REVIEW_BODY],
                    date=parsed_date)
    reviews.append(review)

print(products[0])
print(reviews[0])`

After you run the code above, you should get some output like this:

`Product(id='B00FALQ1ZC', parent=937001370, ...)
Review(id='R3O9SGZBVQBV76', customer_id=3653882, ...)`

That’s it! Now you should have the data in a very simple and digestible class format, and you can start thinking of storing this in a Database or any other type of data storage you like.

Using this kind of OOP strategy to parse spreadsheets makes handling the data much simpler later on.
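
One concrete benefit is that data classes convert cleanly to dictionaries, which most storage layers and serializers accept. A sketch, assuming a Product class like the one in classes.py (redefined here, with an int parent, so the block runs on its own) and values taken from the sample output:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    id: str
    parent: int
    title: str
    category: str


product = Product(id="B00FALQ1ZC",
                  parent=937001370,
                  title="Invicta Women's 15150 ...",
                  category="Watches")

# asdict() turns the instance into a plain dictionary of its fields
as_dict = asdict(product)
as_json = json.dumps(as_dict)

print(as_dict["category"])  # Watches
```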

Appending New Data

Before you start creating very complex spreadsheets, have a quick look at an example of how to append data to an existing spreadsheet.

Go back to the first example spreadsheet you created (hello_world.xlsx) and try opening it and appending some data to it, like this:

`from openpyxl import load_workbook

# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename="hello_world.xlsx")
sheet = workbook.active

# Write what you want into a specific cell
sheet["C1"] = "writing ;)"

# Save the spreadsheet
workbook.save(filename="hello_world_append.xlsx")`

Et voilà, if you open the new hello_world_append.xlsx spreadsheet, you’ll see the following change:

Notice the additional writing ;) on cell C1.
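
Writing to a named cell works when you know the target coordinate. To add a whole row after the last used one, Worksheet.append() is a handy alternative. A sketch on an in-memory copy of the hello_world data (the appended values are made up):

```python
from openpyxl import Workbook

# Recreate the hello_world content in memory
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
sheet["B1"] = "world!"

# append() writes the values into the first row below the existing data
sheet.append(["appended", "row"])

rows = list(sheet.iter_rows(values_only=True))
print(rows)  # [('hello', 'world!'), ('appended', 'row')]
```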

Writing Excel Spreadsheets With openpyxl

There are a lot of different things you can write to a spreadsheet, from simple text or number values to complex formulas, charts, or even images.

Let’s start creating some spreadsheets!

Creating a Simple Spreadsheet

Previously, you saw a very quick example of how to write “Hello world!” into a spreadsheet, so you can start with that:

` 1 from openpyxl import Workbook
 2
 3 filename = "hello_world.xlsx"
 4
 5 workbook = Workbook()
 6 sheet = workbook.active
 7
 8 sheet["A1"] = "hello"
 9 sheet["B1"] = "world!"
10
11 workbook.save(filename=filename)`

The most important lines for writing are lines 5, 8, 9, and 11. In the code, you can see that:

Line 5 shows you how to create a new empty workbook.
Lines 8 and 9 show you how to add data to specific cells.
Line 11 shows you how to save the spreadsheet when you’re done.

Even though the lines above are straightforward, it’s still good to know them well for when things get a bit more complicated.

Note: You’ll be using the hello_world.xlsx spreadsheet for some of the upcoming examples, so keep it handy.

One thing you can do to help with the coming code examples is to add the following function to your Python file or console:

`>>> def print_rows():
...     for row in sheet.iter_rows(values_only=True):
...         print(row)`

It makes it easier to print all of your spreadsheet values by just calling print_rows().

Basic Spreadsheet Operations

Before you get into the more advanced topics, it’s good for you to know how to manage the most simple elements of a spreadsheet.

Adding and Updating Cell Values

You already learned how to add values to a spreadsheet like this:

`>>> sheet["A1"] = "value"`

There’s another way you can do this, by first selecting a cell and then changing its value:

`>>> cell = sheet["A1"]
>>> cell
<Cell 'Sheet'.A1>

>>> cell.value
'hello'

>>> cell.value = "hey"
>>> cell.value
'hey'`

The new value is only stored into the spreadsheet once you call workbook.save().
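
A quick way to convince yourself of this is to reload the file before and after saving. The sketch below assumes a temporary directory for the file so it doesn’t touch your own hello_world.xlsx:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "hello_world.xlsx")

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
workbook.save(filename=path)

# Change the value in memory only; the file on disk is untouched
sheet["A1"].value = "hey"
on_disk_before_save = load_workbook(filename=path).active["A1"].value

# Persist the change, then read the file again
workbook.save(filename=path)
on_disk_after_save = load_workbook(filename=path).active["A1"].value

print(on_disk_before_save)  # hello
print(on_disk_after_save)   # hey
```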

openpyxl creates a cell when you add a value to it, if that cell didn’t exist before:

`>>> # Before, our spreadsheet has only 1 row
>>> print_rows()
('hello', 'world!')

>>> # Try adding a value to row 10
>>> sheet["B10"] = "test"
>>> print_rows()
('hello', 'world!')
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, 'test')`

As you can see, when trying to add a value to cell B10, you end up with 10 rows, just so you can have that test value.
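
The sheet’s reported size reflects this growth. A sketch using the max_row, max_column, and dimensions attributes on an in-memory copy of the hello_world data:

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "hello"
sheet["B1"] = "world!"

size_before = (sheet.max_row, sheet.max_column)

# Writing to B10 extends the used range down to row 10
sheet["B10"] = "test"
size_after = (sheet.max_row, sheet.max_column)
used_range = sheet.dimensions

print(size_before)  # (1, 2)
print(size_after)   # (10, 2)
print(used_range)   # A1:B10
```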

Managing Rows and Columns

One of the most common things you have to do when manipulating spreadsheets is adding or removing rows and columns. The openpyxl package allows you to do that in a very straightforward way by using the methods:

.insert_rows()
.delete_rows()
.insert_cols()
.delete_cols()

Every single one of those methods can receive two arguments:

idx
amount

Using our basic hello_world.xlsx example again, let’s see how these methods work:

`>>> print_rows()
('hello', 'world!')

>>> # Insert a column before the existing column 1 ("A")
>>> sheet.insert_cols(idx=1)
>>> print_rows()
(None, 'hello', 'world!')

>>> # Insert 5 columns between column 2 ("B") and 3 ("C")
>>> sheet.insert_cols(idx=3, amount=5)
>>> print_rows()
(None, 'hello', None, None, None, None, None, 'world!')

>>> # Delete the created columns
>>> sheet.delete_cols(idx=3, amount=5)
>>> sheet.delete_cols(idx=1)
>>> print_rows()
('hello', 'world!')

>>> # Insert a new row in the beginning
>>> sheet.insert_rows(idx=1)
>>> print_rows()
(None, None)
('hello', 'world!')

>>> # Insert 3 new rows in the beginning
>>> sheet.insert_rows(idx=1, amount=3)
>>> print_rows()
(None, None)
(None, None)
(None, None)
(None, None)
('hello', 'world!')

>>> # Delete the first 4 rows
>>> sheet.delete_rows(idx=1, amount=4)
>>> print_rows()
('hello', 'world!')`

The only thing you need to remember is that when inserting new data (rows or columns), the insertion happens before the idx parameter.

So, if you do insert_rows(1), it inserts a new row before the existing first row.

It’s the same for columns: when you call insert_cols(2), it inserts a new column right before the already existing second column (B).

However, when deleting rows or columns, .delete_... deletes data starting from the index passed as an argument.

For example, when doing delete_rows(2) it deletes row 2, and when doing delete_cols(3) it deletes the third column (C).
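
The asymmetry between inserting before an index and deleting at an index is easiest to see with labelled rows. A sketch on an in-memory workbook whose labels are made up for illustration:

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
for label in ["row 1", "row 2", "row 3"]:
    sheet.append([label])

# insert_rows(2) adds an empty row *before* the existing row 2
sheet.insert_rows(idx=2)
after_insert = [row[0] for row in sheet.iter_rows(values_only=True)]

# delete_rows(2) removes row 2 itself (the empty row just inserted)
sheet.delete_rows(idx=2)
after_delete = [row[0] for row in sheet.iter_rows(values_only=True)]

print(after_insert)  # ['row 1', None, 'row 2', 'row 3']
print(after_delete)  # ['row 1', 'row 2', 'row 3']
```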

Managing Sheets

Sheet management is also one of those things you might need to know, even though it might be something that you don’t use that often.

If you look back at the code examples from this tutorial, you’ll notice the following recurring piece of code:

`sheet = workbook.active`

This is the way to select the default sheet from a spreadsheet. However, if you’re opening a spreadsheet with multiple sheets, then you can always select a specific one like this:

`>>> # Let's say you have two sheets: "Products" and "Company Sales"
>>> workbook.sheetnames
['Products', 'Company Sales']

>>> # You can select a sheet using its title
>>> products_sheet = workbook["Products"]
>>> sales_sheet = workbook["Company Sales"]`

You can also change a sheet title very easily:

`>>> workbook.sheetnames
['Products', 'Company Sales']

>>> products_sheet = workbook["Products"]
>>> products_sheet.title = "New Products"

>>> workbook.sheetnames
['New Products', 'Company Sales']`

If you want to create or delete sheets, then you can also do that with .create_sheet() and .remove():

`>>> workbook.sheetnames
['Products', 'Company Sales']

>>> operations_sheet = workbook.create_sheet("Operations")
>>> workbook.sheetnames
['Products', 'Company Sales', 'Operations']

>>> # You can also define the position to create the sheet at
>>> hr_sheet = workbook.create_sheet("HR", 0)
>>> workbook.sheetnames
['HR', 'Products', 'Company Sales', 'Operations']

>>> # To remove them, just pass the sheet as an argument to the .remove()
>>> workbook.remove(operations_sheet)
>>> workbook.sheetnames
['HR', 'Products', 'Company Sales']

>>> workbook.remove(hr_sheet)
>>> workbook.sheetnames
['Products', 'Company Sales']`

One other thing you can do is make duplicates of a sheet using copy_worksheet():

另外一件你可以做的事情是使用 copy _ worksheet ()复制工作表的副本:

`>>> workbook.sheetnames
['Products', 'Company Sales']

>>> products_sheet = workbook["Products"]
>>> workbook.copy_worksheet(products_sheet)
<Worksheet "Products Copy">

>>> workbook.sheetnames
['Products', 'Company Sales', 'Products Copy']`

If you save the workbook after running the above code and open the resulting spreadsheet, you’ll notice that the sheet Products Copy is a duplicate of the sheet Products.

如果您在运行以上代码后保存工作簿并打开生成的电子表格，您将注意到 Products Copy 工作表是 Products 工作表的副本。
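Selecting a sheet by a title that doesn’t exist raises a `KeyError`, so it can help to check `workbook.sheetnames` first. Here’s a minimal sketch of that idea. The helper name `get_or_create_sheet` is hypothetical (it isn’t part of openpyxl), and the sheet titles simply mirror the ones used in this section.

```python
from openpyxl import Workbook


def get_or_create_sheet(workbook, title):
    """Return the sheet called `title`, creating it if it's missing."""
    if title in workbook.sheetnames:
        return workbook[title]
    return workbook.create_sheet(title)


workbook = Workbook()
workbook.active.title = "Products"  # rename the default sheet

sales_sheet = get_or_create_sheet(workbook, "Company Sales")
same_sales_sheet = get_or_create_sheet(workbook, "Company Sales")

# Copies are appended at the end, titled "<original title> Copy"
products_copy = workbook.copy_worksheet(workbook["Products"])

print(workbook.sheetnames)
```

Calling the helper twice with the same title returns the same worksheet object instead of creating a second sheet.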

Freezing Rows and Columns^[23]冻结行和柱[23]

Something that you might want to do when working with big spreadsheets is to freeze a few rows or columns, so they remain visible when you scroll right or down.

在处理大型电子表格时，您可能希望冻结一些行或列，这样当您向右或向下滚动时，它们仍然可见。

Freezing data allows you to keep an eye on important rows or columns, regardless of where you scroll in the spreadsheet.

冻结数据使您可以密切关注重要的行或列，无论您在电子表格中的哪个位置滚动。

Again, openpyxl also has a way to accomplish this by using the worksheet freeze_panes attribute. For this example, go back to our sample.xlsx spreadsheet and try doing the following:

同样，openpyxl 也可以通过使用工作表 freeze _ panes 属性来实现这一点。对于这个例子，回到我们的 sample.xlsx 电子表格并尝试执行以下操作:

`>>> workbook = load_workbook(filename="sample.xlsx")
>>> sheet = workbook.active
>>> sheet.freeze_panes = "C2"
>>> workbook.save("sample_frozen.xlsx")`

If you open the sample_frozen.xlsx spreadsheet in your favorite spreadsheet editor, you’ll notice that row 1 and columns A and B are frozen and are always visible no matter where you navigate within the spreadsheet.

如果你打开冰冻的样品。在您最喜欢的电子表格编辑器中使用 xlsx 电子表格时，您会注意到行1和列 a 和列 b 都已冻结，并且无论您在电子表格中的哪个位置导航都始终可见。

This feature is handy, for example, to keep headers within sight, so you always know what each column represents.

例如，这个特性很方便，可以将标题保持在视线范围内，因此您总是知道每个列代表什么。

Here’s how it looks in the editor:

下面是它在编辑器中的样子:

Notice how you’re at the end of the spreadsheet, and yet, you can see both row 1 and columns A and B.

注意你是如何在电子表格的末尾，然而，你可以看到行1和列 a 和列 b。
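The rule behind `"C2"` is that every row above the given cell and every column to its left stays frozen. The sketch below tries a few other values on an empty in-memory workbook. The variable names are made up for illustration, and nothing is saved to disk.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

# Freeze only the header row: everything above row 2 stays visible
sheet.freeze_panes = "A2"
header_only = sheet.freeze_panes

# Freeze only the first column: everything left of column B stays visible
sheet.freeze_panes = "B1"
first_column_only = sheet.freeze_panes

# Setting the attribute to None removes the frozen panes again
sheet.freeze_panes = None
unfrozen = sheet.freeze_panes

print(header_only, first_column_only, unfrozen)
```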

Adding Filters^[24]添加过滤器[24]

You can use openpyxl to add filters and sorts to your spreadsheet. However, when you open the spreadsheet, the data won’t be rearranged according to these sorts and filters.

您可以使用 openpyxl 向电子表格添加过滤器和排序。但是，当您打开电子表格时，数据不会根据这些排序和过滤器重新排列。

At first, this might seem like a pretty useless feature, but when you’re programmatically creating a spreadsheet that is going to be sent to and used by somebody else, it’s still nice to at least create the filters and allow people to use them afterward.

一开始，这看起来似乎是一个非常无用的功能，但是当你通过编程创建一个电子表格，它将被其他人发送和使用时，至少创建一个过滤器并允许人们在之后使用它还是很好的。

The code below is an example of how you would add some filters to our existing sample.xlsx spreadsheet:

下面的代码是一个示例，说明了如何为现有的 sample.xlsx 电子表格添加一些过滤器:

`>>> # Check the used spreadsheet space using the attribute "dimensions"
>>> sheet.dimensions
'A1:O100'

>>> sheet.auto_filter.ref = "A1:O100"
>>> workbook.save(filename="sample_with_filters.xlsx")`

You should now see the filters created when opening the spreadsheet in your editor:

现在你应该看到在编辑器中打开电子表格时创建的过滤器:

You don’t have to use sheet.dimensions if you know precisely which part of the spreadsheet you want to apply filters to.

如果您准确地知道要将过滤器应用到电子表格的哪一部分，则不必使用 sheet.dimensions。
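If you’d rather not hardcode the range, you can pass `sheet.dimensions` straight to the filter. The following sketch uses a tiny made-up fruit table instead of sample.xlsx. It also calls `add_filter_column()` and `add_sort_condition()`, which, as far as the openpyxl documentation describes them, only store the filter and sort settings in the file and don’t rearrange or hide any rows themselves.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

# Hypothetical data, just enough to have a header and a few rows
rows = [
    ["Fruit", "Quantity"],
    ["Kiwi", 3],
    ["Grape", 15],
    ["Apple", 3],
]
for row in rows:
    sheet.append(row)

# Use the whole used area as the filter range
sheet.auto_filter.ref = sheet.dimensions

# Pre-select values for the first filtered column (index 0 is column A)
sheet.auto_filter.add_filter_column(0, ["Kiwi", "Apple"])

# Register a sort condition on the quantity column
sheet.auto_filter.add_sort_condition("B2:B4")

print(sheet.auto_filter.ref)
```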

Adding Formulas^[25]加入公式[25]

Formulas (or formulae) are one of the most powerful features of spreadsheets.

公式(或公式)是电子表格最强大的功能之一。

They give you the power to apply specific mathematical equations to a range of cells. Using formulas with openpyxl is as simple as editing the value of a cell.

它们使你有能力将特定的数学方程式应用到一系列的单元格中。使用 openpyxl 的公式就像编辑单元格的值一样简单。

You can see the list of formulas supported by openpyxl:

你可以看到 openpyxl 支持的公式列表:

`>>> from openpyxl.utils import FORMULAE
>>> FORMULAE
frozenset({'ABS',
 'ACCRINT',
 'ACCRINTM',
 'ACOS',
 'ACOSH',
 'AMORDEGRC',
 'AMORLINC',
 'AND',
 ...
 'YEARFRAC',
 'YIELD',
 'YIELDDISC',
 'YIELDMAT',
 'ZTEST'})`

Let’s add some formulas to our sample.xlsx spreadsheet.

让我们在 sample.xlsx 电子表格中添加一些公式。

Starting with something easy, let’s check the average star rating for the 99 reviews within the spreadsheet:

从简单的事情开始，让我们检查一下电子表格中99条评论的平均星级评分:

`>>> # Star rating is column "H"
>>> sheet["P2"] = "=AVERAGE(H2:H100)"
>>> workbook.save(filename="sample_formulas.xlsx")`

If you open the spreadsheet now and go to cell P2, you should see that its value is: 4.18181818181818. Have a look in the editor:

如果现在打开电子表格并进入 cell P2，您将看到它的值是: 4.18181818181818181818。看看编辑:

You can use the same methodology to add any formulas to your spreadsheet. For example, let’s count the number of reviews that had helpful votes:

您可以使用相同的方法将任何公式添加到电子表格中。例如，让我们来计算一下有助于投票的评论的数量:

`>>> # The helpful votes are counted on column "I"
>>> sheet["P3"] = '=COUNTIF(I2:I100, ">0")'
>>> workbook.save(filename="sample_formulas.xlsx")`

You should get the number 21 on your P3 spreadsheet cell like so:

你应该像这样在你的 p3电子表格单元格中得到数字21:

You’ll have to make sure that the strings within a formula are always in double quotes, so you either have to use single quotes around the formula like in the example above or you’ll have to escape the double quotes inside the formula: "=COUNTIF(I2:I100, \">0\")".

您必须确保公式中的字符串始终处于双引号中，因此要么必须像上面的例子那样在公式周围使用单引号，要么必须对公式中的双引号进行转义: “ = COUNTIF (I2: I100,”> 0”)。

There are a ton of other formulas you can add to your spreadsheet using the same procedure you tried above. Give it a go yourself!

还有许多其他公式可以使用与上面相同的程序添加到电子表格中。你自己试试吧！
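One detail worth knowing is that openpyxl only stores the formula text in the cell. The result is calculated by the spreadsheet editor when the file is opened. The sketch below, which uses three made-up ratings in an in-memory workbook, checks a function name against `FORMULAE` and then inspects what actually ends up in the cell.

```python
from openpyxl import Workbook
from openpyxl.utils import FORMULAE

workbook = Workbook()
sheet = workbook.active

# Hypothetical star ratings in A1:A3
for rating in (5, 4, 2):
    sheet.append([rating])

formula_name = "AVERAGE"
# FORMULAE holds the function names that openpyxl recognizes
is_known = formula_name in FORMULAE

sheet["B1"] = f"={formula_name}(A1:A3)"

# The cell holds the formula as text, flagged with the data type "f"
stored_value = sheet["B1"].value
stored_type = sheet["B1"].data_type

print(is_known, stored_value, stored_type)
```

If you later need the calculated numbers rather than the formulas, loading a file that was saved by a spreadsheet editor with `load_workbook(..., data_only=True)` is the usual route.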

Adding Styles^[26]添加样式[26]

Even though styling a spreadsheet might not be something you would do every day, it’s still good to know how to do it.

尽管设计一个电子表格可能不是你每天都会做的事情，但是知道如何做还是很好的。

Using openpyxl, you can apply multiple styling options to your spreadsheet, including fonts, borders, colors, and so on. Have a look at the openpyxl documentation to learn more.

使用 openpyxl，您可以对电子表格应用多种样式选项，包括字体、边框、颜色等等。查看 openpyxl 文档了解更多信息。

You can also choose to either apply a style directly to a cell or create a template and reuse it to apply styles to multiple cells.

您还可以选择直接将样式应用于单元格，或者创建模板并重用它来将样式应用于多个单元格。

Let’s start by having a look at simple cell styling, using our sample.xlsx again as the base spreadsheet:

让我们首先看一下简单的单元格样式，再次使用 sample.xlsx 作为基本电子表格:

`>>> # Import necessary style classes
>>> from openpyxl.styles import Font, Color, Alignment, Border, Side

>>> # Create a few styles
>>> bold_font = Font(bold=True)
>>> big_red_text = Font(color="00FF0000", size=20)
>>> center_aligned_text = Alignment(horizontal="center")
>>> double_border_side = Side(border_style="double")
>>> square_border = Border(top=double_border_side,
...                        right=double_border_side,
...                        bottom=double_border_side,
...                        left=double_border_side)

>>> # Style some cells!
>>> sheet["A2"].font = bold_font
>>> sheet["A3"].font = big_red_text
>>> sheet["A4"].alignment = center_aligned_text
>>> sheet["A5"].border = square_border
>>> workbook.save(filename="sample_styles.xlsx")`

If you open your spreadsheet now, you should see quite a few different styles on the first 5 cells of column A:

如果你现在打开你的电子表格，你会看到在列 a 的前5个单元格上有很多不同的样式:

There you go. You got:

这就对了，你有:

A2 with the text in bold

A2，文本以粗体显示
A3 with the text in red and bigger font size

A3，文本为红色，字体大一些
A4 with the text centered

文本居中的 A4
A5 with a square border around the text

A5，文本周围有一个正方形边框

Note: For the colors, you can also use HEX codes instead by doing Font(color="C70E0F").

注意: 对于颜色，你也可以使用 HEX 代码来代替 Font (color = “ C70E0F”)。

You can also combine styles by simply adding them to the cell at the same time:

你也可以通过简单的同时将样式添加到单元格中来组合样式:

`>>> # Reusing the same styles from the example above
>>> sheet["A6"].alignment = center_aligned_text
>>> sheet["A6"].font = big_red_text
>>> sheet["A6"].border = square_border
>>> workbook.save(filename="sample_styles.xlsx")`

Have a look at cell A6 here:

看看这里的细胞 A6:

When you want to apply multiple styles to one or several cells, you can use a NamedStyle class instead, which is like a style template that you can use over and over again. Have a look at the example below:

当您希望将多个样式应用于一个或多个单元格时，可以使用 NamedStyle 类，这类似于样式模板，可以反复使用。看看下面的例子:

`>>> from openpyxl.styles import NamedStyle

>>> # Let's create a style template for the header row
>>> header = NamedStyle(name="header")
>>> header.font = Font(bold=True)
>>> header.border = Border(bottom=Side(border_style="thin"))
>>> header.alignment = Alignment(horizontal="center", vertical="center")

>>> # Now let's apply this to all first row (header) cells
>>> header_row = sheet[1]
>>> for cell in header_row:
...     cell.style = header

>>> workbook.save(filename="sample_styles.xlsx")`

If you open the spreadsheet now, you should see that its first row is bold, the text is aligned to the center, and there’s a small bottom border! Have a look below:

如果现在打开电子表格，您将看到它的第一行是粗体的，文本与中心对齐，并且有一个小的底部边框！请看下面的内容:

As you saw above, there are many options when it comes to styling, and it depends on the use case, so feel free to check openpyxl documentation and see what other things you can do.

正如您在上面看到的，在样式化方面有许多选项，这取决于用例，所以您可以随意查看 openpyxl 文档，看看还能做些什么。
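A named style belongs to the workbook, so once it’s registered you can also refer to it by its name. The sketch below assumes a small made-up table rather than sample.xlsx. It registers the style explicitly with `add_named_style()` and then assigns it to the header cells using the string `"header"`.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Border, Font, NamedStyle, Side

workbook = Workbook()
sheet = workbook.active
sheet.append(["Product", "Online", "Store"])  # hypothetical header row
sheet.append([1, 30, 45])

header = NamedStyle(name="header")
header.font = Font(bold=True)
header.border = Border(bottom=Side(border_style="thin"))
header.alignment = Alignment(horizontal="center", vertical="center")

# Register the style once with the workbook...
workbook.add_named_style(header)

# ...and from then on assign it by name
for cell in sheet[1]:
    cell.style = "header"

print(sheet["A1"].style, sheet["A1"].font.bold)
```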

Conditional Formatting^[27]条件格式[27]

This feature is one of my personal favorites when it comes to adding styles to a spreadsheet.

这个特性是我个人最喜欢的电子表格样式添加功能之一。

It’s a much more powerful approach to styling because it dynamically applies styles according to how the data in the spreadsheet changes.

这是一种更加强大的样式化方法，因为它根据电子表格中的数据的变化情况动态地应用样式。

In a nutshell, conditional formatting allows you to specify a list of styles to apply to a cell (or cell range) according to specific conditions.

简而言之，条件格式允许您根据特定条件指定要应用于单元格(或单元格范围)的样式列表。

For example, a widespread use case is to have a balance sheet where all the negative totals are in red, and the positive ones are in green. This formatting makes it much more efficient to spot good vs bad periods.

例如，一个广泛使用的例子是，资产负债表中所有的负总额都是红色的，而正总额是绿色的。这种格式化使得识别好句号和坏句号更加有效。

Without further ado, let’s pick our favorite spreadsheet—sample.xlsx—and add some conditional formatting.

闲话少说，让我们选择我们最喜欢的电子表格ー sample.xlsx ー并添加一些条件格式。

You can start by adding a simple one that adds a red background to all reviews with less than 3 stars:

你可以从添加一个简单的方法开始，在所有评论中添加一个红色背景，小于3星:

`>>> from openpyxl.styles import PatternFill
>>> from openpyxl.styles.differential import DifferentialStyle
>>> from openpyxl.formatting.rule import Rule

>>> red_background = PatternFill(fgColor="00FF0000")
>>> diff_style = DifferentialStyle(fill=red_background)
>>> rule = Rule(type="expression", dxf=diff_style)
>>> rule.formula = ["$H1<3"]
>>> sheet.conditional_formatting.add("A1:O100", rule)
>>> workbook.save("sample_conditional_formatting.xlsx")`

Now you’ll see all the reviews with a star rating below 3 marked with a red background:

现在你会看到所有的评论，星级评分低于3，红色背景标记:

Code-wise, the only things that are new here are the objects DifferentialStyle and Rule:

在代码方面，这里唯一的新东西是对象的不同风格和规则:

DifferentialStyle is quite similar to NamedStyle, which you already saw above, and it’s used to aggregate multiple styles such as fonts, borders, alignment, and so forth.

与上面提到的 NamedStyle 非常类似，它用于聚合多种样式，如字体、边框、对齐方式等。
Rule is responsible for selecting the cells and applying the styles if the cells match the rule’s logic.

如果单元格符合规则的逻辑，则规则负责选择单元格并应用样式。

Using a Rule object, you can create numerous conditional formatting scenarios.

使用 Rule 对象，可以创建许多条件格式化方案。
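As one more scenario, here’s a rough sketch of the balance sheet idea mentioned earlier: negative totals in red text and positive totals in green. It uses `Rule` with `type="cellIs"` and an `operator`. The quarterly figures and the two color codes are made up for the example.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import Rule
from openpyxl.styles import Font
from openpyxl.styles.differential import DifferentialStyle

workbook = Workbook()
sheet = workbook.active

# Hypothetical totals per period
sheet.append(["Period", "Total"])
for period, total in [("Q1", 1200), ("Q2", -350), ("Q3", 80), ("Q4", -40)]:
    sheet.append([period, total])

red_text = DifferentialStyle(font=Font(color="00FF0000"))
green_text = DifferentialStyle(font=Font(color="00008000"))

# "cellIs" rules compare each cell in the range against the formula value
negative_rule = Rule(type="cellIs", operator="lessThan",
                     formula=["0"], dxf=red_text)
positive_rule = Rule(type="cellIs", operator="greaterThan",
                     formula=["0"], dxf=green_text)

sheet.conditional_formatting.add("B2:B5", negative_rule)
sheet.conditional_formatting.add("B2:B5", positive_rule)
```

Because the rules live in the file, the colors update on their own when somebody later changes a total in the spreadsheet editor.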

However, for simplicity’s sake, the openpyxl package offers 3 built-in formats that make it easier to create a few common conditional formatting patterns. These built-ins are:

然而，为了简单起见，openpyxl 包提供了3种内置格式，使得创建一些常见的条件格式模式变得更加容易。这些内置设备包括:

ColorScale
IconSet
DataBar

The ColorScale gives you the ability to create color gradients:

赋予你创建颜色渐变的能力:

`>>> from openpyxl.formatting.rule import ColorScaleRule
>>> color_scale_rule = ColorScaleRule(start_type="min",
...                                   start_color="00FF0000",  # Red
...                                   end_type="max",
...                                   end_color="0000FF00")  # Green

>>> # Again, let's add this gradient to the star ratings, column "H"
>>> sheet.conditional_formatting.add("H2:H100", color_scale_rule)
>>> workbook.save(filename="sample_conditional_formatting_color_scale.xlsx")`

Now you should see a color gradient on column H, from red to green, according to the star rating:

现在你应该可以看到 h 栏的颜色渐变，从红色到绿色，根据星级:

You can also add a third color and make two gradients instead:

你也可以添加第三种颜色，然后做两个渐变:

`>>> from openpyxl.formatting.rule import ColorScaleRule
>>> color_scale_rule = ColorScaleRule(start_type="num",
...                                   start_value=1,
...                                   start_color="00FF0000",  # Red
...                                   mid_type="num",
...                                   mid_value=3,
...                                   mid_color="00FFFF00",  # Yellow
...                                   end_type="num",
...                                   end_value=5,
...                                   end_color="0000FF00")  # Green

>>> # Again, let's add this gradient to the star ratings, column "H"
>>> sheet.conditional_formatting.add("H2:H100", color_scale_rule)
>>> workbook.save(filename="sample_conditional_formatting_color_scale_3.xlsx")`

This time, you’ll notice that star ratings between 1 and 3 have a gradient from red to yellow, and star ratings between 3 and 5 have a gradient from yellow to green:

这一次，你会注意到1到3之间的星级评分有一个从红色到黄色的梯度，3到5之间的星级评分有一个从黄色到绿色的梯度:

The IconSet allows you to add an icon to the cell according to its value:

图标集允许你根据单元格的值添加一个图标:

`>>> from openpyxl.formatting.rule import IconSetRule

>>> icon_set_rule = IconSetRule("5Arrows", "num", [1, 2, 3, 4, 5])
>>> sheet.conditional_formatting.add("H2:H100", icon_set_rule)
>>> workbook.save("sample_conditional_formatting_icon_set.xlsx")`

You’ll see a colored arrow next to the star rating. This arrow is red and points down when the value of the cell is 1 and, as the rating gets better, the arrow starts pointing up and becomes green:

你会看到一个彩色箭头旁边的星级评级。这个箭头是红色的，当单元格的值为1时它会向下指，而且，随着等级的提高，这个箭头开始指向上并变成绿色:

The openpyxl package has a full list of other icons you can use, besides the arrow.

Openpyxl 包中除了箭头之外，还有其他可以使用的图标的完整列表。

Finally, the DataBar allows you to create progress bars:

最后，DataBar 允许你创建进度条:

`>>> from openpyxl.formatting.rule import DataBarRule

>>> data_bar_rule = DataBarRule(start_type="num",
...                             start_value=1,
...                             end_type="num",
...                             end_value=5,
...                             color="0000FF00")  # Green
>>> sheet.conditional_formatting.add("H2:H100", data_bar_rule)
>>> workbook.save("sample_conditional_formatting_data_bar.xlsx")`

You’ll now see a green progress bar that gets fuller the closer the star rating is to the number 5:

现在你会看到一个绿色的进度条，星级越接近数字5，进度条就越满:

As you can see, there are a lot of cool things you can do with conditional formatting.

如您所见，使用条件格式可以做很多很酷的事情。

Here, you saw only a few examples of what you can achieve with it, but check the openpyxl documentation to see a bunch of other options.

在这里，您只看到了几个使用它可以实现的示例，但是查看 openpyxl 文档可以看到一大堆其他选项。

Adding Images^[28]添加图片[28]

Even though images are not something that you’ll often see in a spreadsheet, it’s quite cool to be able to add them. Maybe you can use it for branding purposes or to make spreadsheets more personal.

尽管图片不是你经常在电子表格中看到的东西，但是能够添加它们还是很酷的。也许你可以用它来建立品牌或者使电子表格更加个性化。

To be able to load images to a spreadsheet using openpyxl, you’ll have to install Pillow:

要使用 openpyxl 将图片加载到电子表格中，你必须安装 Pillow:

`$ pip install Pillow`

Apart from that, you’ll also need an image. For this example, you can grab the Real Python logo below and convert it from .webp to .png using an online converter such as cloudconvert.com, save the final file as logo.png, and copy it to the root folder where you’re running your examples:

除此之外，你还需要一张图片。对于这个例子，您可以获取下面的 Real Python 徽标并从。对... 使用 webp。Png 使用在线转换器，比如 cloudconvert.com ，将最终文件保存为 logo.png，并将其复制到运行示例的根文件夹:

Afterward, this is the code you need to import that image into the hello_world.xlsx spreadsheet:

然后，这是将图像导入 hello_world.xlsx 电子表格所需的代码:

`from openpyxl import load_workbook
from openpyxl.drawing.image import Image

# Let's use the hello_world spreadsheet since it has less data
workbook = load_workbook(filename="hello_world.xlsx")
sheet = workbook.active

logo = Image("logo.png")

# A bit of resizing to not fill the whole spreadsheet with the logo
logo.height = 150
logo.width = 150

sheet.add_image(logo, "A3")
workbook.save(filename="hello_world_logo.xlsx")`

You have an image on your spreadsheet! Here it is:

你的电子表格上有一张图片，下面是:

The image’s top-left corner is on the cell you chose, in this case, A3.

图像的左上角位于您选择的单元格上，在本例中为 A3。

Adding Pretty Charts^[29]添加漂亮的图表[29]

Another powerful thing you can do with spreadsheets is create an incredible variety of charts.

你可以用电子表格做的另一个强大的事情是创建各种各样令人难以置信的图表。

Charts are a great way to visualize and understand loads of data quickly. There are a lot of different chart types: bar chart, pie chart, line chart, and so on. openpyxl has support for a lot of them.

图表是快速可视化和理解大量数据的好方法。有很多不同的图表类型: 柱状图、饼状图、折线图等等。Openpyxl 有很多支持者。

Here, you’ll see only a couple of chart examples because the approach is the same for every chart type.

在这里，你只能看到几个图表的例子，因为它背后的理论对于每一种图表类型都是相同的:

Note: A few of the chart types that openpyxl currently doesn’t have support for are Funnel, Gantt, Pareto, Treemap, Waterfall, Map, and Sunburst.

注意: openpyxl 目前不支持的图表类型有漏斗、甘特、 Pareto、 Treemap、瀑布、地图和 Sunburst。

For any chart you want to build, you’ll need to define the chart type: BarChart, LineChart, and so forth, plus the data to be used for the chart, which is called Reference.

对于要构建的任何图表，都需要定义图表类型: BarChart、 LineChart 等等，加上用于图表的数据，称为 Reference。

Before you can build your chart, you need to define what data you want to see represented in it. Sometimes, you can use the dataset as is, but other times you need to massage the data a bit to get additional information.

在构建图表之前，需要定义希望在其中看到哪些数据。有时，您可以按原样使用数据集，但有时您需要稍微修改数据以获得更多信息。

Let’s start by building a new workbook with some sample data:

让我们用一些示例数据来构建一个新的工作簿:

`from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
sheet = workbook.active

# Let's create some sample sales data
rows = [
    ["Product", "Online", "Store"],
    [1, 30, 45],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]

for row in rows:
    sheet.append(row)`

Now you’re going to start by creating a bar chart that displays the total number of sales per product:

现在你要开始创建一个条形图来显示每个产品的总销售量:

`chart = BarChart()
data = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=8,
                 min_col=2,
                 max_col=3)

chart.add_data(data, titles_from_data=True)
sheet.add_chart(chart, "E2")

workbook.save("chart.xlsx")`

There you have it. Below, you can see a very straightforward bar chart showing the difference between online product sales and in-store product sales:

这就对了。下面，你可以看到一个非常简单的条形图，显示了在线产品销售和店内产品销售的区别:

Like with images, the top left corner of the chart is on the cell you added the chart to. In your case, it was on cell E2.

与图像一样，图表的左上角位于您添加图表的单元格上。就你的情况而言，是在 e2单元。

Note: Depending on whether you’re using Microsoft Excel or an open-source alternative (LibreOffice or OpenOffice), the chart might look slightly different.

注意: 根据您使用的是 microsoftexcel 还是开源替代软件(LibreOffice 或 OpenOffice) ，图表看起来可能略有不同。
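The bar chart above hardcodes `max_row=8`. If the number of products can change, one option is to derive the bounds from the sheet itself. The sketch below is a variation under that assumption: it uses a shorter made-up table, reads the bounds from `sheet.max_row` and `sheet.max_column`, and also labels the bars with the product numbers from column A. Nothing is saved to disk here.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
sheet = workbook.active

rows = [
    ["Product", "Online", "Store"],
    [1, 30, 45],
    [2, 40, 30],
    [3, 40, 25],
]
for row in rows:
    sheet.append(row)

chart = BarChart()
chart.title = "Sales per product"

# Data: columns B onward, header row included so it provides the titles
data = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=sheet.max_row,
                 min_col=2,
                 max_col=sheet.max_column)
chart.add_data(data, titles_from_data=True)

# Categories: the product numbers in column A, without the header
categories = Reference(worksheet=sheet,
                       min_row=2,
                       max_row=sheet.max_row,
                       min_col=1,
                       max_col=1)
chart.set_categories(categories)

sheet.add_chart(chart, "E2")
print(len(chart.series))
```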

Try creating a line chart instead, changing the data a bit:

试着创建一个折线图，稍微改变一下数据:

`import random
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

# Let's create some sample sales data
rows = [
    ["", "January", "February", "March", "April",
     "May", "June", "July", "August", "September",
     "October", "November", "December"],
    [1, ],
    [2, ],
    [3, ],
]

for row in rows:
    sheet.append(row)

for row in sheet.iter_rows(min_row=2,
                           max_row=4,
                           min_col=2,
                           max_col=13):
    for cell in row:
        cell.value = random.randrange(5, 100)`

With the above code, you’ll be able to generate some random data regarding the sales of 3 different products across a whole year.

使用以上代码，您将能够生成一些关于全年3种不同产品销售情况的随机数据。

Once that’s done, you can very easily create a line chart with the following code:

一旦完成，你可以很容易地用下面的代码创建一个折线图:

`chart = LineChart()
data = Reference(worksheet=sheet,
                 min_row=2,
                 max_row=4,
                 min_col=1,
                 max_col=13)

chart.add_data(data, from_rows=True, titles_from_data=True)
sheet.add_chart(chart, "C6")

workbook.save("line_chart.xlsx")`

Here’s the outcome of the above piece of code:

下面是上面这段代码的结果:

One thing to keep in mind here is the fact that you’re using from_rows=True when adding the data. This argument makes the chart plot row by row instead of column by column.

这里需要记住的一点是，在添加数据时使用 from _ rows = True。此参数使图表逐行绘制，而不是逐列绘制。

In your sample data, you see that each product has a row with 12 values (1 column per month). That’s why you use from_rows. If you don’t pass that argument, by default, the chart tries to plot by column, and you’ll get a month-by-month comparison of sales.

在您的示例数据中，您可以看到每个产品都有一行包含12个值(每月一列)。这就是为什么你使用从行。如果您不传递该参数，默认情况下，图表会尝试按列绘制图表，您将得到销售额的逐月比较。

Another difference that has to do with the above argument change is the fact that our Reference now starts from the first column, min_col=1, instead of the second one. This change is needed because the chart now expects the first column to have the titles.

另一个与上述参数更改有关的区别是，Reference 现在从第一列 min _ col = 1开始，而不是从第二列开始。这个更改是必要的，因为图表现在期望第一列包含标题。
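To see the effect of `from_rows` without opening a spreadsheet editor, you can count the series each variant produces. The sketch below rebuilds the same layout with deterministic placeholder numbers instead of random ones. The names `per_product` and `per_month` are made up for this comparison.

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
sheet.append([""] + months)
for product_id in (1, 2, 3):
    # Deterministic placeholder sales figures (hypothetical data)
    sheet.append([product_id] + [10 * product_id + m for m in range(12)])

# One series per row: the first column holds each series title
per_product = LineChart()
by_rows = Reference(worksheet=sheet,
                    min_row=2, max_row=4,
                    min_col=1, max_col=13)
per_product.add_data(by_rows, from_rows=True, titles_from_data=True)

# One series per column: the first row holds each series title
per_month = LineChart()
by_cols = Reference(worksheet=sheet,
                    min_row=1, max_row=4,
                    min_col=2, max_col=13)
per_month.add_data(by_cols, titles_from_data=True)

print(len(per_product.series), len(per_month.series))
```

With `from_rows=True` you get one line per product. Without it, you get one line per month, and the reference has to start at the header row so that each month name becomes a title.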

There are a couple of other things you can also change regarding the style of the chart. For example, you can add specific categories to the chart:

关于图表的风格，你还可以改变一些其他的东西。例如，你可以在图表中添加特定的类别:

`cats = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=1,
                 min_col=2,
                 max_col=13)
chart.set_categories(cats)`

Add this piece of code before saving the workbook, and you should see the month names appearing instead of numbers:

在保存工作簿之前添加这段代码，您会看到月份名称而不是数字:

Code-wise, this is a minimal change. But in terms of the readability of the spreadsheet, this makes it much easier for someone to open the spreadsheet and understand the chart straight away.

代码方面，这是一个最小的变化。但就电子表格的可读性而言，这使人们更容易打开电子表格并立即理解图表。

Another thing you can do to improve the chart’s readability is to add axis titles. You can do it using the attributes x_axis and y_axis:

另一件可以提高图表可读性的事情是添加一个轴。你可以使用属性 x 轴和 y 轴:

`chart.x_axis.title = "Months"
chart.y_axis.title = "Sales (per unit)"`

This will generate a spreadsheet like the below one:

这将生成一个如下所示的电子表格:

As you can see, small changes like the above make reading your chart a much easier and quicker task.

正如你所看到的，像上面这样的小改变使得阅读你的图表变得更加容易和快捷。

There is also a way to style your chart by using Excel’s default ChartStyle property. In this case, you have to choose a number between 1 and 48. Depending on your choice, the colors of your chart change as well:

还有一种使用 Excel 的默认 ChartStyle 属性设置图表样式的方法。在这种情况下，你必须在1和48之间选择一个数字。根据你的选择，图表的颜色也会改变:

`# You can play with this by choosing any number between 1 and 48
chart.style = 24`

With the style selected above, all lines have some shade of orange:

以上选择的风格，所有线条都有一些橙色的阴影:

There is no clear documentation on what each style number looks like, but this spreadsheet has a few examples of the styles available.

没有关于每个样式编号看起来是什么样子的明确文档，但是这个电子表格有一些可用样式的示例。

Complete Code Example

完整代码示例

Here’s the full code used to generate the line chart with categories, axis titles, and style:

下面是用于生成包含类别、轴标题和样式的折线图的完整代码:

`import random
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

# Let's create some sample sales data
rows = [
    ["", "January", "February", "March", "April",
     "May", "June", "July", "August", "September",
     "October", "November", "December"],
    [1, ],
    [2, ],
    [3, ],
]

for row in rows:
    sheet.append(row)

for row in sheet.iter_rows(min_row=2,
                           max_row=4,
                           min_col=2,
                           max_col=13):
    for cell in row:
        cell.value = random.randrange(5, 100)

# Create a LineChart and add the main data
chart = LineChart()
data = Reference(worksheet=sheet,
                 min_row=2,
                 max_row=4,
                 min_col=1,
                 max_col=13)
chart.add_data(data, titles_from_data=True, from_rows=True)

# Add categories to the chart
cats = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=1,
                 min_col=2,
                 max_col=13)
chart.set_categories(cats)

# Rename the X and Y Axis
chart.x_axis.title = "Months"
chart.y_axis.title = "Sales (per unit)"

# Apply a specific Style
chart.style = 24

# Save!
sheet.add_chart(chart, "C6")
workbook.save("line_chart.xlsx")`

There are a lot more chart types and customization you can apply, so be sure to check out the package documentation on this if you need some specific formatting.

您可以应用更多的图表类型和自定义，因此，如果需要一些特定的格式，请确保查看关于此的包文档。

Convert Python Classes to Excel Spreadsheet^[30]将 Python 类转换为 Excel 电子表格[30]

You already saw how to convert an Excel spreadsheet’s data into Python classes, but now let’s do the opposite.

您已经了解了如何将 Excel 电子表格的数据转换为 Python 类，但是现在让我们做相反的操作。

Let’s imagine you have a database and are using some Object-Relational Mapping (ORM) to map DB objects into Python classes. Now, you want to export those same objects into a spreadsheet.

让我们假设您有一个数据库，并且正在使用一些对象关系映射映射(ORM)将 DB 对象映射到 Python 类中。现在，您需要将这些对象导出到电子表格中。

Let’s assume the following data classes to represent the data coming from your database regarding product sales:

让我们假设下面的数据类表示来自你的数据库的产品销售数据:

`from dataclasses import dataclass
from typing import List

@dataclass
class Sale:
    quantity: int

@dataclass
class Product:
    id: str
    name: str
    sales: List[Sale]`

Now, let’s generate some random data, assuming the above classes are stored in a db_classes.py file:

现在，让我们生成一些随机数据，假设上面的类存储在 db _ classes.py 文件中:

`import random

# Ignore these for now. You'll use them in a sec ;)
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

from db_classes import Product, Sale

products = []

# Let's create 5 products
for idx in range(1, 6):
    sales = []

    # Create 5 months of sales
    for _ in range(5):
        sale = Sale(quantity=random.randrange(5, 100))
        sales.append(sale)

    product = Product(id=str(idx),
                      name="Product %s" % idx,
                      sales=sales)
    products.append(product)`

Running this piece of code gives you 5 products, each with 5 months of sales and a random sales quantity for each month.

通过运行这段代码，你应该可以得到5个月的销售量，每个月的销售量是随机的。

Now, to convert this into a spreadsheet, you need to iterate over the data and append it to the spreadsheet:

现在，为了将其转换成电子表格，你需要遍历数据并将其附加到电子表格中:

`workbook = Workbook()
sheet = workbook.active

# Append column names first
sheet.append(["Product ID", "Product Name", "Month 1",
              "Month 2", "Month 3", "Month 4", "Month 5"])

# Append the data
for product in products:
    data = [product.id, product.name]
    for sale in product.sales:
        data.append(sale.quantity)
    sheet.append(data)`

That’s it. That should allow you to create a spreadsheet with some data coming from your database.

这样就可以创建一个电子表格，其中包含来自数据库的一些数据。
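If you export objects like this in more than one place, it may be worth moving the flattening logic into small functions. The following is a self-contained sketch of that idea with fixed numbers instead of random ones. The helpers `product_to_row` and `month_headers` are hypothetical names introduced here, and the month headers are derived from the data rather than typed out by hand.

```python
from dataclasses import dataclass
from typing import List

from openpyxl import Workbook


@dataclass
class Sale:
    quantity: int


@dataclass
class Product:
    id: str
    name: str
    sales: List[Sale]


def product_to_row(product):
    """Flatten one product into a list of cell values."""
    return [product.id, product.name] + [s.quantity for s in product.sales]


def month_headers(products):
    """Build "Month 1", "Month 2", ... for the longest sales history."""
    longest = max((len(p.sales) for p in products), default=0)
    return [f"Month {n}" for n in range(1, longest + 1)]


# Hypothetical data standing in for what the ORM would return
products = [
    Product(id="1", name="Product 1", sales=[Sale(10), Sale(20)]),
    Product(id="2", name="Product 2", sales=[Sale(5), Sale(35)]),
]

workbook = Workbook()
sheet = workbook.active
sheet.append(["Product ID", "Product Name"] + month_headers(products))
for product in products:
    sheet.append(product_to_row(product))

exported = list(sheet.values)
print(exported)
```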

However, why not use some of that cool knowledge you gained recently to add a chart as well to display that data more visually?

然而，为什么不利用你最近学到的一些很酷的知识来添加一个图表来更直观地显示数据呢？

All right, then you could probably do something like this:

好吧，你可以这样做:

`chart = LineChart()
data = Reference(worksheet=sheet,
                 min_row=2,
                 max_row=6,
                 min_col=2,
                 max_col=7)

chart.add_data(data, titles_from_data=True, from_rows=True)
sheet.add_chart(chart, "B8")

cats = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=1,
                 min_col=3,
                 max_col=7)
chart.set_categories(cats)

chart.x_axis.title = "Months"
chart.y_axis.title = "Sales (per unit)"

workbook.save(filename="oop_sample.xlsx")`

Now we’re talking! Here’s a spreadsheet generated from database objects and with a chart and everything:

现在我们正在讨论这个问题! 这是一个从数据库对象生成的电子表格，里面有一个图表和所有的东西:

That’s a great way for you to wrap up your new knowledge of charts!

这是一个伟大的方式，为您总结您的新知识的图表！

Bonus: Working With Pandas^[31]额外收获: 与熊猫一起工作[31]

Even though you can use Pandas to handle Excel files, there are a few things that you either can’t accomplish with Pandas or that you’d be better off doing with openpyxl directly.

尽管您可以使用 Pandas 来处理 Excel 文件，但是有些事情您无法用 Pandas 完成，或者您最好直接使用 openpyxl。

For example, some of the advantages of using openpyxl are the ability to easily customize your spreadsheet with styles, conditional formatting, and such.

例如，使用 openpyxl 的一些优点是能够轻松地使用样式、条件格式等自定义电子表格。

But guess what, you don’t have to pick one. In fact, openpyxl supports both directions: converting data from a Pandas DataFrame into a workbook and converting an openpyxl workbook into a Pandas DataFrame.

但是你猜怎么着，你不必担心选择。实际上，openpyxl 支持将 Pandas DataFrame 的数据转换为工作簿或相反的数据，将 openpyxl 工作簿转换为 Pandas DataFrame。

Note: If you’re new to Pandas, check our course on Pandas DataFrames beforehand.

注意: 如果你是熊猫新手，请提前查看我们的熊猫数据库课程。

First things first, remember to install the pandas package:

首先，记得安装 pandas 包:

`$ pip install pandas`

Then, let’s create a sample DataFrame:

然后，让我们创建一个示例 DataFrame:

`import pandas as pd

data = {
    "Product Name": ["Product 1", "Product 2"],
    "Sales Month 1": [10, 20],
    "Sales Month 2": [5, 35],
}
df = pd.DataFrame(data)`

Now that you have some data, you can use dataframe_to_rows() to convert the DataFrame into rows and append them to a worksheet:

现在你有了一些数据，你可以使用. dataframe_to _ rows ()把它从一个 DataFrame 转换成一个工作表:

`from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

workbook = Workbook()
sheet = workbook.active

for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

workbook.save("pandas.xlsx")`

You should see a spreadsheet that looks like this:

你应该看到这样一个电子表格:

If you want to include the DataFrame’s index, you can set index=True, and each row’s index is added to your spreadsheet as well.

如果要添加 DataFrame 的索引，可以更改 index = True，并将每一行的索引添加到电子表格中。

On the other hand, if you want to convert a spreadsheet into a DataFrame, you can also do it in a very straightforward way like so:

另一方面，如果你想把一个电子表格转换成一个 DataFrame，你也可以用一种非常简单的方法来做，比如:

`import pandas as pd
from openpyxl import load_workbook

workbook = load_workbook(filename="sample.xlsx")
sheet = workbook.active

values = sheet.values
df = pd.DataFrame(values)`
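One way to convince yourself that the two conversions mirror each other is a quick round trip in memory. The sketch below reuses the small product DataFrame from earlier in this section instead of sample.xlsx, writes it to a worksheet, and reads it back. It assumes the first row of the sheet holds the column names. The name `restored` is made up for this example.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({
    "Product Name": ["Product 1", "Product 2"],
    "Sales Month 1": [10, 20],
    "Sales Month 2": [5, 35],
})

# DataFrame -> worksheet
workbook = Workbook()
sheet = workbook.active
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# Worksheet -> DataFrame, treating the first row as the header
rows = list(sheet.values)
restored = pd.DataFrame(rows[1:], columns=rows[0])

print(restored)
```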

The DataFrame above treats the header row as ordinary data and uses default integer column labels. To use the spreadsheet’s first row as the column headers and the review ID as the index instead, you can do this:

```python
import pandas as pd
from openpyxl import load_workbook
# mapping.py is assumed to define REVIEW_ID, the column position of review_id
from mapping import REVIEW_ID

workbook = load_workbook(filename="sample.xlsx")
sheet = workbook.active

data = sheet.values

# Set the first row as the columns for the DataFrame
cols = next(data)
data = list(data)

# Set the field "review_id" as the index for each row
idx = [row[REVIEW_ID] for row in data]

df = pd.DataFrame(data, index=idx, columns=cols)
```
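If you don’t have sample.xlsx or mapping.py at hand, the same idea can be tried on a tiny in-memory worksheet. This is only a sketch: the rows below are made-up data, and ID_COLUMN is a hypothetical stand-in for the REVIEW_ID constant.

```python
import pandas as pd
from openpyxl import Workbook

# Build a small worksheet in memory (made-up data, not the tutorial's dataset)
workbook = Workbook()
sheet = workbook.active
sheet.append(["review_id", "product_title", "star_rating"])
sheet.append(["R1", "Watch A", 5])
sheet.append(["R2", "Watch B", 4])

rows = sheet.values       # generator of tuples, one per worksheet row
cols = list(next(rows))   # the first row becomes the column labels
data = list(rows)         # the remaining rows are the actual records

ID_COLUMN = 0  # hypothetical position of review_id in this toy sheet
idx = [row[ID_COLUMN] for row in data]

df = pd.DataFrame(data, index=idx, columns=cols)
print(df.loc["R2", "star_rating"])
```

Consuming the header with next() before calling list() is what keeps the header row out of the data.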

Using indexes and columns allows you to access data from your DataFrame easily:

```python
>>> df.columns
Index(['marketplace', 'customer_id', 'review_id', 'product_id',
       'product_parent', 'product_title', 'product_category', 'star_rating',
       'helpful_votes', 'total_votes', 'vine', 'verified_purchase',
       'review_headline', 'review_body', 'review_date'],
      dtype='object')

>>> # Get first 10 reviews' star rating
>>> df["star_rating"][:10]
R3O9SGZBVQBV76    5
RKH8BNC3L5DLF     5
R2HLE8WKZSU3NL    2
R31U3UH5AZ42LL    5
R2SV659OUJ945Y    4
RA51CP8TR5A2L     5
RB2Q7DLDN6TH6     5
R2RHFJV0UYBK3Y    1
R2Z6JOQ94LFHEP    5
RX27XIIWY5JPB     4
Name: star_rating, dtype: int64

>>> # Grab review with id "R2EQL1V1L6E0C9", using the index
>>> df.loc["R2EQL1V1L6E0C9"]
marketplace               US
customer_id         15305006
review_id     R2EQL1V1L6E0C9
product_id        B004LURNO6
product_parent     892860326
review_headline   Five Stars
review_body          Love it
review_date       2015-08-31
Name: R2EQL1V1L6E0C9, dtype: object
```

There you go: whether you want to use openpyxl to prettify your Pandas dataset or use Pandas to do some hardcore algebra, you now know how to switch between both packages.
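As one small example of the "prettify" direction, the following sketch writes a DataFrame into a worksheet and then bolds the header row. The sample data is made up, and bolding is only one of many styling options, so treat this as an illustration rather than a recommended layout.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

# Made-up sample data for the illustration
df = pd.DataFrame(
    {"Product Name": ["Product 1", "Product 2"], "Sales Month 1": [10, 20]}
)

workbook = Workbook()
sheet = workbook.active

# Write the header and data rows into the worksheet
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# Row 1 holds the column names, so make each of its cells bold
for cell in sheet[1]:
    cell.font = Font(bold=True)
```

Calling workbook.save() with a filename of your choice afterwards would persist the styled sheet.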

Conclusion^[32]

Phew, after that long read, you now know how to work with spreadsheets in Python! You can rely on openpyxl, your trustworthy companion, to:

Extract valuable information from spreadsheets in a Pythonic manner
Create your own spreadsheets, no matter the complexity level
Add cool features such as conditional formatting or charts to your spreadsheets

There are a few other things you can do with openpyxl that might not have been covered in this tutorial, but you can always check the package’s official documentation website to learn more about it. You can even venture into checking its source code and improving the package further.

Feel free to leave any comments below if you have any questions, or if there’s any section you’d love to hear more about.

About Pedro Pregueiro

Hi! My name is Pedro and I'm a Python developer who loves coding, burgers and playing guitar.

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are: [Aldren](/team/asantos/), [Joanna](/team/jjablonski/), and [Mike](/team/mdriscoll/).

© 2012–2022 Real Python

References

[1] Permanent link: #before-you-begin
[2] Permanent link: #practical-use-cases
[3] Permanent link: #importing-new-products-into-a-database
[4] Permanent link: #exporting-database-data-into-a-spreadsheet
[5] Permanent link: #appending-information-to-an-existing-spreadsheet
[6] Permanent link: #learning-some-basic-excel-terminology
[7] Permanent link: #getting-started-with-openpyxl
[8] Permanent link: #reading-excel-spreadsheets-with-openpyxl
[9] Permanent link: #dataset-for-this-tutorial
[10] Permanent link: #a-simple-approach-to-reading-an-excel-spreadsheet
[11] Permanent link: #additional-reading-options
[12] Permanent link: #importing-data-from-a-spreadsheet
[13] Permanent link: #iterating-through-the-data
[14] Permanent link: #manipulate-data-using-pythons-default-data-structures
[15] Permanent link: #convert-data-into-python-classes
[16] Permanent link: #appending-new-data
[17] Permanent link: #writing-excel-spreadsheets-with-openpyxl
[18] Permanent link: #creating-a-simple-spreadsheet
[19] Permanent link: #basic-spreadsheet-operations
[20] Permanent link: #adding-and-updating-cell-values
[21] Permanent link: #managing-rows-and-columns
[22] Permanent link: #managing-sheets
[23] Permanent link: #freezing-rows-and-columns
[24] Permanent link: #adding-filters
[25] Permanent link: #adding-formulas
[26] Permanent link: #adding-styles
[27] Permanent link: #conditional-formatting
[28] Permanent link: #adding-images
[29] Permanent link: #adding-pretty-charts
[30] Permanent link: #convert-python-classes-to-excel-spreadsheet
[31] Permanent link: #bonus-working-with-pandas
[32] Permanent link: #conclusion