Continued from the previous article: Python Tricks: A Shocking Truth About String Formatting (Part 1) https://developer.aliyun.com/article/1618445
#3 - Literal String Interpolation (Python 3.6+)
Python 3.6 added yet another way to format strings, called Formatted String Literals (f-strings). This new way of formatting strings lets you embed Python expressions inside string constants. Here’s a simple example to give you a feel for the feature:
>>> name = 'Bob'
>>> f'Hello, {name}'
'Hello, Bob'
This new formatting syntax is powerful. Because you can embed arbitrary Python expressions, you can even do inline arithmetic with it, like this:
>>> a = 5
>>> b = 10
>>> f'Five plus ten is {a + b} and not {2 * (a + b)}.'
'Five plus ten is 15 and not 30.'
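Because the braces accept any expression, you aren’t limited to arithmetic. The following sketch uses illustrative variable names and assumes Python 3.8+ for the self-documenting = specifier. It shows a method call, a built-in function, a conditional expression, the !r conversion and the = form:

```python
name = 'Bob'
items = ['spam', 'eggs']

# Method calls and built-in functions work inside the braces
shout = f'{name.upper()} has {len(items)} items'

# So do conditional expressions and subscripts
plural = f"{len(items)} item{'s' if len(items) != 1 else ''}, first is {items[0]}"

# !r applies repr() to the value before formatting it
quoted = f'{name!r}'

# Python 3.8+: a trailing = echoes the expression text along with its value
debug = f'{len(items)=}'

print(shout)   # BOB has 2 items
print(plural)  # 2 items, first is spam
print(quoted)  # 'Bob'
print(debug)   # len(items)=2
```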
Behind the scenes, formatted string literals are a Python parser feature that converts f-strings into a series of string constants and expressions. They then get joined up to build the final string.
Imagine we had the following greet() function that contains an f-string:
>>> def greet(name, question):
...     return f"Hello, {name}! How's it {question}?"
...
>>> greet('Bob', 'going')
"Hello, Bob! How's it going?"
When we disassemble the function and inspect what’s going on behind the scenes, we can see that the f-string in the function gets transformed into something similar to the following:
>>> def greet(name, question):
...     return "Hello, " + name + "! How's it " + question + "?"
...
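The real implementation is slightly faster than that because it uses the BUILD_STRING opcode as an optimization, but functionally the two are the same. Note that greet has just been redefined: disassembling the original f-string version (in Python 3.9) would show FORMAT_VALUE instructions joined by a single BUILD_STRING, whereas the disassembly below is of the concatenation version, with its chain of BINARY_ADD instructions: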
>>> import dis
>>> dis.dis(greet)
  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 BINARY_ADD
              6 LOAD_CONST               2 ("! How's it ")
              8 BINARY_ADD
             10 LOAD_FAST                1 (question)
             12 BINARY_ADD
             14 LOAD_CONST               3 ('?')
             16 BINARY_ADD
             18 RETURN_VALUE
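If you’d rather check the BUILD_STRING claim programmatically than read dis output, you can collect the opcode names with dis.get_instructions(). This is a minimal sketch: greet_fstring, greet_concat and opnames are hypothetical names, and the exact opcodes differ between CPython versions (3.11+ emits BINARY_OP where 3.9 emits BINARY_ADD, and recent versions no longer use FORMAT_VALUE), so it only tests for BUILD_STRING itself:

```python
import dis

def greet_fstring(name, question):
    return f"Hello, {name}! How's it {question}?"

def greet_concat(name, question):
    return "Hello, " + name + "! How's it " + question + "?"

def opnames(func):
    """Return the opcode names in func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

# The f-string version joins its pieces with one BUILD_STRING instruction;
# the concatenation version adds them pairwise and never uses BUILD_STRING
print('BUILD_STRING' in opnames(greet_fstring))  # True
print('BUILD_STRING' in opnames(greet_concat))   # False

# Functionally, both produce exactly the same string
print(greet_fstring('Bob', 'going') == greet_concat('Bob', 'going'))  # True
```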
Formatted string literals also support the existing format string syntax of the str.format() method. That allows you to solve the same formatting problems we discussed in the previous two sections:
>>> errno = 50159747054
>>> f"Hey {name}, there's a {errno:#x} error!"
"Hey Bob, there's a 0xbadc0ffee error!"
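The part after the colon is the same format-spec mini-language that str.format() and the built-in format() function use, so the three spellings below should agree. A short sketch reusing the errno value from the previous part; it also shows that a spec can contain a nested replacement field, here supplying the width:

```python
errno = 50159747054  # 0xbadc0ffee, the value used in the previous part

# Same '#x' spec (hex with 0x prefix) through three different entry points
via_fstring = f'{errno:#x}'
via_method = '{:#x}'.format(errno)
via_builtin = format(errno, '#x')
print(via_fstring, via_method, via_builtin)  # 0xbadc0ffee 0xbadc0ffee 0xbadc0ffee

# A nested {width} field is evaluated first, then used inside the spec:
# right-align the hex digits in a field 15 characters wide
width = 15
padded = f'{errno:>{width}x}'
print(f'[{padded}]')  # [      badc0ffee]
```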
Python’s new Formatted String Literals are similar to the JavaScript Template Literals added in ES2015. I think they’re quite a nice addition to the language, and I’ve already started using them in my day-to-day Python 3 work. You can learn more about Formatted String Literals in the official Python documentation.
#4 - Template Strings
One more technique for string formatting in Python is Template Strings. It’s a simpler and less powerful mechanism, but in some cases this might be exactly what you’re looking for.
Let’s take a look at a simple greeting example:
>>> from string import Template
>>> t = Template('Hey, $name!')
>>> t.substitute(name=name)
'Hey, Bob!'
As you can see, we need to import the Template class from Python’s built-in string module. Template strings are not a core language feature; they’re supplied by a module in the standard library.
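Template syntax is deliberately small: $identifier, ${identifier} when the name is directly followed by characters that would otherwise be read as part of it, and $$ for a literal dollar sign. The sketch below (the template text is made up for illustration) also contrasts substitute(), which raises KeyError for a missing value, with safe_substitute(), which leaves the placeholder in place:

```python
from string import Template

# $who is a simple placeholder; ${amount} needs braces because it is
# immediately followed by identifier characters ("USD"); $$ is a literal $
t = Template('$who owes ${amount}USD, $$5 of it is interest')

full = t.substitute(who='Bob', amount=20)
print(full)  # Bob owes 20USD, $5 of it is interest

# substitute() raises KeyError when a placeholder has no value...
try:
    t.substitute(who='Bob')
    missing = None
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # amount

# ...while safe_substitute() leaves unknown placeholders untouched
partial = t.safe_substitute(who='Bob')
print(partial)  # Bob owes ${amount}USD, $5 of it is interest
```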
Another difference is that template strings don’t allow format specifiers. So in order to get our error string example to work, we need to transform our int error number into a hex-string ourselves:
>>> templ_string = 'Hey $name, there is a $error error!'
>>> Template(templ_string).substitute(name=name, error=hex(errno))
'Hey Bob, there is a 0xbadc0ffee error!'
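Since Template can’t apply format specifiers, one workable pattern is to format each value with the built-in format() before substituting it. A minimal sketch reusing the error message above; the grouped variant is just an illustration of a spec that hex() alone can’t produce:

```python
from string import Template

errno = 50159747054  # 0xbadc0ffee, the value from the previous part

templ = Template('Hey $name, there is a $error error!')

# Template has no format specifiers, so apply the spec yourself with the
# built-in format() before substituting; '#x' matches what hex() returns
as_hex = templ.substitute(name='Bob', error=format(errno, '#x'))
print(as_hex)  # Hey Bob, there is a 0xbadc0ffee error!

# The same approach covers specs hex() can't express, e.g. digit grouping
grouped = templ.substitute(name='Bob', error=format(errno, ','))
print(grouped)  # Hey Bob, there is a 50,159,747,054 error!
```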
That worked great, but you’re probably wondering when you should use template strings in your Python programs. In my opinion, the best use case for template strings is when you’re handling format strings generated by users of your program. Due to their reduced complexity, template strings are a safer choice.
The more complex formatting mini-languages of other string formatting techniques might introduce security vulnerabilities to your programs. For example, format strings can look up attributes and items on the arguments passed to them, and from there reach arbitrary variables in your program.
That means if a malicious user can supply a format string, they can also potentially leak secret keys and other sensitive information! Here’s a simple proof of concept of how this attack might be used:
>>> SECRET = 'this-is-a-secret'
>>> class Error:
...     def __init__(self):
...         pass
...
>>> err = Error()
>>> user_input = '{error.__init__.__globals__[SECRET]}'
# Uh-oh...
>>> user_input.format(error=err)
'this-is-a-secret'
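Why does the proof of concept bother defining an __init__ that does nothing? A function written in Python carries a __globals__ reference to the namespace of the module it was defined in, and the format string’s field name walks from the argument to that namespace through ordinary attribute and [key] lookups. The script version below adds a hypothetical Plain class for contrast: it only inherits object.__init__, which is implemented in C and has no __globals__, so this particular payload fails on it. Don’t read that as a defense, though; real objects almost always lead to some Python-level function.

```python
SECRET = 'this-is-a-secret'

class Error:
    def __init__(self):
        pass

class Plain:
    pass  # hypothetical contrast class: inherits object.__init__ (written in C)

payload = '{error.__init__.__globals__[SECRET]}'

# Field names in format strings support .attribute and [key] lookups,
# which here lead from the argument to this module's global namespace
leaked = payload.format(error=Error())
print(leaked)  # this-is-a-secret

# object.__init__ is not a Python function, so it has no __globals__
try:
    payload.format(error=Plain())
    blocked = False
except AttributeError:
    blocked = True
print(blocked)  # True
```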
See how the hypothetical attacker was able to extract our secret string by accessing the globals dictionary from the format string? Scary, huh! Template Strings close this attack vector, and this makes them a safer choice if you’re handling format strings generated from user input:
>>> user_input = '${error.__init__.__globals__[SECRET]}'
>>> Template(user_input).substitute(error=err)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/liuxiaowei/opt/anaconda3/lib/python3.9/string.py", line 121, in substitute
    return self.pattern.sub(convert, self.template)
  File "/Users/liuxiaowei/opt/anaconda3/lib/python3.9/string.py", line 118, in convert
    self._invalid(mo)
  File "/Users/liuxiaowei/opt/anaconda3/lib/python3.9/string.py", line 101, in _invalid
    raise ValueError('Invalid placeholder in string: line %d, col %d' %
ValueError: Invalid placeholder in string: line 1, col 1
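As a script, the comparison might look like this. The braced payload is rejected because ${...} must enclose a plain identifier, and an unbraced placeholder only consumes the identifier itself, so anything after it is copied through as literal text with no attribute lookup. The value 'E42' is an arbitrary stand-in:

```python
from string import Template

SECRET = 'this-is-a-secret'

class Error:
    def __init__(self):
        pass

err = Error()

# ${...} must contain a plain identifier, so the dotted payload matches the
# pattern's "invalid" branch and substitute() raises ValueError
try:
    Template('${error.__init__.__globals__[SECRET]}').substitute(error=err)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True

# Without braces, only "error" is a placeholder; the dotted tail is copied
# through as literal text and never evaluated
out = Template('$error.__init__.__globals__').substitute(error='E42')
print(out)  # E42.__init__.__globals__
```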
Which String Formatting Method Should I Use?
I totally get that having so much choice for how to format your strings in Python can feel very confusing. This would be a good time to bust out some flowchart infographic…
But I’m not going to do that. Instead, I’ll try to boil it down to the simple rule of thumb that I apply when I’m writing Python.
Here we go. Whenever you’re having difficulty deciding which string formatting method to use in a given situation, apply this rule of thumb:
Dan’s Python String Formatting Rule of Thumb:
If your format strings are user-supplied, use Template Strings to avoid security issues. Otherwise, use Literal String Interpolation if you’re on Python 3.6+, and “New Style” String Formatting if you’re not.
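For reference, here is the same message produced with each of the four methods covered in this article and the previous one. It’s a sketch reusing name and errno from part one, and the Template version converts the number with hex() first because Template strings have no format specifiers:

```python
from string import Template

name = 'Bob'
errno = 50159747054  # 0xbadc0ffee, as in the previous part

# 1. "Old Style" %-formatting
old = "Hey %s, there's a %#x error!" % (name, errno)

# 2. "New Style" str.format()
new = "Hey {}, there's a {:#x} error!".format(name, errno)

# 3. Literal String Interpolation (Python 3.6+)
fstr = f"Hey {name}, there's a {errno:#x} error!"

# 4. Template Strings: no format specs, so convert the number beforehand
tmpl = Template("Hey $name, there's a $error error!").substitute(
    name=name, error=hex(errno))

print(old == new == fstr == tmpl)  # True
print(fstr)  # Hey Bob, there's a 0xbadc0ffee error!
```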