Chapter 1. MySQL Architecture
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form—the author’s raw
and unedited content as they write—so you can take advantage of these
technologies long before the official release of these titles.
This will be Chapter 1 of the final book. Please note that the GitHub repo will be
made active later on.
If you have comments about how we might improve the content and/or examples in
this book, or if you notice missing material within this chapter, please reach out to the
editor at vwilson@oreilly.com.
MySQL’s architectural characteristics make it useful for a wide
range of purposes. Although it is not perfect, it is flexible enough
to work well in both small and large environments. These range
from a personal website up to large scale enterprise applications.
To get the most from MySQL, you need to understand its design
so that you can work with it, not against it.
This chapter provides a high-level overview of the MySQL server
architecture, the major differences between the storage engines,
and why those differences are important. We’ve tried to explain
MySQL by simplifying the details and showing examples. This
discussion will be useful for those new to database servers as
well as readers who are experts with other database servers.
MySQL’s Logical Architecture
A good mental picture of how MySQL’s components work
together will help you understand the server. Figure 1-1 shows a
logical view of MySQL’s architecture.
The topmost layer contains the services that aren’t unique to
MySQL. They’re services most network-based client/server tools
or servers need: connection handling, authentication, security,
and so forth.
Figure 1-1. A logical view of the MySQL server architecture
The second layer is where things get interesting. Much of
MySQL’s brains are here, including the code for query parsing,
analysis, optimization, caching, and all the built-in functions
(e.g., dates, times, math, and encryption). Any functionality
provided across storage engines lives at this level: stored
procedures, triggers, and views, for example.
The third layer contains the storage engines. They are
responsible for storing and retrieving all data stored “in” MySQL.
Like the various file systems available for GNU/Linux, each
storage engine has its own benefits and drawbacks. The server
communicates with them through the storage engine API. This
interface hides differences between storage engines and makes
them largely transparent at the query layer. The API contains a
couple of dozen low-level functions that perform operations such
as “begin a transaction” or “fetch the row that has this primary
key.” The storage engines don’t parse SQL or communicate with
each other; they simply respond to requests from the server.
Connection Management and Security
By default, each client connection gets its own thread within the
server process. The connection’s queries execute within that
single thread, which in turn resides on one core or CPU. The
server caches threads, so they don’t need to be created and
destroyed for each new connection.
When clients (applications) connect to the MySQL server, the
server needs to authenticate them. Authentication is based on
username, originating host, and password. X.509 certificates can
1
2
also be used across a TLS (Transport Layer Security) connection.
Once a client has connected, the server verifies whether the
client has privileges for each query it issues (e.g., whether the
client is allowed to issue a SELECT statement that accesses the
Country table in the world database).
Optimization and Execution
MySQL parses queries to create an internal structure (the parse
tree), and then applies a variety of optimizations. These can
include rewriting the query, determining the order in which it
will read tables, choosing which indexes to use, and so on. You
can pass hints to the optimizer through special keywords in the
query, affecting its decision-making process. You can also ask
the server to explain various aspects of optimization. This lets
you know what decisions the server is making and gives you a
reference point for reworking queries, schemas, and settings to
make everything run as efficiently as possible.
The optimizer does not really care what storage engine a
particular table uses, but the storage engine does affect how the
server optimizes the query. The optimizer asks the storage
engine about some of its capabilities and the cost of certain
operations, and for statistics on the table data. For instance,
some storage engines support index types that can be helpful to
certain queries. You can read more about indexing and schema
optimization in Chapter 8.
In older versions, MySQL made use of an internal query cache to
see if it can serve the results from there. However, as
concurrency increased, the query cache became a notorious
bottleneck. As of MySQL 5.7.20 the query cache was officially
deprecated as a MySQL feature and in the 8.0 release, the query
cache is fully removed. Even though the query cache is no longer
a core part of the MySQL server, caching frequently served result
sets is a good practice. In Chapter 5 we will go over ways you can
implement such a cache using different technologies.
Concurrency Control
Anytime more than one query needs to change data at the same
time, the problem of concurrency control arises. For our
purposes in this chapter, MySQL has to do this at two levels: the
server level and the storage engine level. Concurrency control is
a big topic to which a large body of theoretical literature is
devoted, so we will just give you a simplified overview of how
MySQL deals with concurrent readers and writers, so you have
the context you need for the rest of this chapter.
To illustrate how MySQL handles concurrent work on the same
set of data, we will use a traditional spreadsheet file as an
example. A spreadsheet consists of rows and columns, much like
a database table. Assume the file is on your laptop and only you
have access to it. There are no potential conflicts, only you can
make changes to the file. Now, imagine you need to collaborate
with a coworker on that spreadsheet. It is now on a shared server
that both of you have access to. What happens when both of you
need to make changes to this file at the same time? What if we
Category | exam bundles |
Comments | 0 |
Rating | |
Sales | 0 |