|
Scaling Unix to infinity with SQLPLEASE NOTE – This document is still very much a work in progress. I posted it to my Web site because I feel like the information it contains is valuable and not found anywhere else. But it is currently unfinished, and I'm unfortunately unable to dedicate any resources to finishing it up at the moment. Sorry! You're welcome to ask specific questions, I'll try to answer them to the best of my knowledge. IntroductionUnix is well known for its text-based approach to system administration. While in the recent years efforts have been made to develop GUI frontends for most system tasks, (with mixed success as any Unix system admin will tell), at the core Unix still heavily relies on text configuration files. Such an approach has obvious advantages when compared to other OSes with GUI-based administration only, such as Windows NT and 2000. For example, text configuration makes it extremely easy to document and duplicate a system's setup; such files can be processed by external tools, and edited even when the system is running in a minimal configuration, or under duress, at times of repair. On the other hand text-based configuration does not scale very well. Growing configuration data will usually result in huge configuration files, hard to parse and understand by a human operator. Big configuration files can often be split in multiple, smaller files, but at the expense of readibility. A more structured way is needed to store configuration data in order to scale services efficiently to potentially millions of users. This article discusses a method increasingly deployed by Internet Service Providers and IT organizations alike: structuring Unix systems around a SQL database. Using a SQL database to store configuration data solves many of the problems mentionnned earlier. With a SQL database, the data:
This article uses FreeBSD and MySQL as example system and database, but it should apply equally well to other combinations of operating system and database server. 1. Web Service With Apache1.1. Virtual Hosts ConfigurationApache is by all accounts a great piece of software: it complies to all standards, is fast, very stable and its open-source license lets you modify and distribute it at will. Unfortunately its configuration file has grown from being a very simple text file to a complex hierarchical model allowing the configuration of multiple virtual hosts. In the process was lost a lot of the manageability that a simple text file offers. In this article I would like to introduce you to a new way of configuring Apache, directly from a SQL database, with the help of mod_perl: instead of creating and maintaining by hand a single text file with (potentially) hundreds of virtual hosts definitions, you will be able to store your all of your configuration in a database table.
Quick reminder: mod_perl, initially developed by Doug MacEachern,
embeds a Perl interpreter in the Apache server. It is used mostly to
speed up CGI execution (with the
Example 1. Comparing Apache configuration methods.
Example 2 is slightly more advanced and unveils the true power of Perl sections, configuring the Apache server as the user who started it. All Apache directives are accessible from within Perl sections as Perl variables.
<Perl>
#!perl
$User = $>; # current UID
$Group = $); # current GID
$Port = 8888;
$ServerRoot = "$ENV{HOME}/www";
$ServerName = `hostname`;
__END__
</Perl>
Example 2. A more advanced example of Apache configuration with a
Perl section: run the server as the current user.
Even virtual host configuration can be done with Perl sections: the
magic Since VirtualHost directives merely override values inherited from the main configuration, every <VirtualHost> section only needs to change whatever is needed for this specific virtual host. This is the main idea behind mod_vhost_alias, where only a limited number of changes can be made per virtual host. On the other hand, using mod_perl and a database to configure your Apache means that you are only limited by your imagination. See example 3 for the database table used to store virtual hosts. Because we will use the DBI module to access the database, almost any database can be used. I'm using and showing MySQL in the examples because of its ubiquity. The "rank" column is used to order the VirtualHost definitions. Apache may depend on their order for virtual host matching, see An In-Depth Discussion of Virtual Host Matching for more details.
Example 3. The database table storing virtual hosts configuration.
Notice how in example 2
we delimited the actual Perl code between
$ perl -cx apache.confIt's a useful trick. [program] here Note how we make sure a new value is present to override the value inherited from the main server configuration; otherwise we could resetting or erasing a configuration value.
The
Once the configuration is complete, we insert it into the special
%VirtualHost = (
"10.0.0.1:80" => [ { ServerName => "www1.example.com",
DocumentRoot => "/www/example1/htdocs",
...
},
{ ServerName => "www2.example.com",
DocumentRoot => "/www/example2/htdocs",
...
},
],
"10.0.0.2:81" => ...
);
Example 4. The
%VirtualHost variable.ExtensionsI've shown only a limited subset of what can be done with this technique. Since almost any ... Why would you want to do thisA new module mod_vhost_alias distributed with Apache 1.3 makes the task much easier and almost infinitely scalable, allowing you to have potentially thousands of virtual hosts running off of one machine. Although mod_vhost_alias is a great addition to Apache, I find it to be of limited flexibility Installing mod_perl just for the just the ease of configuring your Apache from a database might not be worth the extra XXX to your site. But, if you are already using mod_perl for all its other benefits (eg. CGI performance, SQL logging, etc.), this approach suddenly makes a lot more sense. Also, any site beyond a certain size is already running some kind of database. For example, I believe it to be perfect for an Web-hosting environment where lots of sites are used but require more configuration that what can be done with mod_vhost_alias. 1.2. Web Logs
Have Apache log directly to the database with
the PerlLogHandler Apache::DBILogger PerlSetVar DBILogger_data_source DBI:driver:dsn PerlSetVar DBILogger_username httpd_user PerlSetVar DBILogger_password secret PerlSetvar DBILogger_table requests Example 6. Apache::DBILogger configuration.
Crafting your own analysis of the log files is only limited by the power of your SQL engine. Note that in practice you will probably want to use specialized software for the task of Web log analysis. Thanks to logging to a database, the software will not have to go through the lengthy process of importing the log files into its format.
SQL table for logging.
1.3. Authentication & Authorization
Web passwords can also be stored in the database, instead of
using the text-based
htpasswd and
htgroup files.
The mod_perl modules Why two different modules for something that was traditionnally bundled together in the Apache configuration? Because each of these two modules actually answer a different need:
Sample
AuthName Private Web Area AuthType Basic PerlAuthenHandler Apache::AuthenDBI PerlAuthzHandler Apache::AuthzDBI PerlSetVar Auth_DBI_data_source dbi:driver:dsn PerlSetVar Auth_DBI_username db_username PerlSetVar Auth_DBI_password db_password # DBI->connect($data_source, $username, $password) PerlSetVar Auth_DBI_pwd_table users PerlSetVar Auth_DBI_uid_field username PerlSetVar Auth_DBI_pwd_field password # SELECT pwd_field FROM pwd_table WHERE uid_field=$user # for authentication PerlSetVar Auth_DBI_grp_field groupname # SELECT grp_field FROM pwd_table WHERE uid_field=$user # for authorization require valid-user # for authentication only require user user_1 user_2 ... # for authorization require group group_1 group_2 ... Example X. Sample
.htaccess file for authentication
& authorization.AuthDBI commands.
Example X. Extra configuration for authentication & authorization.
2. E-mail Service With PostfixOne piece of advice to the reader: trash sendmail; it is old technology. Postfix on the other hand comes secure out of the box, is more flexible, configurable by a normal human being, and supports external lookup tables, called maps. The default types of maps supported by default in Postfix are text-based and hash-based, but Scott Cotton and Joshua Marcus of IC Group, Inc., added a new mysql map type, allowing the storing of e-mail aliases, forwarding addresses, etc. in a database. Postfix installations which may benefit from using mysql map types include sites that have a need for instantaneous updates of forwarding, and sites that may benefit from having mail exchangers reference a networked database, possibly working in conjunction with a customer database of sorts. # # postfix config file for looking up aliases in a MySQL database # # the user name and password to log into the mysql server user = some_user password = some_password # the database name on the servers dbname = customer_database # the table name table = mxaliases # the SQL query select_field = forw_addr where_field = alias # # you may specify additional_conditions here additional_conditions = and status = 'paid' # # the above variables will result in the following query: # SELECT forw_addr FROM mxaliases WHERE alias = '$lookup' AND status = 'paid' # the hosts that postfix will try to connect to # and query from (in the order listed) hosts = db1.example.com db2.example.com Example X. Postfix configuration for mysql map.
Note how this configuration interface setup allows for multiple databases: you can use one for a virtual table, one for an access table, and one for an aliases table if you want. Since sites that have a need for multiple mail exchangers may enjoy the convenience of using a networked mailer database, but do not want to introduce a single point of failure to their system, we've included the ability to have postfix reference multiple hosts for access to a single mysql map. This will work if sites set up mirrored mysql databases on two or more hosts. Whenever queries fail with an error at one host, the rest of the hosts will be tried in order. Each host that is in an error state will undergo a reconnection attempt every so often, and if no mysql server hosts are reachable, then mail will be deferred until at least one of those hosts is reachable. 3. Domain Name Service With Bindbind 8 versus bind 9 B. References
C. About the authorRenaud Waldura is a freelance software consultant specializing in Internet applications. Visit Renaud's Web site at http://renaud.waldura.com/ to learn more about how he can help you with your business. Copyright © 2000-2007 by Renaud Waldura. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee, provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than Renaud Waldura must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from renaud@waldura.com. Last modified: 2002/08/01 07:04:03 $ |