Monday, June 25, 2007
[zz] A few notes on compiling from source, hopefully useful for newbies :)
When installing software from a source package, the most important thing is to read the README, INSTALL, and other documentation carefully;
they tell you what it takes to install successfully.
The usual steps for installing from a source package are:
tar jxvf gtk+-2.4.13.tar.bz2 (unpack the source package)
cd gtk+-2.4.13/ (enter the source directory)
./configure (in some environments ./configure seems to make the terminal exit,
while ". configure" runs normally; if you see that happen, try ". configure")
The configure script probes the host and finally generates a Makefile so that make can run. So if ./configure did not succeed
and you run make anyway, you get "make: *** No targets specified and no makefile found. Stop."
make (once ./configure finishes successfully, the actual compilation starts)
make install (after a successful build, install with make install)
make uninstall (some packages support uninstalling this way; if supported, it is usually mentioned in the README, though that seems fairly rare)
configure takes many options; run ./configure --help for the details. The generic configure
options usually come first, and the package's own options are listed at the end.
./configure --prefix=/usr sets the install directory. By default, software built from source ends up under /usr/local,
because that is what the FHS (Filesystem Hierarchy Standard) prescribes. Don't know what the FHS is? Read this:
http://www.pathname.com/fhs/pub/fhs-2.3.html It will give you a much better understanding of how a Linux system is laid out; well worth reading.
Now a few things that can make or break a build: /etc/ld.so.conf, ldconfig, and PKG_CONFIG_PATH.
First, /etc/ld.so.conf:
This file lists the directories that are searched for shared libraries.
By default, only the libraries under /lib and /usr/lib are used.
If you install a library elsewhere, trouble follows. For example, gtk+-2.4.13 requires glib-2.0 >= 2.4.0; if, after painstakingly installing glib,
you did not specify --prefix=/usr, the glib libraries land under /usr/local, and if /usr/local/lib is not listed in /etc/ld.so.conf
as a search path, building gtk+-2.4.13 fails.
There are two ways to solve this:
1. When building glib-2.4.x, install it under /usr, so the library files go into /usr/lib and gtk has no trouble finding them.
For installing libraries this is a good approach, and it saves you from setting PKG_CONFIG_PATH as well (explained below).
2. Add /usr/local/lib to /etc/ld.so.conf, so the gtk build also searches /usr/local/lib and finds the libraries it needs there.
Adding /usr/local/lib to /etc/ld.so.conf is worth doing in any case; then whatever you install under /usr/local later will not hit this problem again.
Adding every path where you might put library files to /etc/ld.so.conf is the smart choice ^_^
Adding a path is trivial: write the absolute path of the library directory into the file, one per line. For example:
/usr/X11R6/lib
/usr/local/lib
/opt/lib
Next, what is this ldconfig thing?
It is a program, usually found in /sbin, meant for the root user. The details of what it does and how to use it are in man ldconfig.
In short, it caches the libraries found under the paths listed in /etc/ld.so.conf into /etc/ld.so.cache, ready for use.
So after installing some libraries (for example, right after installing glib), or after modifying ld.so.conf to add a new library path, run /sbin/ldconfig
so that all the libraries get cached into ld.so.cache. If you skip this, a library can sit right there in /usr/lib and still not be used: the
build fails complaining that library xxx is missing, you go look and it is plainly there, and you end up wanting to yell at the stupid computer ^_^
I made exactly this mistake once while compiling KDE (it needs an ldconfig run after each component you build). So
remember: whenever you change library files, run ldconfig; it works from any directory.
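As a runnable sketch of the file format (built in a scratch file so it works without root; on a real system you would edit /etc/ld.so.conf itself and then run /sbin/ldconfig as root):

```shell
# Build the search-path list in a scratch file; the real file is /etc/ld.so.conf.
conf=$(mktemp)
printf '%s\n' /usr/X11R6/lib /usr/local/lib /opt/lib > "$conf"
cat "$conf"     # one absolute library directory per line
# On the real system, as root, rebuild the cache afterwards:
#   /sbin/ldconfig          # refreshes /etc/ld.so.cache
#   ldconfig -p | head      # optionally: list what is now cached
```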
Now for the PKG_CONFIG_PATH variable:
On forums you constantly see questions like "I already installed glib-2.4.x, so why does building gtk+-2.4.x still complain that my glib is too old?"
or "I installed glib-2.4.x, why does it still say it cannot be found?"... This variable is the culprit every time.
First, look at an error from an actual build (compiling gtk+-2.4.13):
checking for pkg-config... /usr/bin/pkg-config
checking for glib-2.0 >= 2.4.0 atk >= 1.0.1 pango >= 1.4.0... Package glib-2.0 was not found in the pkg-config search path.
Perhaps you should add the directory containing `glib-2.0.pc'
to the PKG_CONFIG_PATH environment variable
No package 'glib-2.0' found
configure: error: Library requirements (glib-2.0 >= 2.4.0 atk >= 1.0.1 pango >= 1.4.0) not met; consider adjusting the PKG_CONFIG_PATH environment variable if your libraries are in a nonstandard prefix so pkg-config can find them.
[root@NEWLFS gtk+-2.4.13]#
Clearly, this output says glib-2.4.x was not found, and suggests adding the directory containing glib-2.0.pc to PKG_CONFIG_PATH.
So what exactly are pkg-config, PKG_CONFIG_PATH, and glib-2.0.pc? Let me tell you ^_^
First, where it comes from: installing the pkgconfig-x.x.x package gives you pkg-config, and pkg-config is the thing that needs PKG_CONFIG_PATH.
And what does pkgconfig-x.x.x do? Here is its description:
Code:
The pkgconfig package contains tools for passing the include path and/or library paths to build tools during the make file execution. pkg-config is a function that returns meta information for the specified library. The default setting for PKG_CONFIG_PATH is /usr/lib/pkgconfig because of the prefix we use to install pkgconfig. You may add to PKG_CONFIG_PATH by exporting additional paths on your system where pkgconfig files are installed. Note that PKG_CONFIG_PATH is only needed when compiling packages, not during run-time.
Having read that description, you probably have a rough idea of what it does.
In essence, pkg-config supplies system information to configure scripts: software versions, library versions, library paths, and so on.
This information is only used at build time. If you ls /usr/lib/pkgconfig, you will see many *.pc files; open one in a text editor
and you will find something like this:
prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include
glib_genmarshal=glib-genmarshal
gobject_query=gobject-query
glib_mkenums=glib-mkenums
Name: GLib
Description: C Utility Library
Version: 2.4.7
Libs: -L${libdir} -lglib-2.0
Cflags: -I${includedir}/glib-2.0 -I${libdir}/glib-2.0/include
See? These entries are how configure decides whether your software versions meet the requirements, and where to find everything; where else would it look?
You can work out for yourself by now why the errors above occurred.
The fix is simple: set PKG_CONFIG_PATH correctly. If glib-2.x.x was installed under /usr/local, then glib-2.0.pc is in
/usr/local/lib/pkgconfig, and adding that path to PKG_CONFIG_PATH is all it takes. Also make sure configure picks up the right
glib-2.0.pc; that is, get rid of any stale glib-2.0.pc in other lib/pkgconfig directories (if there are any ^-^).
Once it works, you can put the setting in ~/.bashrc, for example:
PKG_CONFIG_PATH=/opt/kde-3.3.0/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/pkgconfig:/usr/X11R6/lib/pkgconfig
[root@NEWLFS ~]#echo $PKG_CONFIG_PATH
/opt/kde-3.3.0/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/pkgconfig:/usr/X11R6/lib/pkgconfig
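A sketch of the fix (the /usr/local prefix is the scenario from the text; the queries only run if pkg-config is actually installed):

```shell
# glib installed under /usr/local puts its metadata in
# /usr/local/lib/pkgconfig/glib-2.0.pc, so add that directory:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig
echo "$PKG_CONFIG_PATH"

# With the path set, configure's version checks boil down to queries like these:
if command -v pkg-config >/dev/null 2>&1; then
    pkg-config --modversion glib-2.0 || true     # the installed version, if found
    pkg-config --cflags --libs glib-2.0 || true  # the flags configure will use
fi
```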
As you can see, installing library files with --prefix=/usr has real advantages: both /etc/ld.so.conf and PKG_CONFIG_PATH
search /usr/lib by default, which saves a lot of trouble. From a source-package management point of view, though, putting everything
under /usr makes the files hard to keep track of; /usr/local is easier to manage.
In practice, once ld.so.conf and PKG_CONFIG_PATH are set up properly, you are all set ^_^
Also, some software fails to build under certain compiler versions (for example emacs-21.3 fails under gcc-3.4.x with a make error)
but may compile fine with an older gcc,
probably because gcc-3.3.x and gcc-3.4.x differ so much.
That is all that comes to mind for now; if it gave you some understanding of building from source, then all this typing was worth it. ^_^
One more thing: when ./configure passes but make fails, the problem is harder, and you can only track down the cause by experience. Say a header file was not found:
work upward line by line from where the error occurred. A message like "xxxx.h no such file or directory" means a header is missing;
go search for it on Google.
Or pick out whatever error message looks meaningful and feed it to Google; that often turns up a solution. And back to the opening advice: read README and INSTALL carefully
for how the program is installed, what dependencies it needs, and so on.
Also, newbies often cannot tell whether the build actually succeeded, and running make install after a failed build
is bound to fail too, which only compounds the problem. You can check whether the build succeeded like this:
1. After the build finishes, run echo $?. If it prints 0, the build ended normally; otherwise something went wrong.
echo $? reports the exit status of the previous command: a program that exits normally returns 0, an error exit returns non-zero.
2. Chain the commands with &&, which means "run the next command only if the previous one finished successfully"; logical AND, in other words.
This approach is excellent: it saves time and prevents mistakes. Example:
./configure --prefix=/usr && make && make install
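Both checks in action (a minimal sketch; sh -c 'exit 3' stands in for any failing build step):

```shell
# echo $? prints the exit status of the previous command: 0 on success.
ls / > /dev/null
echo "exit status of ls: $?"

# Capture a failing command's status without aborting the script:
status=0
sh -c 'exit 3' || status=$?
echo "exit status of the failed command: $status"

# && runs each step only if the previous one succeeded, so a broken
# ./configure can never be followed by a doomed make:
mkdir -p /tmp/build.$$ && cd /tmp/build.$$ && echo "both steps succeeded"
cd / && rmdir "/tmp/build.$$"
```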
Compiling DOSBox: "cdrom.h:20:23: SDL_sound.h: No such file or directory"
Today I suddenly felt like revisiting the classic DOS games, so I set out to build the DOSBox emulator. Its README says it needs SDL_sound,
so I downloaded and installed that. All smooth; I did not specify an install path, so it went to the default /usr/local/.
Then, when running make for DOSBox, the following error appeared:
if g++ -DHAVE_CONFIG_H -I. -I. -I../.. -I../../include -I/usr/include/SDL -D_REENTRANT -march=pentium4 -O3 -pipe -fomit-frame-pointer -MT dos_programs.o -MD -MP -MF ".deps/dos_programs.Tpo" -c -o dos_programs.o dos_programs.cpp; \
then mv -f ".deps/dos_programs.Tpo" ".deps/dos_programs.Po"; else rm -f ".deps/dos_programs.Tpo"; exit 1; fi
In file included from dos_programs.cpp:30:
cdrom.h:20:23: SDL_sound.h: No such file or directory <------ the cause of the error is here
In file included from dos_programs.cpp:30:
cdrom.h:137: error: ISO C++ forbids declaration of `Sound_Sample' with no type
cdrom.h:137: error: expected `;' before '*' token
make[3]: *** [dos_programs.o] Error 1
make[3]: Leaving directory `/root/software/dosbox-0.63/src/dos'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/root/software/dosbox-0.63/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/software/dosbox-0.63'
make: *** [all] Error 2
[root@NEWLFS dosbox-0.63]#
So cdrom.h could not find the header SDL_sound.h,
which caused the errors that follow. But I had definitely installed SDL_sound, hadn't I?
After some searching, I found SDL_sound.h in /usr/local/include/SDL/.
Apparently dosbox does not search /usr/local/include/SDL for headers. With the cause found, the fix is easy:
[root@NEWLFS dosbox-0.63]#ln -s /usr/local/include/SDL/SDL_sound.h /usr/include
With a symlink into /usr/include, DOSBox can find the header. The build went through cleanly, and I am off reliving Chinese Paladin... ^_^
When compiling Xorg-6.8.1 I once hit a missing freetype.h too, for exactly the same reason.
This kind of thing comes up all the time when building software: the error is a header file that cannot be found, either because
the relevant headers were never installed, or because they are installed but not found, as in the example above.
Installed but not found: make a symlink into /usr/include and you are done.
Not installed: search Google for what provides the header, install it, and you should be set.
The error message is usually "No such file or directory", so when a build fails, hunt through the error output carefully.
The real error message is always not far above the final "Error" line; be patient ^_^
Using libraries outside the default paths without touching /etc/ld.so.conf: LD_LIBRARY_PATH
The environment variable LD_LIBRARY_PATH lists extra paths, beyond the defaults, in which to look for shared libraries.
If you do not want to modify /etc/ld.so.conf, or cannot (no root access), but need libraries from some other path,
set LD_LIBRARY_PATH, e.g.: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lib
Now the libraries under /opt/lib can be used. Still, editing /etc/ld.so.conf is more convenient when you can.
That is it for now; I will add more as further compiling thoughts come up ^_^
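For example, appending rather than overwriting, and avoiding a stray leading colon when the variable started out empty (/opt/lib is just a stand-in for wherever your libraries live):

```shell
libdir=/opt/lib    # hypothetical library directory
# ${VAR:+...} expands only when VAR is already non-empty, so no leading colon:
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$libdir"
echo "$LD_LIBRARY_PATH"
# Programs started from this shell now also search /opt/lib for shared
# libraries, in addition to the directories from /etc/ld.so.conf.
```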
Tuesday, May 22, 2007
Joke: "How many ways can you think of to shut the machine down properly?" ("guanji" is Chinese pinyin for "shut down")
alias guanji1 shutdown
alias guanji2 shutdown
alias guanji3 shutdown
alias guanji4 shutdown
alias guanji5 shutdown
guanji1
guanji2
guanji3
guanji4
guanji5
Answer two:
1. shutdown
2. halt (haltsys)
3. init 0
4. kill -9 1 ---- doesn't seem to work!
5. sync three times (that's what the book says; doesn't work on my system :) ---- no luck on my Linux, maybe it works on Unix?
1. shutdown -h now
2. halt (haltsys)
3. init 0
4. poweroff
5. With ACPI support, pressing the Power button shuts down safely
6. ctrl+alt+del to reboot, then cut power during the BIOS check
7. Under a graphical interface, choose "Shut down"
Getting the console to talk to the X clipboard: xclip
Today I finally learned that there is a utility called xclip... (don't laugh at me >_<)
$ uname -a | xclip # pipe the output of uname -a into X's primary buffer, i.e. XA_PRIMARY
See man xclip for the details. The author wrote a very clear, readable introduction, and there is one line in it I really like:
I hate man pages without examples!
Heh, with examples you really do get up to speed quickly.
I used
alias xcp="xclip -selection clip "
to define an xcp command that works just like xclip but defaults to XA_CLIPBOARD, so you can conveniently paste with C-v. This is especially handy when pasting code, for example:
[iveney@localhost excersize]$ xcp combine.java
and then you can just... hehe
Friday, May 18, 2007
[zz]Common Gateway Interface: Introduction
Common Gateway Interface
Overview
The Common Gateway Interface (CGI) is a standard for interfacing external applications with information servers, such as HTTP or Web servers. A plain HTML document that the Web daemon retrieves is static, which means it exists in a constant state: a text file that doesn't change. A CGI program, on the other hand, is executed in real-time, so that it can output dynamic information.
For example, let's say that you wanted to "hook up" your Unix database to the World Wide Web, to allow people from all over the world to query it. Basically, you need to create a CGI program that the Web daemon will execute to transmit information to the database engine, and receive the results back again and display them to the client. This is an example of a gateway, and this is where CGI, currently version 1.1, got its origins.
The database example is a simple idea, but most of the time rather difficult to implement. There really is no limit as to what you can hook up to the Web. The only thing you need to remember is that whatever your CGI program does, it should not take too long to process. Otherwise, the user will just be staring at their browser waiting for something to happen.
Specifics
Since a CGI program is executable, it is basically the equivalent of letting the world run a program on your system, which isn't the safest thing to do. Therefore, there are some security precautions that need to be implemented when it comes to using CGI programs. Probably the one that will affect the typical Web user the most is the fact that CGI programs need to reside in a special directory, so that the Web server knows to execute the program rather than just display it to the browser. This directory is usually under direct control of the webmaster, prohibiting the average user from creating CGI programs. There are other ways to allow access to CGI scripts, but it is up to your webmaster to set these up for you. At this point, you may want to contact them about the feasibility of allowing CGI access.
If you have a version of the NCSA HTTPd server distribution, you will see a directory called /cgi-bin. This is the special directory mentioned above where all of your CGI programs currently reside. A CGI program can be written in any language that allows it to be executed on the system, such as:
- C/C++
- Fortran
- PERL
- TCL
- Any Unix shell
- Visual Basic
- AppleScript
In the /cgi-src directory that came with the server distribution, you will find the source code for some of the CGI programs in the /cgi-bin directory. If, however, you use one of the scripting languages instead, such as PERL, TCL, or a Unix shell, the script itself only needs to reside in the /cgi-bin directory, since there is no associated source code. Many people prefer to write CGI scripts instead of programs, since they are easier to debug, modify, and maintain than a typical compiled program.
Processing table data copied from a web page with awk
The table data copied from the registrar's office looks like this:
算法设计与应用 07上 05级 郭嵩山 3 选修
遗传优化方法 07上 05级 张军 3 选修
JAVA程序设计 07上 05级 张治国 2 选修 需要同时选择该门课程的实验课
JAVA程序设计实验 07上 05级 张治国 1 选修 作为“JAVA程序设计”的实验课
组合数学与数论 07上 05级 陈晓峰 2 选修
无线通信网络 07上 04级 蔡国扬 3 选修
J2EE架构及其程序设计 07上 04级 张治国 2 选修
网络系统结构 07上 04级 尹冬生 2 选修
XML及应用 07上 04级 叶小平 2 选修
工作流技术 07上 04级 余阳 2 选修
计算机游戏与动画 07上 04级 纪庆革 2 选修
并行与分布计算 07上 04级 林小拉 3 选修
高级数据库系统技术 07上 04级 汤庸 3 选修
多媒体技术 07上 05级 李才伟 3 选修
信息安全技术 07上 04级 王常吉 3 选修
函数式程序设计 07上 04级 乔海燕 2 选修
--
Save it as course.
Run
awk -F" " '{print $1}' course
and the first column is printed:
算法设计与应用
遗传优化方法
JAVA程序设计
JAVA程序设计实验
组合数学与数论
无线通信网络
J2EE架构及其程序设计
网络系统结构
XML及应用
工作流技术
计算机游戏与动画
并行与分布计算
高级数据库系统技术
多媒体技术
信息安全技术
函数式程序设计
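The same idea pulls out any column. A self-contained sketch with made-up stand-in rows in the same layout (name, term, year, instructor, credits, type):

```shell
# Hypothetical sample rows in the same whitespace-separated layout:
cat > course.sample <<'EOF'
AlgorithmDesign 07-spring 05 GuoSongshan 3 elective
WirelessNetworks 07-spring 04 CaiGuoyang 3 elective
EOF
# $1 is the course name, $4 the instructor, $5 the credit count:
awk '{print $1}' course.sample
awk '{print $4, $5}' course.sample
# awk can also aggregate; sum the credits in column 5:
awk '{sum += $5} END {print sum}' course.sample
```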
Thursday, May 17, 2007
[zz] CSS hack browser-compatibility chart
http://www.gracecode.com/download/CSS+Hack+%E6%B5%8F%E8%A7%88%E5%99%A8%E5%85%BC%E5%AE%B9%E4%B8%80%E8%A7%88%E8%A1%A8/
A CSS hack is a special CSS trick used to accommodate the quirks of different browsers. This is a CSS hack chart taken from a foreign site, showing how well each browser supports each hack; it is very helpful when building cross-browser pages.
Friday, May 11, 2007
Five habits of successful regular expressions
Many tools have features that make regular expressions easier to read and write, yet they go largely unused. For many programmers, writing regular expressions remains a black art: they cling to the features they know and stay stubbornly optimistic. If you are willing to adopt the five habits discussed in this article, your regular expressions will stand up to trial and error.
This article uses Perl, PHP, and Python for its code samples, but the advice applies to almost any regex implementation.
1. Use whitespace and comments
For most programmers, laying code out with spacing and indentation is second nature; failing to do so would draw laughs from peers and laymen alike. Nearly everyone knows that code crammed onto one line is hard to read, write, and maintain. Why should regular expressions be any different?
Most regex tools have an extended-whitespace feature that lets programmers spread a regular expression across multiple lines, with a comment at the end of each line. Why do so few programmers take advantage of it? Perl 6 regexes will use extended whitespace by default. Don't wait for your language to make it the default; start using it yourself.
One trick to remember with extended whitespace is that the regex engine ignores it, so if you need to match a space you now have to say so explicitly.
In Perl, add x after the closing delimiter, so "m/foo|bar/" becomes:
m/
foo
|
bar
/x
In PHP, add x after the closing delimiter, so "/foo|bar/" becomes:
"/
foo
|
bar
/x"
In Python, pass the pattern-modifier flag "re.VERBOSE" to the compile function:
pattern = r'''
foo
|
bar
'''
regex = re.compile(pattern, re.VERBOSE)
Whitespace and comments matter even more as regular expressions get more complex. Suppose the following regex is meant to match a U.S. phone number:
\(?\d{3}\)? ?\d{3}[-.]\d{4}
This regex matches numbers like "(314)555-4000". Would you say it matches "314-555-4000" or "555-4000"? The answer is neither. Writing it as a one-liner hides both the flaws and the design decisions: the area code is required, but the regex is missing a separator between the area code and the prefix.
Splitting the line up and adding comments exposes the flaws and makes them obviously easier to fix.
In Perl it looks like this:
/
\(? # optional open parenthesis
\d{3} # required area code
\)? # optional close parenthesis
[-\s.]? # separator: dash, whitespace, or dot
\d{3} # three-digit prefix
[-.] # another separator
\d{4} # four-digit line number
/x
The rewritten regex now has an optional separator after the area code, so it should match "314-555-4000"; the area code is still required. Another programmer who needs to make the area code optional can now see at a glance that it currently is not, and a small change takes care of it.
2. Write tests
There are three levels of testing, each adding a layer of reliability to your code. First, think hard about what you need to match and whether you can live with false matches. Second, test the regex against sample data. Third, fold the tests into a formal test suite.
Deciding what to match is a balancing act between false positives and missed matches. Make the regex too strict and it will miss valid matches; too loose and it will produce false matches. Once the regex is released into real code, you will probably notice neither. Consider the phone number example above: it will match "800-555-4000 = -5355". False matches are hard to spot, so planning ahead and writing tests is important.
Still with phone numbers: if you are validating a phone number in a web form, you may be content with any ten digits in whatever format. But if you are extracting phone numbers from a large body of text, you will need to be much more careful about ruling out false matches.
While thinking about the data you want to match, write down some example cases, then write code that tests your regex against them. Any complex regex deserves a small test program, along the lines of the following.
In Perl:
#!/usr/bin/perl
my @tests = ( "314-555-4000",
"800-555-4400",
"(314)555-4000",
"314.555.4000",
"555-4000",
"aasdklfjklas",
"1234-123-12345"
);
foreach my $test (@tests) {
if ( $test =~ m/
\(? # optional open parenthesis
\d{3} # required area code
\)? # optional close parenthesis
[-\s.]? # separator: dash, whitespace, or dot
\d{3} # three-digit prefix
[-\s.] # another separator
\d{4} # four-digit line number
/x ) {
print "Matched on $test\n";
}
else {
print "Failed match on $test\n";
}
}
In PHP:
<?php
$tests = array( "314-555-4000",
"800-555-4400",
"(314)555-4000",
"314.555.4000",
"555-4000",
"aasdklfjklas",
"1234-123-12345"
);
$regex = "/
\(? # optional open parenthesis
\d{3} # required area code
\)? # optional close parenthesis
[-\s.]? # separator: dash, whitespace, or dot
\d{3} # three-digit prefix
[-\s.] # another separator
\d{4} # four-digit line number
/x";
foreach ($tests as $test) {
if (preg_match($regex, $test)) {
echo "Matched on $test\n";
}
else {
echo "Failed match on $test\n";
}
}
?>
In Python:
import re
tests = ["314-555-4000",
"800-555-4400",
"(314)555-4000",
"314.555.4000",
"555-4000",
"aasdklfjklas",
"1234-123-12345"
]
pattern = r'''
\(? # optional open parenthesis
\d{3} # required area code
\)? # optional close parenthesis
[-\s.]? # separator: dash, whitespace, or dot
\d{3} # three-digit prefix
[-\s.] # another separator
\d{4} # four-digit line number
'''
regex = re.compile( pattern, re.VERBOSE )
for test in tests:
if regex.match(test):
print "Matched on", test, "\n"
else:
print "Failed match on", test, "\n"
Running the test code reveals another problem: the regex matches "1234-123-12345".
Ideally, you would fold all of a program's tests into one test suite. Even if you don't have a test suite yet, your regex tests are a good foundation for one, and now is a fine time to start building it. Even if now is not the right time, you should still run the tests after every change to the regex. The little time spent here will save you a lot of grief.
3. Group the alternation operator
The alternation operator (|) has low precedence, which means it often alternates over more than the programmer intended. For example, a regex to extract email addresses from a mail file might be:
^CC:|To:(.*)
The attempt above is wrong, but the bug often goes unnoticed. The intent was to find lines beginning with "CC:" or "To:" and then capture the email addresses from the rest of the line.
Unfortunately, if "To:" appears in the middle of a line, this regex captures nothing at all from lines beginning with "CC:" and instead extracts stretches of random text. Strictly speaking, the regex matches lines beginning with "CC:" but captures nothing from them, or matches any line containing "To:" anywhere and captures the remainder of that line. In most runs it still captures plenty of email addresses, so nobody notices the bug.
If the current behavior really were the intent, it should be made explicit with parentheses:
(^CC:)|(To:(.*))
If the real intent is to capture the remainder of lines beginning with "CC:" or "To:", the correct regex is:
^(CC:|To:)(.*)
This is a common bug that produces no obvious failure. If you make a habit of grouping your alternations, you will avoid it.
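The precedence pitfall is easy to verify with Python's re module (the sample lines here are made up):

```python
import re

ungrouped = re.compile(r'^CC:|To:(.*)')    # binds as (^CC:)|(To:(.*))
grouped   = re.compile(r'^(CC:|To:)(.*)')  # what was actually intended

line = "Subject: fwd To: alice@example.com"  # "To:" buried mid-line

# The ungrouped version happily matches the embedded "To:" ...
m = ungrouped.search(line)
assert m is not None and m.group(1) == " alice@example.com"

# ... while the grouped version correctly requires a CC:/To: prefix.
assert grouped.search(line) is None
assert grouped.search("To: bob@example.com").group(2) == " bob@example.com"
```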
4. Use lazy quantifiers
Many programmers avoid the lazy quantifiers "*?", "+?", and "??", even when they would make an expression easier to write and understand.
Lazy quantifiers match as little text as possible while still letting the whole match succeed. If you write "foo(.*?)bar", the quantifier stops at the first occurrence of "bar" rather than the last. That matters if you want to capture "###" from "foo###bar+++bar": a greedy quantifier would capture "###bar+++" instead.
Suppose you want to capture all the phone numbers in an HTML file. You could reach for the phone-number regex we discussed above; but if you know every phone number sits in the first column of a table, a lazy quantifier lets you write a much simpler regex. (The sample regex was lost from this copy of the article.)
Many beginning programmers reach for negated character classes instead of lazy quantifiers, writing code like the following. (This sample was likewise lost in copying.)
That works in the simple case, but it becomes a real headache if the text you want to capture can contain the very character you used as the delimiter (such as ";" here). With lazy quantifiers, you spend far less time fiddling with character classes to produce a new regex.
Lazy quantifiers are at their most valuable when you know the structure surrounding the text you want to capture.
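The "foo###bar+++bar" example, executed with Python's re (".*?" is the lazy form):

```python
import re

s = "foo###bar+++bar"

lazy   = re.search(r'foo(.*?)bar', s)  # stops at the FIRST "bar"
greedy = re.search(r'foo(.*)bar', s)   # runs on to the LAST "bar"

assert lazy.group(1) == "###"
assert greedy.group(1) == "###bar+++"
```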
5. Take advantage of alternative delimiters
Perl and PHP usually use a forward slash (/) to mark the beginning and end of a regex; Python uses a pair of quotes. If you stick with slashes in Perl and PHP, you have to escape every slash in the expression; if you use plain quotes in Python, you have to escape every backslash (\). Choosing a different delimiter or quoting style lets you skip half the escaping. That makes the expression easier to read and removes a source of bugs from forgotten escapes.
Perl and PHP allow any non-alphanumeric, non-whitespace character as the delimiter. If you switch to a new delimiter, you no longer need to escape the forward slashes when matching URLs or HTML tags (such as "http://").
For example, "/http:\/\/(\S)*/" can be written as "#http://(\S)*#".
Common delimiters are "#", "!", and "|". If you want to use square brackets, angle brackets, or braces, just keep them paired. Here are some examples of common delimiters:
#…# !…! {…} s|…|…| (Perl only) s[…][…] (Perl only) s<…>/…/ (Perl only)
In Python, a regular expression is first of all a string. If you use plain quotes as delimiters, you have to escape every backslash, but the raw-string form r'' avoids that problem. Combined with the re.VERBOSE option, triple quotes also let the pattern span several lines. For example, regex = "(\\w+)(\\d+)" can be written as follows:
regex = r'''
(\w+)
(\d+)
'''
Summary: these suggestions mainly aim at readability. As you build these habits into your development, you will think more clearly about the design and structure of your expressions, which helps reduce bugs and eases maintenance; if you are the one maintaining the code, you will be especially glad of it.
Monday, April 30, 2007
[zz] Boot process overview
Before we study LILO and GRUB, let's look at how a PC starts up, i.e. boots. The BIOS (Basic Input Output Service) code is stored in non-volatile memory such as ROM, EEPROM, or flash. When the PC is turned on or restarted, this code executes. It usually performs a power-on self test (POST) to check the machine. Finally, it loads the first sector from the master boot record of the boot drive.
As mentioned earlier in the "Partitions" section, the MBR also contains the partition table, so the amount of executable code in the MBR is less than 512 bytes, which is not much. Note that every hard disk (even a floppy) contains executable code in its MBR, even if the code only suffices to print a message such as "Non-bootable disk in drive A:". The code that the BIOS loads from the first sector is called the first-stage boot loader, or stage 1 boot loader.
The standard hard-drive MBR used by the MS DOS, PC DOS, and Windows operating systems checks the partition table for a primary partition on the boot drive that is marked active, loads the first sector from that partition, and passes control to the beginning of the loaded code. This new piece of code is also known as the partition boot record. The partition boot record is really another first-stage boot loader, but it can load a set of code blocks from the partition. This new code is called the second-stage boot loader. In MS-DOS and PC-DOS, the second-stage loader loads the rest of the operating system directly. This is how an operating system pulls itself up by its bootstraps.
That works fine for a system with a single operating system. What if you have several, say Windows 98, Windows XP, and three different Linux distributions? You could use some program (such as the DOS FDISK program) to change the active partition and reboot, but that is clumsy. Moreover, a disk can have only four primary partitions, and the standard MBR can boot only a primary partition. Our hypothetical example has five operating systems, each needing its own partition. Help!
The solution is some special code that lets the user choose which operating system to boot. Examples include:
- Loadlin, a DOS executable that can be invoked from a running DOS system to boot a Linux partition. It was popular back when setting up a multi-boot system was complex and risky.
- OS/2 Boot Manager, a program installed in a small dedicated partition. That partition is marked active, and the standard MBR boot process launches Boot Manager, which presents a menu letting the user choose which operating system to boot.
- Smart boot loaders, programs that can reside on an operating system partition and be invoked either by the partition boot record of the active partition or by the master boot record. These include:
- BootMagic™, part of Norton PartitionMagic™
- LILO, the LInux LOader
- GRUB, the GRand Unified Boot loader
Clearly, once you can pass control of the system to a program of more than 512 bytes, it is not hard to allow booting from a logical partition, or from a partition that is not on the boot drive. All of these solutions allow such setups, either because they can load a boot record from an arbitrary partition or because they know which files to load to start the boot process.
From here on we will focus on LILO and GRUB, since most Linux distributions ship one or both of them; the installer usually lets you choose which to install. Both work with most modern disks. Disk technology advances quickly, though, so make sure the boot loader you choose, your Linux distribution (or other operating system), and your system's BIOS support your new disk; otherwise you risk losing data.
The second-stage loaders of both LILO and GRUB let you choose which of several operating systems or versions to load. The notable difference between them is that after you change your system (say, upgrade the kernel), LILO requires a command to rebuild its boot setup, while GRUB picks up changes through an editable configuration text file. LILO has the longer history; GRUB is newer. The original GRUB has become GRUB Legacy, and GRUB 2 is being developed under the auspices of the Free Software Foundation.
Hard disks, disks, storage. Primary, extended, and logical partitions. Random notes.
-------------------------------------------------------------------------------------
1. Classifying storage
Whoa... why am I starting all the way back here... but I really want to sort out and distill what I learned back in computer organization class...
Fundamentally, storage divides into RAM and ROM, or from a nearly equivalent angle, volatile and non-volatile. RAM generally serves as a PC's main memory; from the early SRAM and DRAM through SDRAM to today's DDR RAM and DDR2 RAM, it has made qualitative leaps. But it is still volatile: once the machine is powered off, the data is gone, a consequence of its physical characteristics.
- The OS's standby and hibernation: on standby, power to the hard disk, display, and so on is cut, but the memory stays powered, allowing a fast resume. Hibernation instead saves all current state (the context) to the hard disk and powers off; at the next power-on, the OS detects that the machine was hibernated at the last shutdown, reads the context back from disk, and quickly restores the pre-hibernation state.
The BIOS, i.e. Basic Input Output System, is built from ROM and usually fixed to the mainboard (motherboard); it stores the code, programs, and routines used to bootstrap the computer and carry out various routine tasks. The BIOS is a thing people love and hate: love, because booting cannot happen without it; hate, because of how limited it is (probably a legacy of the IBM PC architecture; it seemed sufficient at the time, and who could have predicted PCs would develop this fast?). For example, it hard-codes maximum CHS (Cylinder, Head, Sector) values. In short, it is inseparable from the whole IBM PC architecture: the disk drives, buses, CPU, and so on must all follow conventions compatible with the BIOS to work properly (the disk drive's MBR, for instance).
Disk drives are a different beast altogether... they use magnetic media, which I won't go into here.
2. The disk's MBR (Master Boot Record)
For assorted historical reasons, the disk's MBR sits at CHS 0/0/1. It stores some metadata about the whole disk, most importantly the partition table: four 16-byte primary partition entries at byte offsets 0x1BE through 0x1FD. This is why people always say a disk can have at most four primary partitions.
This is dictated by the generic partitioning scheme of PC hard disks, which holds no matter which operating system the machine runs: MS-DOS, PC DOS, DR-DOS, Windows 95/98/ME, Windows NT/2000/XP, Linux, Unix, Novell, and so on. None of them can change this partition structure. Unless, of course, your PC is not an IBM PC, that is, not of the x86/x86-64 architecture.
But four primary partitions are clearly not enough. So the concept of the extended partition came along later.
Since Microsoft released MS-DOS 3.20 in December 1985, the PC partition scheme described above has supported extended partitions: one of the four primary partitions can be designated the "extended partition", and inside it many logical partitions can be created, so the extended partition must contain extended partition tables of its own. Note that the extended partition is only an abstraction; no such on-disk object exists by itself. Rather, the set of all logical partitions together constitutes the disk's extended partition. Viewed as a whole, the extended partition is in fact just one primary partition. Hence the common way to partition a disk: one to three primary partitions, with all the remaining space used for logical partitions.
- Must the operating system be installed on a primary partition? I hear that is true of Windows and FreeBSD... but definitely not of Linux, which I have always installed on a logical partition :)
There are three partition types: primary, logical, and extended. The partition table is located in the master boot record (MBR) of the hard disk. The MBR is the first sector on the disk, so the partition table cannot occupy much space in it. This limits the number of primary partitions on a disk to four. When more than four partitions are needed, as is often the case, one of the primary partitions must instead become an extended partition, and a disk can contain only one extended partition. The extended partition is just a container for logical partitions. This partitioning scheme was originally used by MS DOS and PC DOS, and it allows DOS, Windows, and Linux systems to share PC disks.
Linux numbers primary or extended partitions 1 through 4, so /dev/hda can have four primary partitions, /dev/hda1, /dev/hda2, /dev/hda3, and /dev/hda4, or, say, one primary partition /dev/hda1 and one extended partition /dev/hda2. If logical partitions are defined, they are numbered starting at 5, so the first logical partition on /dev/hda is /dev/hda5, even if the disk has no primary partitions and just one extended partition (/dev/hda1).
Here is the result of running fdisk -l on my machine:
[root@localhost ~]# fdisk -l
Disk /dev/hda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hda1 1 892 7164958+ 12 Compaq diagnostics
/dev/hda2 * 893 4156 26218080 c W95 FAT32 (LBA)
/dev/hda3 4157 9371 41889456 c W95 FAT32 (LBA)
/dev/hda4 9372 14593 41945715 f W95 Ext'd (LBA)
/dev/hda5 9372 11982 20972826 b W95 FAT32
/dev/hda6 11983 11995 104391 83 Linux
/dev/hda7 11996 12256 2096451 82 Linux swap / Solaris
/dev/hda8 12257 14593 18771921 83 Linux
You can see that my disk is 120 GB, 120,034,123,776 bytes in all, with 255 "heads" (each head corresponds to one surface of a platter), 63 sectors per track, 14,593 cylinders, and the standard 512-byte sector.
hda1 (primary partition 1) was turned into a diagnostics partition by my laptop's manufacturer... never mind that. hda2 (primary partition 2) is the boot partition (?), using LBA. You can see that hda3 and hda4 are primary partitions too, and hda4 is an extended partition. (You can check the numbers: hda5 through hda8 add up to exactly 20972826+104391+2096451+18771921 = 41945589 blocks. Why doesn't that match hda4? 41945715-41945589 = 126; my guess is those blocks hold the extended partition table information.)
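The back-of-the-envelope arithmetic checks out (block counts taken from the fdisk output above; reading the 126 leftover blocks as extended-partition bookkeeping is my own guess):

```python
extended_total = 41945715                         # hda4, from fdisk -l
logicals = [20972826, 104391, 2096451, 18771921]  # hda5 .. hda8

assert sum(logicals) == 41945589
leftover = extended_total - sum(logicals)
assert leftover == 126  # presumably the chain of extended partition tables
```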
3. ATA, master/slave, and LBA
Over the IBM PC's history, data interchange has gone through several eras. The system buses went from the now-obsolete ISA to the widely used PCI (which seems to be fading as well), then AGP, and now PCI-E. The bus the disks used was called IBM AT (Advanced Technology), later renamed ATA (AT Attachment); once the serial version (S-ATA) appeared, the original became known as P-ATA (P stands for Parallel). Disks using PATA are also called IDE disks (a name left over from a Western Digital specification, Integrated Drive Electronics; I suspect this is also why SATA disks get recognized as SCSI disks on some machines...).
About master and slave drives (Master / Slave)
Like me before, many people have the wrong idea about master and slave drives. In truth, neither drive is anyone's master... both answer to the controller.
On the ATA cable mentioned above, besides the 40-pin connectors at the two ends, there is another 40-pin connector somewhat closer to one end. That, it turns out, is the connector for the slave drive.
- Misconception 1: of two disks on two separate ATA cables, one is the master and one the slave. In fact, only two disks connected to the same IDE cable are divided into master and slave. This is what the term "jumpering" refers to; if the two disks sit on two separate cables, no jumpering is needed at all. The jumper merely designates which disk is the master and which the slave.
- Misconception 2: the master is better than the slave. Not so; it is merely a matter of indexing. Both disks on one ATA cable are constrained by the same mainboard connector, that is, by the same disk controller. There is no hierarchy between them: both access speed and access priority are equal.
- Misconception 3: the two disks contend for the shared medium. Thanks to parallel-access techniques that let reads and writes be staggered across clock cycles, the two disks on one cable are fully independent. Two disks of different speeds therefore create no so-called bottleneck effect, nor is there any "only one disk can be read at a time" restriction.
Years ago, again for historical reasons, the BIOS imposed limits on CHS, i.e. caps on the maximum C, H, and S (the details don't matter), which limited the disk capacity the BIOS could reach to 137 GB (if I remember right). But disks these days are routinely hundreds of GB, so the scheme had to be improved. Enter LBA (Logical Block Addressing). LBA renumbers the disk's addresses, so that, say, CHS 0/0/1 and 0/0/2 become blocks 1 and 2; the OS and other software no longer need to know CHS addresses, only LBA addresses, and the disk controller acts as the middle layer that translates them. Access thus happens by logical address rather than directly by physical address. An excerpt:
LBA (Logical Block Addressing) mode can manage up to 8.4 GB of disk space. In LBA mode, the configured cylinder, head, and sector values are not the disk's actual physical parameters; when the disk is accessed, the IDE controller translates the logical address given by cylinder, head, and sector into the disk's real physical address. LBA mode allows up to 255 heads, with the remaining parameters the same as in normal mode, so the maximum accessible capacity works out to 512x63x255x1024 = 8.4 GB. Newer mainboard BIOSes extend INT 13, however, which lets LBA support disks of more than 100 GB.
LARGE mode is for disks whose cylinder count exceeds 1024 and that LBA does not support: it divides the cylinder count by 2 and multiplies the head count by 2, leaving the total capacity unchanged.
Of these three disk modes, LBA sees by far the most use today.
4. IRQs
BIOS setup screens often show IRQ settings for PCI devices, though since I first touched a PC in '99 I don't think I have ever seen them set to anything but Auto. IRQ in fact stands for Interrupt ReQuest. Originally, every device had its own private IRQ. For example, IRQ 5 was typically used by the sound card or by the second parallel port (printer); if you needed both, you had to hunt for a card configurable (usually via hardware jumpers) to use a different IRQ, such as IRQ 15 (values per the standard IRQ conventions). Nowadays (all, I believe) PCI devices share IRQs, using the so-called PnP (Plug and Play) technology (remember the Energy Star and Plug & Play logos in the top-right corner at boot, back before splash screens?). So when some device interrupts the CPU, an interrupt handler examines the interrupt and decides whether it is the one it handles; if not, it passes it to the next handler in the chain.
5. I/O ports
Having brought up IRQs, I should say something about I/O ports.
When the CPU needs to communicate with a peripheral, it does so through an I/O port. (Where exactly is this I/O port, and what does it really refer to... I don't actually know yet :) my guess is some address on the system data bus.) When the CPU needs to send data or control information to a peripheral, it writes to one of its ports; when the device has data or status ready for the CPU, the CPU reads it from one of its ports. Most devices have more than one port associated with them, usually a small power of two, such as 8, 16, or 32, with data transferred one or two bytes at a time. Devices cannot share ports, so if you have ISA cards, you must make sure each device is allocated ports of its own. That used to be done with switches or jumpers on the device's card. I have never handled an ISA card... all I know is that every PCI card supports PnP, with ports assigned automatically.
In Linux, cat /proc/ioports shows which I/O ports the devices occupy.
6. Musing about the Linux and Windows file systems
There are already plenty of articles about this online, so why am I writing another?
I don't know. But the UNIX file system is one of those things: the more I study it, the more of its philosophy I come to appreciate.
Windows first. As I see it now, the Windows file systems, FAT16/32 and NTFS included, can also be viewed as tree structures much like the UNIX file system's, except there is not just one tree but up to 24 of them. Each logical "drive" is assigned a "drive letter", one of the 26 English letters. A and B are assigned to floppies by default, so the first usable letter comes from C through Z (case does not matter), and since the C drive must be a primary partition, only D through Z remain for logical drives. Each drive is one tree, so the Windows file system can be seen as a forest.
For Win32, I usually adopt the following partitioning strategy:
- C is the system drive (System). It gets only a few GB beyond what a fresh install takes; nothing goes on it but the system and the necessary drivers.
- D is the media drive (Media), for pictures, music, movies, and other multimedia material.
- E is the games drive (Game), dedicated to games.
- F is the tools drive (Tool), for installed applications.
- G is the documents drive (Document), for documents, e-books, and ghost images.
A Linux filesystem contains files arranged on a disk or other block storage device, in the form of directories. As on many other systems, directories on a Linux system may contain other directories, called subdirectories. Whereas systems such as Microsoft® Windows® separate filesystems by drive letter (A:, C:, and so on), the Linux filesystem is a single tree, with the / directory as its root.
You may wonder why hard-disk layout matters at all if the filesystem is just one big tree. What actually happens is that each block device (a hard-drive partition, CD-ROM, or floppy) holds a filesystem. The single tree view is created by mounting the filesystems on different devices at points in the tree called mount points.
Usually the mounting process starts by mounting the filesystem on some hard-disk partition as /. Other hard-disk partitions may be mounted as /boot, /tmp, or /home. The filesystem on a floppy drive might be mounted as /mnt/floppy, and the one on a CD-ROM as /media/cdrom1. Files from other systems can also be mounted, using networked filesystems such as NFS. There are other kinds of file mounts, but this gives the idea of the process. While the process actually mounts the filesystem on some device, people commonly just say they "mount the device", understood to mean "mount the filesystem on the device".
Now suppose you have just mounted the root filesystem (/) and want to mount an IDE CD-ROM, /dev/hdd, at the mount point /media/cdrom. The mount point must exist before you mount the CD-ROM over it. When you mount the CD-ROM, the files and subdirectories on it become the files and subdirectories in and below /media/cdrom. Any files or subdirectories that were already in /media/cdrom become invisible, although they still exist on the block device that holds the mount point /media/cdrom; if the CD-ROM is unmounted, the original files and subdirectories become visible again. You can avoid this problem by not placing any files in a directory intended for use as a mount point.
Table 3 shows the directories that the Filesystem Hierarchy Standard requires in / (for more detail on the FHS, see Resources).
Directory | Description |
---|---|
bin | Essential command binaries |
boot | Static files of the boot loader |
dev | Device files |
etc | Host-specific system configuration |
lib | Essential shared libraries and kernel modules |
media | Mount point for removable media |
mnt | Mount point for mounting a filesystem temporarily |
opt | Add-on application software packages |
sbin | Essential system binaries |
srv | Data for services provided by this system |
tmp | Temporary files |
usr | Secondary hierarchy |
var | Variable data |
The above is excerpted from: IBM developerWorks, LPI exam 101 prep: Linux installation and package management.
Drive letter assignment
From Wikipedia, the free encyclopedia
Drive letter assignment is the process of assigning drive letters to primary and logical partitions (drive volumes) in the root namespace; this usage is found in Microsoft operating systems. Unlike the concept of UNIX mount points, where the user can create directories of arbitrary name and content in a root namespace, drive letter assignment constrains the highest-level namespace to single letters. Drive letter assignment is thus a process of using letters to name the roots of the "forest" representing the file system; each volume holds an independent "tree" (or, for non-hierarchical file systems, an independent list of files).
Origin
The concept of drive letters, as used today, probably owes its origins to IBM's VM family of operating systems, dating back to CP/CMS in 1967 (and its research predecessor CP-40). The concept evolved through several steps:
- CP/CMS used drive letters to identify minidisks attached to a user session. A full file reference (pathname in today's parlance) consisted of a filename, a filetype, and a disk letter called a filemode. Minidisks could correspond to physical disk drives, but more typically referred to logical drives, which were mapped automatically onto shared devices by the operating system as sets of virtual cylinders of fixed-size blocks. This was vastly easier to use than other mainframe file reference mechanisms, e.g. JCL.
- CP/CMS inspired numerous other operating systems, including the CP/M microcomputer operating system – which used a drive letter prefix (e.g. "A:") to specify a physical storage device. This usage was similar to the device prefixes used in the RSX-11 and VMS operating systems. Early versions of CP/M (and other microcomputer operating systems) implemented a flat file system on each disk drive, where a complete file reference consisted of a drive letter followed by a filename (eight characters) and a filetype (three characters): A:readme.txt. (This was the era of 8-inch floppy disks, where such small namespaces did not impose practical constraints.)
- The drive letter syntax chosen for CP/M was also adopted by Microsoft for its ubiquitous microcomputer operating systems MS-DOS and, later, Microsoft Windows. Originally, drive letters always represented physical volumes, but support for logical volumes eventually appeared.
Note that the important capability of hierarchical directories within each drive letter was initially absent from these systems. This was a major feature of UNIX and other robust operating systems, where hard disk drives held thousands (rather than tens or hundreds) of files. Increasing microcomputer storage capacities led to their introduction, eventually followed by long filenames. In file systems lacking such naming mechanisms, drive letter assignment proved a useful, simple organizing principle.
JOIN and SUBST
Drive letters are not the only way of accessing different volumes. MS-DOS offers a JOIN command that allows access to an assigned volume through an arbitrary directory, similar to the Unix mount command. It also offers a SUBST command which allows the assignment of a drive letter to a directory. One or both of these commands are removed in later systems like OS/2 or Windows NT, but starting with Windows 2000 both are again supported: the SUBST command exists as before, while JOIN's functionality is subsumed in LINKD (part of the Windows Resource Kit). In Windows Vista, the new command MKLINK can be used for this purpose. Also Windows 2000 and later supports mount points, accessible from the Control Panel.
Operating systems that use drive letter assignment
- CP/M
- DOS
- Microsoft Windows
- OS/2
- Atari ST
- The operating systems of PlayStation and Xbox video game consoles
- libnds
Order of assignment
Except for CP/M and early versions of MS-DOS, each of these operating systems assigns drive letters according to the following algorithm:
- Assign the drive letter 'A' to the boot floppy, and 'B' to the secondary floppy.
- Assign a drive letter, beginning with 'C' to the first active primary partition recognised upon the first physical hard disk.
- Assign subsequent drive letters to the first primary partition upon each successive physical hard disk drive, if present within the system.
- Assign subsequent drive letters to every recognised logical partition, beginning with the first hard drive and proceeding through successive physical hard disk drives, if present within the system.
- Assign subsequent drive letters to any RAM Disk.
- Assign subsequent drive letters to any additional floppy, CD/DVD drives.
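A toy model of that assignment order (my own sketch, not any operating system's actual code); each disk is described by its counts of primary and logical partitions:

```python
import string

def assign_letters(disks, ram_disks=0, removable=0):
    """disks: list of (primary_count, logical_count) per physical disk.
    Returns drive letters in assignment order, starting at C:
    (A: and B: are reserved for floppies)."""
    letters = iter(string.ascii_uppercase[2:])
    out = []
    for primaries, _ in disks:                # first primary of each disk first
        if primaries > 0:
            out.append(next(letters))
    for _, logicals in disks:                 # then every logical partition
        out.extend(next(letters) for _ in range(logicals))
    for _ in range(ram_disks + removable):    # then RAM disks, CD/DVD drives
        out.append(next(letters))
    return out

# Two disks: disk 1 has 1 primary + 2 logicals, disk 2 has 1 primary + 1 logical
print(assign_letters([(1, 2), (1, 1)], removable=1))
# → ['C', 'D', 'E', 'F', 'G', 'H']
```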
MS-DOS versions 3 and earlier assign letters to all of the floppy drives before considering hard drives, so a system with four floppy drives would call the first hard drive 'E'.
The order can depend on whether a given disk is managed by a boot-time driver or by a dynamically loaded driver. For example, if the second or third hard disk is of SCSI type and on MS-DOS requires drivers loaded through the CONFIG.SYS file (e.g. the controller card does not offer on-board BIOS or using this BIOS is not practical), then the first SCSI primary partition will appear after all the IDE partitions on MS-DOS. Therefore MS-DOS and, for example, OS/2 could have different drive letters, as OS/2 loads the SCSI driver earlier. A solution was not to use primary partitions on such hard disks.
In Windows NT, Windows 2000, Windows XP and OS/2, the operating system uses the aforementioned algorithm to automatically assign letters to floppy disk drives, CD-ROM drives, DVD drives, the boot disk, and other recognised volumes that are not otherwise created by an administrator within the operating system. Volumes that are created within the operating system are manually specified, and some of the automatic drive letters can be changed. Unrecognised volumes are not assigned letters, and are usually left untouched by the operating system.
A common problem with drive letter assignment is that the letter assigned to a network drive can interfere with the letter of a local volume (like a newly installed CD/DVD drive or a USB stick). For example, if the last local drive has the letter D: and a network drive has been assigned the letter E:, then a newly connected USB mass storage device will also be assigned the letter E:, causing loss of connectivity with either the network share or the USB device. To overcome this problem, one must either assign drive letters manually or install third-party software such as the USB Drive Letter Manager.
An alternate condition that can cause problems on Windows XP is when there are network drives defined but in an error condition (as they would be on a laptop operating outside the network). Even when the unconnected network drive is not the next available drive letter, Windows XP may be unable to map a drive and this error may also prevent the mounting of the USB device.
Common assignments
Applying the algorithms discussed above on a fairly modern Windows based system typically results in the following drive letter assignments:
- A: — Floppy drive (3.5-inch is the modern standard).
- B: — Unused; reserved for a floppy drive, historically a second floppy drive (usually 5.25-inch). Also used for RAM drives by some live CDs.
- C: — Hard Drive.
- D: to Z: — Other disk partitions are labeled here. (The Windows 98 update tends to assign any CD-ROM drive the letter D:, even placing it ahead of a primary partition on an IDE device.)
- D: to Z: — CD, DVD and shared drives begin lettering after the last used hard drive partition designation.
- F: — First Network Drive if using Novell NetWare
- Z: — First Network Drive if using Banyan VINES
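The pass-by-pass assignment order behind these common letters can be sketched in code. The following is a simplified illustration, not the exact logic of any particular DOS or Windows version; the function name and input format are invented for the example.

```python
# Hypothetical sketch of the classic drive-letter assignment order:
# first primary partition of each disk, then logical drives disk by
# disk, then remaining primaries, then removable/optical drives.

def assign_letters(disks, removable):
    """disks: one dict per physical disk, e.g.
         {"primaries": ["vol-a"], "logicals": ["vol-b", "vol-c"]}
       removable: list of removable/optical drive names."""
    letters = iter("CDEFGHIJKLMNOPQRSTUVWXYZ")
    assignment = {}
    # Pass 1: the first (active) primary partition on each disk
    for d in disks:
        if d["primaries"]:
            assignment[d["primaries"][0]] = next(letters)
    # Pass 2: logical drives in extended partitions, disk by disk
    for d in disks:
        for vol in d["logicals"]:
            assignment[vol] = next(letters)
    # Pass 3: any remaining primary partitions
    for d in disks:
        for vol in d["primaries"][1:]:
            assignment[vol] = next(letters)
    # Pass 4: removable and optical drives
    for vol in removable:
        assignment[vol] = next(letters)
    return assignment

# Two disks, each with one primary and one logical partition, plus a CD-ROM:
disks = [
    {"primaries": ["disk1-pri"], "logicals": ["disk1-log"]},
    {"primaries": ["disk2-pri"], "logicals": ["disk2-log"]},
]
print(assign_letters(disks, ["cdrom"]))
# → disk1-pri gets C, disk2-pri gets D, disk1-log gets E,
#   disk2-log gets F, cdrom gets G
```

Note how the second disk's primary partition takes D: ahead of the first disk's own logical drives, which is exactly the interleaving that surprises users when a second hard disk is added.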
The C: drive usually contains the operating system files required for the computer to run. Many modern personal computers include only one hard drive, which is therefore designated C:; on such a computer, all of a user's personal files are often stored on this drive as well. Keep in mind, however, that these assignments can differ from system to system.
When there was not a second physical floppy drive, the B: drive was used as a virtual floppy drive marker for the A: drive, whereby the user would be prompted to switch floppies every time a read or write was required to whichever was not most recently used of A: or B:. This allowed for much of the functionality of two floppy drives on a computer that had only one (albeit usually resulting in lots of swapping).
Network drives are often assigned letters towards the end of the alphabet to differentiate them from local drives, which typically use letters towards the beginning. Using letters towards the end reduces the risk of an assignment conflict, which matters especially when the assignment is done automatically across a network (usually by a logon script).
It is not possible to have more than 26 drive letters. If access to more filesystems than this is required, volume mount points must be used.[1]
Other implementations
The Amiga
The Commodore Amiga used a modified system whereby each drive was identified by a drive type and a drive number, starting from 0 (zero). For example, the first floppy drive on a system would be referred to as DF0:, where DF presumably stands for disk floppy. Hard disks would be numbered starting at DH0:. Printers and CD-ROMs, however, broke this scheme, being named LPT1: (Line PrinTer 1) and so on, and CD0: and up, respectively.
References
- ^ Microsoft TechNet. Retrieved on 1 December 2006.
ATA (Advanced Technology Attachment)
Advanced Technology Attachment (ATA) is a standard interface for connecting storage devices such as hard disks and CD-ROM drives inside personal computers.
The standard is maintained by X3/INCITS committee T13. Many synonyms and near-synonyms for ATA exist, including abbreviations such as IDE and ATAPI. Also, with the market introduction of Serial ATA in 2003, the original ATA was retroactively renamed Parallel ATA (PATA).
In line with the original naming, this article covers only Parallel ATA.
Parallel ATA standards allow cable lengths up to only 18 inches (46 centimetres) although cables up to 36 inches (91 cm) can be readily purchased. Because of this length limit, the technology normally appears as an internal computer storage interface. It provides the most common and the least expensive interface for this application.
History
The name of the standard was originally conceived as PC/AT Attachment as its primary feature was a direct connection to the 16-bit ISA bus then known as 'AT bus'; the name was shortened to an inconclusive "AT Attachment" to avoid possible trademark issues.
An early version of the specification, conceived by Western Digital in the late 1980s, was commonly known as Integrated Drive Electronics (IDE), because the drive controller was contained on the drive itself, as opposed to the then-common configuration of a separate controller connected to the computer's motherboard. This made the interface on the motherboard a host adapter, though many people continue, by habit, to call it a controller.
Enhanced IDE (EIDE) — an extension to the original ATA standard again developed by Western Digital — allowed the support of drives having a storage capacity larger than 528 megabytes (504 mebibytes), up to 8.4 gigabytes. Although these new names originated in branding convention and not as an official standard, the terms IDE and EIDE often appear as if interchangeable with ATA. This may be attributed to the two technologies being introduced with the same consumable devices — these "new" ATA hard drives.
With the introduction of Serial ATA around 2003, conventional ATA was retroactively renamed to Parallel ATA (P-ATA), referring to the method in which data travels over wires in this interface.
The interface at first worked only with hard disks, but eventually an extended standard came to work with a variety of other devices, generally those using removable media. Principally, these devices include CD-ROM and DVD-ROM drives, tape drives, and large-capacity floppy drives such as the Zip drive and SuperDisk drive. The extension bears the name AT Attachment Packet Interface (ATAPI); it started as the non-ANSI SFF-8020 standard developed by Western Digital and Oak Technologies, but was then included in the full standard, now known as ATA/ATAPI, starting with version 4. Removable media devices other than CD and DVD drives are classified as ARMD (ATAPI Removable Media Device) and can appear as either a floppy or a hard drive to the operating system.
The move from programmed input/output (PIO) to direct memory access (DMA) provided another important transition in the history of ATA. As every computer word must be read by the CPU individually, PIO tends to be slow and use a lot of CPU resources. This is especially a problem on faster CPUs where accessing an address outside of the cacheable main memory (whether in the I/O map or the memory map) is a relatively expensive process. This meant that systems based around ATA devices generally performed disk-related activities much more slowly than computers using SCSI or other interfaces. However, DMA (and later Ultra DMA, or UDMA) greatly reduced the amount of processing time the CPU had to use in order to read and write the disks. This is possible because DMA and UDMA allow the disk controller to write data to memory directly, thus bypassing the CPU.
The original ATA specification used a 28-bit addressing mode, allowing the addressing of 2^28 (268,435,456) sectors of 512 bytes each, for a maximum capacity of 137 gigabytes (128 GiB). The standard PC BIOS supported up to 7.88 GiB (8.46 GB), with a maximum of 1024 cylinders, 256 heads and 63 sectors. When the lowest common denominators of the CHS limitations in the standard PC BIOS and in the IDE standard were combined, the system as a whole was limited to a mere 528 MB (504 MiB). BIOS translation and LBA were introduced, removing the need for the CHS geometry on the drive itself to match that used by the BIOS, and consequently allowing up to 7.88 GiB when accessed through the Int 13h interface. That barrier in turn was overcome with the Int 13h extensions, which use a 64-bit linear address and therefore allow access to the full 128 GiB and more (although some BIOSes initially had problems handling more than 31.5 GiB due to an implementation bug).
ATA-6 introduced 48 bit addressing, increasing the limit to 128 PiB (or 144 petabytes). Some OS environments, including Windows 2000 until Service Pack 3, did not enable 48-bit LBA by default, so the user was required to take extra steps to get full capacity on a 160 GB drive.
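The capacity figures quoted in the last two paragraphs follow directly from sector-count arithmetic with 512-byte sectors, and can be checked as follows:

```python
# Verifying the ATA capacity limits discussed above.
SECTOR = 512  # bytes per sector

# 28-bit LBA: 2**28 sectors
lba28 = 2**28 * SECTOR
print(lba28 / 10**9, "GB")       # ~137.4 GB (decimal)
print(lba28 // 2**30, "GiB")     # 128 GiB (binary)

# Standard PC BIOS CHS limit: 1024 cylinders x 256 heads x 63 sectors
bios_chs = 1024 * 256 * 63 * SECTOR
print(bios_chs / 10**9, "GB")    # ~8.46 GB (7.88 GiB)

# Combined BIOS + IDE lowest common denominator: 1024 x 16 x 63
combined = 1024 * 16 * 63 * SECTOR
print(combined // 2**20, "MiB")  # 504 MiB (~528 MB)

# 48-bit LBA (ATA-6): 2**48 sectors
lba48 = 2**48 * SECTOR
print(lba48 / 10**15, "PB")      # ~144.1 PB (128 PiB)
```

The decimal/binary pairs (137 GB vs 128 GiB, 528 MB vs 504 MiB, 144 PB vs 128 PiB) are the same limits expressed in powers of ten and powers of two respectively.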
All these size limitations come about because some part of the system is unable to deal with block addresses above some limit. This problem may manifest itself by the system recognizing no more of a drive than that limiting value, or by the system refusing to boot and hanging on the BIOS screen at the point when drives are initialized. In some cases, a BIOS upgrade for the motherboard will resolve the problem. This problem is also found in older external FireWire disk enclosures, which limit the usable size of a disk to 128 GB. By early 2005 most enclosures available have practically no limit. (Earlier versions of the popular Oxford 911 FireWire chipset had this problem. Later Oxford 911 versions and all Oxford 922 chips resolve the problem.)
Parallel ATA interface
Until the introduction of Serial ATA, 40-pin connectors generally attached drives to a ribbon cable. Each cable has two or three connectors, one of which plugs into an adapter that interfaces with the rest of the computer system; the remaining one or two connectors plug into drives. Parallel ATA cables transfer data 16 bits at a time (it is a common misconception that they transfer 32 bits at a time, mainly because the 40-wire ribbon would appear to allow this).
Pin | Function | Pin | Function |
---|---|---|---|
1 | Reset | 2 | Ground |
3 | Data 7 | 4 | Data 8 |
5 | Data 6 | 6 | Data 9 |
7 | Data 5 | 8 | Data 10 |
9 | Data 4 | 10 | Data 11 |
11 | Data 3 | 12 | Data 12 |
13 | Data 2 | 14 | Data 13 |
15 | Data 1 | 16 | Data 14 |
17 | Data 0 | 18 | Data 15 |
19 | Ground | 20 | Key (alternative usage is VCC_in) |
21 | DDRQ | 22 | Ground |
23 | I/O Write | 24 | Ground |
25 | I/O Read | 26 | Ground |
27 | IOC HRDY | 28 | Cable Select (see below) |
29 | DDACK | 30 | Ground |
31 | IRQ | 32 | No Connect |
33 | Addr 1 | 34 | GPIO_DMA66_Detect (see below) |
35 | Addr 0 | 36 | Addr 2 |
37 | Chip Select 1P | 38 | Chip Select 3P |
39 | Activity | 40 | Ground |
ATA's ribbon cables had 40 wires for most of its history, but an 80-wire version appeared with the introduction of the Ultra DMA/66 (UDMA4) mode. All of the additional wires in the new cable are ground wires, interleaved with the previously defined wires. The interleaved ground wire reduces the effects of capacitive coupling between neighboring signal wires, thereby reducing crosstalk. Capacitive coupling is more of a problem at higher transfer rates, and this change was necessary to enable the 66 megabytes per second (MB/s) transfer rate of UDMA4 to work reliably. The faster UDMA5 and UDMA6 modes also require 80-conductor cables.
Though the number of wires doubled, the number of connector pins and the pinout remain the same as on 40-conductor cables, and the external appearance of the connectors is identical. Internally, of course, the connectors are different: the connectors for the 80-wire cable connect a larger number of ground wires to a smaller number of ground pins, while the connectors for the 40-wire cable connect ground wires to ground pins one-for-one. 80-wire cables usually come with three differently colored connectors (blue, gray and black), as opposed to the uniformly colored (all black) connectors of 40-wire cables. The gray connector leaves pin 28 (CSEL) unconnected, which makes it the slave position for drives configured for cable select.
Using non-standard cables
The ATA standard has always specified a maximum cable length of just 46 cm (18 inches) and flat cables with particular impedance and capacitance characteristics. For various reasons it may be desirable to use alternative cables: for example, longer cables when connecting drives within a large computer case or when mounting several physical drives in one computer, or rounded cables to improve airflow (cooling) inside the case. Such cables are widely available on the market and are used successfully in most cases; however, they fall outside the parameters set by the specification and should be used with caution.
The short standard cable length all but completely eliminates the possibility of using parallel ATA for external devices.
Pin 20
In the ATA standard, pin 20 is defined as a key and is not used. However, some flash disks can use pin 20 as VCC_in to power the disk without the need for a special power cable[1].
Pin 28
Pin 28 of the gray connector of an 80 conductor cable is not attached to any conductor of the cable. It is attached normally on the blue and black connectors.
Pin 34
Pin 34 is connected to ground inside the blue connector of an 80 conductor cable but not attached to any conductor of the cable. It is attached normally on the gray and black connectors. See page 315 of [2].
Differences between connectors on 80-conductor cables
The image shows PATA connectors after removal of strain relief, cover, and cable. Pin one is at bottom left of the connectors, pin 2 is top left, etc., except that the lower image of the blue connector shows the view from the opposite side, and pin one is at top right.
Each contact comprises a pair of points which together pierce the insulation of the ribbon cable with such precision that they make an excellent connection to the desired conductor without harming the insulation on the neighboring wires. The center row of contacts are all connected to the common ground bus and attach to the odd numbered conductors of the cable. The top row of contacts are the even-numbered sockets of the connector (mating with the even-numbered pins of the receptacle) and attach to every other even-numbered conductor of the cable. The bottom row of contacts are the odd-numbered sockets of the connector (mating with the odd-numbered pins of the receptacle) and attach to the remaining even-numbered conductors of the cable. (An alternate version of the connectors is allowed in which the even-numbered conductors are grounded and the odd-numbered conductors carry the signals. Obviously all three connectors on a cable must agree on this.)
Note the connections to the common ground bus from sockets 2 (top left), 19 (center bottom row), 22, 24, 26, 30, and 40 on all connectors. Also note (enlarged detail, bottom, looking from the opposite side of the connector) that socket 34 of the blue connector does not contact any conductor but unlike socket 34 of the other two connectors, it does connect to the common ground bus. On the gray connector, note that socket 28 is completely missing, so that pin 28 of the drive attached to the gray connector will be open. On the black connector, sockets 28 and 34 are completely normal, so that pins 28 and 34 of the drive attached to the black connector will be connected to the cable. Pin 28 of the black drive reaches pin 28 of the host receptacle but not pin 28 of the gray drive, while pin 34 of the black drive reaches pin 34 of the gray drive but not pin 34 of the host. Instead, pin 34 of the host is grounded.
The standard dictates color-coded connectors for easy identification by both installer and cable maker. All three connectors are different from one another. The blue (host) connector has the socket for pin 34 connected to ground inside the connector but not attached to any conductor of the cable. Since the old 40 conductor cables do not ground pin 34, the presence of a ground connection indicates that an 80 conductor cable is installed. The wire for pin 34 is attached normally on the other types and is not grounded. Installing the cable backwards (with the black connector on the system board, the blue connector on the remote device and the gray connector on the center device) will ground pin 34 of the remote device and connect host pin 34 through to pin 34 of the center device. The gray center connector omits the connection to pin 28 but connects pin 34 normally, while the black end connector connects both pins 28 and 34 normally.
(Although the standard itself must be purchased or consulted at a library, there is a draft copy available. See page 315 of the draft at [3], also archived at [4])
Multiple devices on a cable
If two devices attach to a single cable, one is commonly referred to as a master and the other as a slave. The master drive generally appears first when the computer's BIOS and/or operating system enumerates available drives. On old BIOSes (486 era and older) the drives are often misleadingly referred to by the BIOS as "C" for the master and "D" for the slave.
If there is a single device on a cable, in most cases it should be configured as master. However, some hard drives have a special setting called single for this configuration (Western Digital, in particular). Also, depending on the hardware and software available, a single drive on a cable can work reliably even though configured as the slave drive (this configuration is most often seen when a CDROM has a channel to itself).
Cable select
A drive setting called cable select was described as optional in ATA-1 and has come into fairly widespread use with ATA-5 and later. A drive set to "cable select" automatically configures itself as master or slave, according to its position on the cable. Cable select is controlled by pin 28. The host adapter grounds this pin; if a device sees that the pin is grounded, it becomes the master device; if it sees that pin 28 is open, the device becomes the slave device.
This setting is usually chosen by placing a jumper on the "cable select" position, usually marked CS, rather than on the "master" or "slave" position.
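The cable-select rule above amounts to a single test on pin 28. A minimal sketch, with an invented function name purely for illustration:

```python
# Cable select as described above: the host grounds pin 28; a device
# jumpered to "cable select" reads the pin to decide its role.

def role_from_pin28(pin28_grounded: bool) -> str:
    """Return the device role a cable-select drive adopts."""
    return "master (device 0)" if pin28_grounded else "slave (device 1)"

# On an 80-wire cable, the black end connector passes pin 28 through,
# while the gray middle connector leaves it open:
print(role_from_pin28(True))   # black connector position
print(role_from_pin28(False))  # gray connector position
```

This is why, on an 80-wire cable, a cable-select drive's role is determined entirely by which connector it is plugged into.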
With the 40-wire cable it was very common to implement cable select by simply cutting the pin 28 wire between the two device connectors. This puts the slave device at the end of the cable, and the master on the "middle" connector. This arrangement eventually was standardized in later versions of the specification. If there is just one device on the cable, this results in an unused "stub" of cable. This is undesirable, both for physical convenience and electrical reasons: The stub causes signal reflections, particularly at higher transfer rates.
When the 80-wire cable was defined for use with ATA/ATAPI-5 and UDMA4, the positions were fixed: the master device goes at the end of the 18-inch cable (black connector), the middle slave connector is gray, and the blue connector goes onto the motherboard. So if there is only one (master) device on the cable, there is no cable stub to cause reflections. Also, cable select is now implemented in the slave device connector, usually simply by omitting the contact for pin 28 from the connector body. Both the 40-wire and 80-wire parallel IDE cables share the same 40-socket connector configuration.
Master and slave clarification
Although they are in extremely common use, the terms master and slave do not actually appear in current versions of the ATA specifications. The two devices are correctly referred to as device 0 (master) and device 1 (slave), respectively. It is a common myth that "the master drive arbitrates access to devices on the channel" or that "the controller on the master drive also controls the slave drive." In fact, the drivers in the host operating system perform the necessary arbitration and serialization (as described in the next section), and each drive's controller operates independently. There is therefore no suggestion in the ATA protocols that one device has to ask the other if it can use the channel. Both are really "slaves" to the driver in the host OS.
Serialized, overlapped, and queued operations
The parallel ATA protocols up through ATA-3 require that once a command has been given to one device on an ATA interface, that command must complete before any subsequent command may be given to either device on the same interface. In other words, operations on the devices must be serialized—with only one operation in progress at a time—with respect to the ATA host interface.
For example, suppose a read operation is in progress on one drive on a given interface (cable). It is not possible to initiate another command on another drive on the same interface, or inform the first drive of additional commands that it should perform later (capabilities referred to as "overlapped" and "queued" operations, respectively), until the first drive's read operation completes. This is true even though the total I/O time is dominated by seek time and rotational latency, and during these phases, the first drive is transferring no data.
A useful mental model is that the host ATA interface is busy with the first request for its entire duration, and therefore can't be told about another request until the first one is complete.
The function of serializing requests to the interface is usually performed by a device driver in the host operating system.
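The driver-level serialization just described can be modeled in a few lines. This is a behavioural sketch only; the class and method names are illustrative, not any real driver API.

```python
# Model of a pre-ATA-4 parallel ATA channel: one outstanding command
# per cable; further requests queue until the current one completes.
from collections import deque

class AtaChannel:
    def __init__(self):
        self.pending = deque()
        self.busy = False
        self.log = []

    def submit(self, device, op):
        """Queue a command for a device (0 or 1) on this channel."""
        self.pending.append((device, op))
        self._dispatch()

    def _dispatch(self):
        if not self.busy and self.pending:
            device, op = self.pending.popleft()
            self.busy = True  # channel occupied for the command's full duration
            self.log.append(f"start {op} on device {device}")

    def complete(self):
        """The in-progress command finished; the next one may start."""
        self.busy = False
        self.log.append("complete")
        self._dispatch()

chan = AtaChannel()
chan.submit(0, "read")  # starts immediately
chan.submit(1, "read")  # must wait, even though device 1 is idle
chan.complete()         # device 0 finishes; only now does device 1 start
print(chan.log)
```

The key point the model captures is that the second device's command cannot even be issued, let alone overlapped, while the first is in flight.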
The ATA-4 and subsequent versions of the specification have included both an "overlapped feature set" and a "queued feature set" as optional features. However, support for these is extremely rare in actual parallel ATA products and device drivers.
By contrast, overlapped and queued operations have been common in other storage buses for some time. In particular, tagged command queuing is characteristic of SCSI, and this has long been seen as a major advantage of SCSI over parallel ATA. The Serial ATA standard has supported what it calls native command queueing since its first released version, but the feature is present in only a few (generally the highest-priced) Serial ATA drives.
Two devices on one cable: speed impact
There are many debates about how much a slow device can impact the performance of a faster device on the same cable. There is an effect, but the debate is confused by the blurring of two quite different causes, called here "Slowest Speed" and "One Operation at a Time".
"Slowest Speed"
It is a common misconception that, if two devices of different speed capabilities are on the same cable, both devices' data transfers will be constrained to the speed of the slower device.
For all modern ATA host adapters (since, at least, the late Pentium III and AMD K7 era) this is not true, as modern ATA host adapters support independent device timing. This allows each device on the cable to transfer data at its own best speed.
Even with older adapters without independent timing, this effect only impacts the data transfer phase of a read or write operation. This is usually the shortest part of a complete read or write operation (except for burst mode transfers).
"One Operation at a Time"
This is a much more important effect. It is caused by the omission of both overlapped and queued feature sets from most parallel ATA products. This means that only one device on a cable can perform a read or write operation at one time. Therefore, a fast device on the same cable as a slow device under heavy use will find that nearly every time it is asked to perform a transfer, it has to wait for the slow device to finish its own ponderous transfer.
For example, consider an optical device such as a DVD-ROM, and a hard drive on the same parallel ATA cable. With average seek and rotation speeds for such devices, a read operation to the DVD-ROM will take an average of around 100 milliseconds, while a typical fast parallel ATA hard drive can complete a read or write in less than 10 milliseconds. This means that the hard drive, if unencumbered, could perform more than 100 operations per second (and far more than that if only short head movements are involved). But since the devices are on the same cable, once a "read" command is given to the DVD-ROM, the hard drive will be inaccessible (and idle) for as long as it takes the DVD-ROM to complete its read—seek time included. Frequent accesses to the DVD-ROM will therefore vastly reduce the maximum throughput available from the hard drive. If the DVD-ROM is kept busy with average-duration requests, and if the host operating system driver sends commands to the two drives in a strict "round robin" fashion, then the hard drive will be limited to about 10 operations per second while the DVD-ROM is in use, even though the burst data transfers to and from the hard drive still happen at the hard drive's usual speed.
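The round-robin figure in the example above follows from simple arithmetic, using the average operation durations assumed in the text:

```python
# Strict round-robin between a ~100 ms optical read and a ~10 ms
# hard-drive read: the hard drive completes one operation per cycle.
dvd_ms, hdd_ms = 100.0, 10.0
cycle_ms = dvd_ms + hdd_ms          # one DVD op + one HDD op per cycle
hdd_ops_per_s = 1000.0 / cycle_ms   # HDD throughput while sharing the cable
print(round(hdd_ops_per_s, 1))      # ~9.1 ops/s, versus 100+ when alone
```

The slowdown is dominated entirely by the time the channel spends held by the optical drive, not by any difference in burst transfer speed.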
The impact of this on a system's performance depends on the application. For example, when copying data from an optical drive to a hard drive (such as during software installation), this effect probably doesn't matter: Such jobs are necessarily limited by the speed of the optical drive no matter where it is. But if the hard drive in question is also expected to provide good throughput for other tasks at the same time, it probably should not be on the same cable as the optical drive.
Remember that this effect occurs only if the slow drive is actually being accessed. The mere presence of an idle drive will not affect the performance of the other device on the cable (for a modern host adapter which supports independent timing).
ATA standards versions, transfer rates, and features
The following table shows the names of the versions of the ATA standards and the transfer modes and rates supported by each. Note that the transfer rate for each mode (for example, 66.7 MB/s for UDMA4, commonly called "Ultra-DMA 66") gives its maximum theoretical transfer rate on the cable. This is simply two bytes multiplied by the effective clock rate, and presumes that every clock cycle is used to transfer end-user data. In practice, of course, protocol overhead reduces this value.
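For example, the UDMA4 figure works out as follows; the 33.3 MHz effective word rate used here is a commonly quoted value for Ultra-DMA 66, not one stated in the table itself:

```python
# "Two bytes times the effective clock" rule of thumb for UDMA4:
# the 16-bit bus transfers one 2-byte word per effective clock cycle.
bytes_per_word = 2
effective_clock_hz = 33.3e6                 # assumed UDMA4 word rate
rate = bytes_per_word * effective_clock_hz  # bytes per second on the cable
print(rate / 1e6, "MB/s")                   # ~66.7 MB/s theoretical maximum
```

As the text notes, this is a cable-level ceiling; protocol overhead and host-bus contention reduce it in practice.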
Congestion on the host bus to which the ATA adapter is attached may also limit the maximum burst transfer rate. For example, the maximum data transfer rate for conventional PCI bus is 133 MB/s, and this is shared among all active devices on the bus.
In addition, no ATA hard drives exist capable of measured sustained transfer rates of above 80 MB/s. Furthermore, sustained transfer rate tests do not give realistic throughput expectations for most workloads: They use I/O loads specifically designed to encounter almost no delays from seek time or rotational latency. Hard drive performance under most workloads is limited first and second by those two factors; the transfer rate on the bus is a distant third in importance. Therefore, transfer speed limits above 66 MB/s really affect performance only when the hard drive can satisfy all I/O requests by reading from its internal cache — a very unusual situation, especially considering that such data is usually already buffered by the operating system.
Standard | Other Names | Transfer Modes Added (MB/s) | Maximum disk size | Other New Features | ANSI Reference |
---|---|---|---|---|---|
ATA-1 | ATA, IDE | PIO 0, 1, 2 (3.3, 5.2, 8.3); Single-word DMA 0, 1, 2 (2.1, 4.2, 8.3); Multi-word DMA 0 (4.2) | 137 GB | | X3.221-1994 (obsolete since 1999) |
ATA-2 | EIDE, Fast ATA, Fast IDE, Ultra ATA | PIO 3, 4 (11.1, 16.6); Multi-word DMA 1, 2 (13.3, 16.6) | | 28-bit logical block addressing (LBA) | X3.279-1996 (obsolete since 2001) |
ATA-3 | EIDE | | | S.M.A.R.T., Security | X3.298-1997 (obsolete since 2002) |
ATA/ATAPI-4 | ATA-4, Ultra ATA/33 | Ultra DMA 0, 1, 2 (16.7, 25.0, 33.3), aka UDMA/33 | | AT Attachment Packet Interface (ATAPI), i.e. support for CD-ROM, tape drives etc.; optional overlapped and queued command set features; Host Protected Area (HPA) | NCITS 317-1998 |
ATA/ATAPI-5 | ATA-5, Ultra ATA/66 | Ultra DMA 3, 4 (44.4, 66.7), aka UDMA/66 | | 80-wire cables | NCITS 340-2000 |
ATA/ATAPI-6 | ATA-6, Ultra ATA/100 | UDMA 5 (100), aka UDMA/100 | 144 PB | 48-bit LBA, Device Configuration Overlay (DCO), Automatic Acoustic Management | NCITS 361-2002 |
ATA/ATAPI-7 | ATA-7, Ultra ATA/133 | UDMA 6 (133), aka UDMA/133; SATA/150 | | SATA 1.0, Streaming feature set, long logical/physical sector feature set for non-packet devices | NCITS 397-2005 (vol 1) |
ATA/ATAPI-8 | ATA-8 | | | Hybrid drive featuring non-volatile cache to speed up critical OS files | In progress |
In August 2004, Sam Hopkins and Brantley Coile of Coraid specified a lightweight ATA-over-Ethernet protocol to carry ATA commands over Ethernet instead of directly connecting them to a PATA host adapter. This permitted the established block protocol to be reused in Network-attached storage applications.
External links
- Overview and History of the IDE/ATA Interface
- ATA/ATAPI history
- Enhanced IDE/Fast-ATA/ATA-2 FAQ
- Connecting IDE Hard Drives from mikeshardware.com
- Hard Drive Size Barriers
- T13 Technical Standards Group
- Computer Interface Help and Information
- How to solve one well known dma-mode problem in WinXP
- Seagate support using Cable Select
- PC Guide - Configuration Using Cable Select
- Unixwiz - Using IDE Cable Select
- U.S. Patent 6523071 discusses detailed use of pin 34 by host to distinguish 40 and 80 conductor cables
- Draft copy of ATA-ATAPI-5 standard, revision 3 dated 29 February 2000