LAM/MPI Cluster System With FreeBSD 5.3

敬业的IT人 | Source: Internet | Author: anonymous | 2008-1-3 10:38:09

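  [Preface]

  MPI (Message Passing Interface)

  MPI is not a protocol, but in practice it has the standing of one. It is mainly used for communication between parallel programs on distributed-memory systems. MPI is a function library that can be called from Fortran and C programs; its advantages are that it is fast and portable.

  Cluster

  Two cluster architectures are common today. One is the Web/Internet Cluster System, which places data on different hosts so that several hosts serve a single service together. The other is the parallel computing cluster (Parallel Algorithms Cluster System). Parallel computing hands a single computation to all the CPUs in the cluster, which work on it simultaneously. Because the computing power of multiple CPUs is used, the computation finishes faster.

  The LAM/MPI Cluster System installed in this document belongs to the latter kind. Because of the limits of the test environment and of my own ability, some explanations in this document may be incomplete. If you have questions, please write to me and I will do my best to improve the document. Thank you!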

  [Software and Platform]

  Server \\ FreeBSD 5.3 Stable

  IP:172.18.5.247

  Hostname: center.the9.com

  Client \\ FreeBSD 5.3 Release

  IP:172.18.5.80

  Hostname: node1.the9.com

  apache_1.3.29 \\ all packages below are installed from ports

  php4-4.3.10

  php4-gd-4.3.10

  php4-extensions-1.0

  lam-6.5.9

  ganglia-monitor-core-2.5.6

  ganglia-webfrontend-2.5.5

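  [Goal]

  Set up a LAM/MPI Cluster System based on FreeBSD 5.3.

  [Installation and Configuration]

  1. Basic /etc/hosts configuration on each node \\ If the internal network has DNS, configuring /etc/resolv.conf on each system is enough.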

  center.the9.com

  #more /etc/hosts

  172.18.5.247 center.the9.com

  172.18.5.80 node1.the9.com

  node1.the9.com

  #more /etc/hosts

  172.18.5.247 center.the9.com

  172.18.5.80 node1.the9.com
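
  Before moving on, it can help to confirm that every cluster host name really appears in the hosts file on each node. The following is a minimal sketch and not part of the original setup: the function name check_hosts is hypothetical, and it only inspects a file in hosts(5) format without querying DNS.

```shell
#!/bin/sh
# check_hosts FILE NAME...   (hypothetical helper, for illustration only)
# Prints "missing: NAME" for every NAME that has no entry in FILE.
# Returns 0 when all names are present, 1 otherwise.
check_hosts() {
    hosts_file=$1
    shift
    missing=0
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the address.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example use on a node:
#   check_hosts /etc/hosts center.the9.com node1.the9.com
```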

  2. Setting up the Apache+PHP server

  center.the9.com

  #cd /usr/ports/www/apache13-modssl

  #make install clean \\ install Apache

  #cd /usr/ports/lang/php4-extensions

  #make install clean \\ install PHP; be sure to select the GD library here

  #vi /usr/local/etc/apache/httpd.conf \\ add the following parameters

  AddType application/x-httpd-php .php

  AddType application/x-httpd-php-source .phps

  3. Setting up the NFS server and client

  NFS Server(center.the9.com)

  #vi /etc/rc.conf \\ add the following parameters

  nfs_server_enable="YES"

  nfs_server_flags="-u -t -n 4 -h 172.18.5.247"

  mountd_enable="YES"

  mountd_flags="-r -l"

  rpcbind_enable="YES"

  rpcbind_flags="-l -h 172.18.5.247"

  #vi /etc/exports \\ configure the NFS shared directory

  /cluster -maproot=0:0 -network 172.18.5.0 -mask 255.255.255.0

  #/etc/rc.d/rpcbind start

  #/etc/rc.d/mountd start

  #/etc/rc.d/nfsd start \\ start the NFS server

  NFS Client(node1.the9.com)

  #vi /etc/rc.conf \\ add the following parameters

  nfs_client_enable="YES"

  #vi /etc/fstab \\ add the following entry

  172.18.5.247:/cluster /cluster nfs rw 0 0

  #mount /cluster \\ mount the /cluster directory

  4. Setting up the LAM/MPI Cluster System

  Step 1: Basic installation

  center.the9.com

  #cd /usr/ports/net/lam

  #make install clean \\ install LAM

  #cd /usr/ports/sysutils/ganglia-monitor-core

  #make install clean \\ install the Monitor Core that the Cluster System needs

  #cd /usr/ports/sysutils/ganglia-webfrontend

  #make install clean \\ install the web GUI for the Monitor Core above

  node1.the9.com

  #cd /usr/ports/net/lam

  #make install clean \\ install LAM

  #cd /usr/ports/sysutils/ganglia-monitor-core

  #make install clean \\ install the Monitor Core that the Cluster System needs

  Step 2: Configuration

  center.the9.com

  #cd /usr/local/etc/

  #cp gmond.conf.sample gmond.conf

  #cp gmetad.conf.sample gmetad.conf

  #vi gmond.conf \\ change the name and mcast_if parameters

  # The name of the cluster this node is a part of

  # default: "unspecified"

  name "BSDCluster"

  # The multicast interface for gmond to send/receive data on

  # default: the kernel decides based on routing configuration

  mcast_if lnc0

  #vi gmetad.conf \\ change the data_source parameter

  # data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655

  # data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651

  # data_source "another source" 1.3.4.7:8655 1.3.4.8

  data_source "BSDCluster" 10 center.the9.com:8649 node1.the9.com:8649
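
  The data_source line has to list every node, so it grows as the cluster grows. As a rough sketch, a small helper can build the line from a host list. The function name gmetad_data_source is hypothetical; the port 8649 and the polling interval argument are taken from the line above, and the helper assumes every node uses that same port.

```shell
#!/bin/sh
# gmetad_data_source CLUSTER INTERVAL HOST...   (hypothetical helper)
# Prints one data_source line for gmetad.conf, appending the port used in
# this article (8649) to every host name.
gmetad_data_source() {
    cluster=$1
    interval=$2
    shift 2
    line="data_source \"$cluster\" $interval"
    for host in "$@"; do
        line="$line $host:8649"
    done
    printf '%s\n' "$line"
}

# Example use:
#   gmetad_data_source BSDCluster 10 center.the9.com node1.the9.com
```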

  #vi /usr/local/etc/lam-bhost.def \\ add the hostname of each node

  center.the9.com

  node1.the9.com

  node1.the9.com \\ In general, each added node is configured the same way as center.the9.com above.

  node2.the9.com

  nodeX.the9.com ........
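
  Since the boot schema file is the single list of nodes, it can also drive simple checks. The sketch below prints the host names found in such a file. The function name list_bhosts is hypothetical, and the helper assumes one host per line with '#' starting a comment.

```shell
#!/bin/sh
# list_bhosts FILE   (hypothetical helper)
# Prints the host name (first field) of every entry in a boot schema file,
# skipping blank lines and anything after a '#'.
list_bhosts() {
    awk '{ sub(/#.*/, "") } NF { print $1 }' "$1"
}

# Example use: confirm that each listed node accepts a remote shell,
# since lamboot starts the remote daemons over ssh.
#   for h in $(list_bhosts /usr/local/etc/lam-bhost.def); do
#       ssh "$h" -n true || echo "cannot reach $h"
#   done
```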

  5. Configuring the monitor web GUI

  center.the9.com

  #vi /usr/local/etc/apache/httpd.conf \\ add the following parameters to set the path of the Cluster Monitor web pages

  Alias /ganglia/ "/usr/local/www/ganglia/"

  <Directory "/usr/local/www/ganglia">

  Options Indexes FollowSymlinks MultiViews

  AllowOverride None

  Order allow,deny

  Allow from all

  </Directory>

  #vi /etc/rc.conf \\ add the following parameters

  apache_enable="YES"

  apache_flags="-DSSL"

  apache_pidfile="/var/run/httpd.pid"

  #/usr/local/etc/rc.d/apache.sh start \\ start Apache

  6. Starting, debugging and testing the Cluster System

  center.the9.com node1.the9.com nodeX.the9.com etc....

  #/usr/local/etc/rc.d/gmetad.sh start

  #/usr/local/etc/rc.d/gmond.sh start \\ start the Monitor Core on each cluster node

  center.the9.com

  $lamboot -dv \\ start the lam daemon on each node

  LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

  lamboot: boot schema file: /usr/local/etc/lam-bhost.def

  lamboot: opening hostfile /usr/local/etc/lam-bhost.def

  lamboot: found the following hosts:

  lamboot: n0 center.the9.com

  lamboot: n1 node1.the9.com

  lamboot: resolved hosts:

  lamboot: n0 center.the9.com --> 172.18.5.247

  lamboot: n1 node1.the9.com --> 172.18.5.80

  lamboot: found 2 host node(s)

  lamboot: origin node is 0 (center.the9.com)

  Executing hboot on n0 (center.the9.com - 1 CPU)...

  lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 172.18.5.247 -P 53433 -n 0 -o 0 ""

  hboot: process schema = "/usr/local/etc/lam-conf.lam"

  hboot: found /usr/local/bin/lamd

  hboot: performing tkill

  hboot: tkill

  hboot: booting...

  hboot: fork /usr/local/bin/lamd

  [1] 28338 lamd -H 172.18.5.247 -P 53433 -n 0 -o 0 -d

  hboot: attempting to execute

  Executing hboot on n1 (node1.the9.com - 1 CPU)...

  lamboot: attempting to execute "/usr/bin/ssh node1.the9.com -n echo $SHELL"

  lamboot: got remote shell /bin/sh

  lamboot: attempting to execute "/usr/bin/ssh node1.the9.com -n (. ./.profile; hboot -t -c lam-conf.lam -d -v -s -I "-H 172.18.5.247 -P 53433 -n 1 -o 0 " )"

  hboot: process schema = "/usr/local/etc/lam-conf.lam"

  hboot: found /usr/local/bin/lamd

  hboot: performing tkill

  hboot: tkill

  hboot: booting...

  hboot: fork /usr/local/bin/lamd

  [1] 43110 lamd -H 172.18.5.247 -P 53433 -n 1 -o 0 -d

  topology done

  lamboot completed successfully
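
  The verbose output above is long, so a script that boots the cluster may only want the outcome. As a sketch under the assumption that the log lines keep the wording shown above, the hypothetical helper lamboot_summary reads the output on stdin, reports the host count and succeeds only if the final success line is present.

```shell
#!/bin/sh
# lamboot_summary   (hypothetical helper)
# Reads lamboot output on stdin. Prints the host count taken from the
# "found N host node(s)" line and returns 0 only if the line
# "lamboot completed successfully" was seen.
lamboot_summary() {
    awk '
        /lamboot: found [0-9]+ host node/ { hosts = $3 }
        /lamboot completed successfully/  { ok = 1 }
        END {
            print "hosts: " (hosts == "" ? "unknown" : hosts)
            exit (ok ? 0 : 1)
        }
    '
}

# Example use:
#   lamboot -dv 2>&1 | lamboot_summary || echo "LAM did not start"
```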

  $lamhalt -dv \\ stop the lam daemon on each node

  $ lamhalt -dv

  LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

  Shutting down LAM

  lamhalt: sending HALT to n1 (node1.the9.com)

  lamhalt: waiting for HALT ACKs from remote LAM daemons

  lamhalt: received HALT ACK from n1 (node1.the9.com)

  lamhalt: sending final HALT to n0 (center.the9.com)

  lamhalt: local LAM daemon halted

  LAM halted

  $lamnodes \\ show node information

  $lamexec N echo "hello" \\ check that each node can run commands

  center.the9.com

  #ps ax

  28338 ?? I 0:00.04 /usr/local/bin/lamd -H 172.18.5.247 -P 53433 -n 0 -o 0 -d

  node1.the9.com

  #ps ax

  43110 ?? S 0:00.05 /usr/local/bin/lamd -H 172.18.5.247 -P 53433 -n 1 -o 0 -d

  ClusterMonitor WEBGUI

  http://center.the9.com/ganglia/ \\ Use this page to view system data; it is quite intuitive, with images generated by RRDTool. :)

  CPUs Total: 2

  Hosts up: 2

  Hosts down: 0

  Avg Load (15, 5, 1m):

  1%, 4%, 0%

  Localtime:

  2004-12-31 10:50

  Total CPUs: 2

  Total Memory: 0.2 GB

  Total Disk: 8.0 GB

  Most Full Disk: 61.2% Used \\ The machines in the test environment are rather weak, please bear with them. :)

  [References]

  http://lam-mpi.org/ lam-mpi

  http://www.beowulf.org/ beowulf FAQ

  http://www.lasg.ac.cn/cgi-bin/forum/topic.cgi?forum=4&topic=2247 MPI Cluster With RH9

  http://lists.freebsd.org/mailman/listinfo/freebsd-cluster freebsd-cluster mailing list
