OK, so when we pull the plug on one of the Pis, one instance of each component goes down, but because one of them is the arbitrator (node-id=2), the whole cluster falls over.
Before the 'accident':
ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0, Master)
id=4 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
id=2 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0)
[mysqld(API)] 4 node(s)
id=10 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
id=11 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
Whoops, who pulled that plug?
Everything on mypi02 is instantly down, as seen by the management node on mypi01:
ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0, Master)
id=4 (not connected, accepting connect from mypi02)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
id=2 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0)
[mysqld(API)] 4 node(s)
id=10 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
id=11 (not connected, accepting connect from mypi02)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
but... a few seconds later:
ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 (not connected, accepting connect from mypi01)
id=4 (not connected, accepting connect from mypi02)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
id=2 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0)
[mysqld(API)] 4 node(s)
id=10 (not connected, accepting connect from mypi01)
id=11 (not connected, accepting connect from mypi02)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
Check the management node's cluster log:
vi /opt/mysql/mysql/mgmd_data/ndb_1_cluster.log
...
..
2013-09-04 14:06:44 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2
..
..
2013-09-04 14:07:02 [MgmtSrvr] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
That's arbitration at work: data node 3 lost contact with data node 4, and since the arbitrator (management node id=2) went down with mypi02 at the same moment, node 3 couldn't win arbitration and shut itself down rather than risk a split-brain. So the physical architecture is too limited. Let's limit the logical architecture to 1 Management Node:
vi config.ini
--> comment out nodeid=2 entry in the [ndb_mgmd] section.
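For reference, the [ndb_mgmd] part of config.ini should end up looking something like this (a sketch only: the DataDir is inferred from the log path above, and your values may differ):

[ndb_mgmd]
NodeId=1
HostName=mypi01
DataDir=/opt/mysql/mysql/mgmd_data

#[ndb_mgmd]
#NodeId=2
#HostName=mypi02
#DataDir=/opt/mysql/mysql/mgmd_data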
vi my.cnf
--> remove mypi02:1186 from the ndb-connectstring entries in both sections [mysqld] & [mysql_cluster].
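So both sections in my.cnf end up pointing only at mypi01, roughly like this (a sketch, since the rest of the original my.cnf isn't shown; the ndbcluster option is assumed):

[mysqld]
ndbcluster
ndb-connectstring=mypi01:1186

[mysql_cluster]
ndb-connectstring=mypi01:1186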
Now to start it all back up, but this time with --reload, so that ndb_mgmd re-reads config.ini instead of reusing its cached binary configuration, and the changes are actually picked up.
On mypi01:
ndb_mgmd -f /usr/local/mysql/conf/config.ini --config-dir=/usr/local/mysql/conf --ndb-nodeid=1 --reload
ndbd -c mypi01
On mypi02:
ndbd -c mypi01
Check the status and make sure the data nodes have started OK.
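For example, from either Pi (the hostnames are this cluster's):

ndb_mgm -c mypi01 -e show
ndb_mgm -c mypi01 -e "all status"

Wait until both data nodes report "started" rather than "starting" before bringing up the SQL nodes.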
Then on mypi01:
mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql &
On mypi02:
mysqld --defaults-file=/usr/local/mysql/conf/my.cnf --user=mysql &
And to make sure we're all connected up fine:
From mypi02 (remember, with only one management node, on mypi01, now):
ndb_mgm -e show -c mypi01
Connected to Management Server at: mypi01:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0, Master)
id=4 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
[mysqld(API)] 4 node(s)
id=10 @10.0.0.6 (mysql-5.5.25 ndb-7.3.0)
id=11 @10.0.0.7 (mysql-5.5.25 ndb-7.3.0)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
That's better.