Thursday, November 17, 2011

Tips: How to reduce the iSCSI Login time delay on ESXi 5?

The number of retry iterations is hard-coded to nine in the iSCSI stack and cannot be modified.

A recent issue has been identified in which the iSCSI boot process takes longer than expected on ESXi 5. The issue can occur when the iSCSI targets are unavailable while the ESXi host boots, because of additional retry code that was added in ESXi 5.

Here is what you would see in /var/log/syslog.log after the ESXi host boots up:



Anyhow, there is a way to reduce the time delay significantly. However, it has been tested only in a limited lab environment. If you are still willing to follow the steps to cut the delay, read on.
Note: Test this in a staging/development environment before applying it to any real system.
First, you might look at the iscsid.conf configuration file. However, VMware doesn't use this default configuration file; it uses a small database file located at /etc/vmware/vmkiscsid/vmkiscsid.db.


Because the .db file is actively backed up by ESXi, any changes persist across system reboots and do not require any special tricks.

You can use the following command to dump the entire database:
vmkiscsid --dump-db

To execute a specific SQL query, such as viewing a particular database table, you can use the following command:
vmkiscsid -x "select * from discovery"

Reminder: The -x flag is not supported by VMware, so use it at your own risk. It does save you from copying the iSCSI database to another host for edits.
If you prefer to view the contents of the SQLite file directly, the parameter to look at is discovery.ltime in the discovery table. Changing it alters the time the iSCSI boot process takes to complete when the iSCSI targets are unavailable during boot. Back up your vmkiscsid.db file before you make any changes.
You can use the following commands to view the file:

sqlite3 vmkiscsid.db
.mode line *
select * from discovery;
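
The same inspection can also be scripted. The following is a minimal Python sketch, assuming you have copied vmkiscsid.db to a workstation with Python 3. The function names dump_discovery and print_line_mode are illustrative and not part of any VMware tooling. It reads every row of the discovery table and prints one column per line, similar to the line mode of sqlite3.

```python
import sqlite3


def dump_discovery(db_path):
    """Return every row of the discovery table as a dict of column -> value."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite3.Row lets us look up values by column name.
        conn.row_factory = sqlite3.Row
        rows = conn.execute("SELECT * FROM discovery").fetchall()
        return [{key: row[key] for key in row.keys()} for row in rows]
    finally:
        conn.close()


def print_line_mode(rows):
    """Print each row as 'column = value' lines, with a blank line between rows."""
    for row in rows:
        for column, value in row.items():
            print(f"{column} = {value}")
        print()
```

For example, print_line_mode(dump_discovery("vmkiscsid.db")) would list the columns of each discovery entry, including discovery.ltime, for a copy of the database in the current directory.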



The default value is 27. Testing found that with any other value, the retry loop is either not executed or exits almost immediately, after retrying for only a few seconds.

If you wish to change the value, you can use the following SQL command:
update discovery set 'discovery.ltime'=1;
To verify the value, you can use the following SQL commands:
.mode line *
select * from discovery;
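
The backup, update, and verify steps can be combined into one script. This is a rough Python sketch under the same assumption as before: it runs against a copy of vmkiscsid.db on a workstation with Python 3. The function name set_discovery_ltime and the .bak suffix are hypothetical choices, not something VMware provides.

```python
import shutil
import sqlite3


def set_discovery_ltime(db_path, new_value, backup_suffix=".bak"):
    """Back up db_path, set discovery.ltime on every row, return the stored values."""
    # Keep an untouched copy next to the original before editing anything.
    shutil.copy2(db_path, db_path + backup_suffix)
    conn = sqlite3.connect(db_path)
    try:
        # Double quotes mark "discovery.ltime" as a column name, dot included.
        conn.execute('UPDATE discovery SET "discovery.ltime" = ?', (new_value,))
        conn.commit()
        # Read the column back to confirm the change was written.
        rows = conn.execute('SELECT "discovery.ltime" FROM discovery').fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Calling set_discovery_ltime("vmkiscsid.db", 1) should return a list containing 1 for each discovery entry, and the original file remains available as vmkiscsid.db.bak if you need to roll back.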





Once you have confirmed the change, type .quit to exit and then upload the modified vmkiscsid.db file to your ESXi 5 host.


Next, to ensure the changes are saved immediately to the backup bootbank, run /sbin/auto-backup.sh, which forces an ESXi backup.

Now, you can disconnect the iSCSI target from the network and reboot your ESXi 5 host. The boot process should complete with a much shorter delay.





As you can see from this screenshot of the ESXi syslog.log, it took only 15 seconds to retry and continue through the boot up process.

The test setup was a vESXi host with a software iSCSI initiator bound to three VMkernel interfaces and connected to ten iSCSI targets on a Nexenta VSA.
For each test, disconnect the network adapter on the iSCSI target, modify discovery.ltime on the ESXi host, and check the logs for the time delay.

Here is a table of the results:


discovery.ltime    iSCSI Bootup Delay
1                  15 sec
3                  15 sec
6                  15 sec
12                 15 sec
24                 15 sec
26                 15 sec
27                 7 min
28                 15 sec
81                 15 sec



The table shows the full picture: the longest delay, 7 minutes, occurred only at the default value of 27, and every other value tested gave the same 15-second delay.
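
To make the comparison concrete, here is a small Python sketch that encodes the results from the table above and picks out the slow setting. The numbers are copied from the table. The helper names slow_values and speedup, and the 60-second threshold, are arbitrary choices for illustration.

```python
# discovery.ltime value -> observed iSCSI boot delay in seconds (from the table).
RESULTS = {1: 15, 3: 15, 6: 15, 12: 15, 24: 15, 26: 15, 27: 7 * 60, 28: 15, 81: 15}


def slow_values(results, threshold_seconds=60):
    """Return the discovery.ltime values whose delay exceeds the threshold."""
    return sorted(v for v, delay in results.items() if delay > threshold_seconds)


def speedup(results, slow, fast):
    """Return how many times longer the 'slow' setting takes than the 'fast' one."""
    return results[slow] / results[fast]


print(slow_values(RESULTS))     # [27]
print(speedup(RESULTS, 27, 1))  # 28.0
```

In this lab data, only the default value of 27 crosses the threshold, and its 7-minute delay is 28 times the 15 seconds seen with every other value.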
You can also experiment with other parameters, such as the following:

node.session.initial_login_retry_max
Ultimately, because the retry iterations are hard-coded, modifying discovery.ltime appears either to bypass the retry code or to reduce the number of retries altogether.
