Data scientists often need to run their pipelines on remote servers. SSH (Secure Shell) is the standard tool for secure system administration and file transfer over a network. Many blog posts cover Linux servers or SSH from a programmer's point of view. In this article, I will introduce ONLY the few things you must know about SSH as a data scientist.
Connection to Remote Server with SSH
We can use the ssh command to connect to a remote server. To use this command, you first need an SSH client installed on your system. Luckily, most operating systems already provide one.
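If you are not sure whether a client is installed, a quick check like the one below may help. It is only a sketch and assumes a POSIX shell such as Terminal on macOS or Linux (PowerShell uses different syntax); the variable name SSH_STATUS is made up for illustration.

```shell
# Look for an ssh client on the PATH without actually connecting anywhere.
if command -v ssh >/dev/null 2>&1; then
    SSH_STATUS="found"
else
    SSH_STATUS="missing"
fi

# Report the result of the lookup.
echo "ssh client: ${SSH_STATUS}"
```

If the result is 'missing', you will need to install an SSH client before going further.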
If you use Windows 10, open PowerShell and enter 'ssh'. If you use macOS or Linux, open Terminal and enter 'ssh'. You will see a short usage message that lists the available options.
There are many options, but you don't need to know most of them. The only thing you need right now is how to log in. For example, if your account name is 'abc' and the hostname is 'link.com', you can log in by entering 'ssh abc@link.com'.
The first time you connect, you may be asked to confirm the server's identity by entering 'yes', and then to enter your password. Do both, and you will be connected to the remote server. Cheers!
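As a minimal sketch of the login step, the snippet below builds the target from the example account 'abc' and host 'link.com' and wraps the call in a helper function. The names REMOTE_USER, REMOTE_HOST, TARGET and connect_remote are my own, chosen for illustration; typing the ssh command with account@hostname directly does the same thing.

```shell
# Placeholder account and host from the example in the text; replace with your own.
REMOTE_USER="abc"
REMOTE_HOST="link.com"
TARGET="${REMOTE_USER}@${REMOTE_HOST}"

# Hypothetical helper: wrapped in a function so nothing connects until you call it.
connect_remote() {
    ssh "$TARGET"
}

echo "Run 'connect_remote' to log in to ${TARGET}"
```

Calling connect_remote opens the connection and then prompts for your password as described above.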
If you fail to connect, check that you are connected to your organization's VPN. In my experience, this causes about 90% of login failures.
That's all. You can also use software like PuTTY or set extra options, but we'll leave it there.
Transfer Files with SCP
scp (secure copy) is a command-line tool that transfers files between computers over SSH. It is included by default in most systems.
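To transfer a file named 'file' from the current directory to your remote server, you don't need to log in with SSH first. Just run scp on your own computer, giving the file name followed by the destination, for example 'scp file abc@link.com:'.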
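Remember to add the colon (:) after the hostname. If you forget it, scp treats the destination as a local file name and copies the file on your own computer, so nothing reaches the server.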
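The upload step could be sketched as follows, reusing the placeholder account 'abc' and host 'link.com'. The names LOCAL_FILE, DESTINATION and upload_file are assumptions for illustration, and the colon check is an extra safeguard of my own, not something scp requires.

```shell
# Placeholder values; replace them with your own file and server.
LOCAL_FILE="file"
DESTINATION="abc@link.com:"   # trailing colon = your home directory on the server

# Hypothetical helper: refuses to run scp when the destination has no colon,
# because scp would then just make a local copy.
upload_file() {
    case "$DESTINATION" in
        *:*) scp "$LOCAL_FILE" "$DESTINATION" ;;
        *)   echo "destination has no colon, nothing would be uploaded" >&2
             return 1 ;;
    esac
}

echo "Run 'upload_file' to copy ${LOCAL_FILE} to ${DESTINATION}"
```

A path can follow the colon if the file should land somewhere other than your home directory on the server.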
Download Files from Internet
Instead of transferring files from your laptop to the server, you may need to download some data directly from the internet to the server.
To do that, first right-click the file you want to download and click 'copy link'. Then, after logging in to the remote server, run the wget command followed by the copied link.
wget shows the download progress, and the file is saved to the current directory.
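Here is a rough sketch of the download step. The link is a made-up placeholder, and the names DATA_URL, FILE_NAME and download_data are chosen for illustration only. Quoting the link matters because copied links often contain characters such as '&' or '?' that the shell would otherwise interpret.

```shell
# Hypothetical link; paste the one you copied from your browser instead.
DATA_URL="https://example.com/data.csv"

# For a simple link like this one, wget normally saves the file under
# the last part of the link.
FILE_NAME="${DATA_URL##*/}"

# Hypothetical helper: wrapped in a function so nothing downloads until you call it.
download_data() {
    wget "$DATA_URL"
}

echo "Run 'download_data' to fetch ${FILE_NAME} into $(pwd)"
```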
Keep Session Alive
This may be the most important part. Most remote servers monitor your connection and close it if they find you idle. Unfortunately, the connection can also be closed while one of your programs is still running, and the program is usually stopped along with it. To keep your work alive, we can run it in a background session with tools like screen. Screen creates a session that runs in the background and is not closed when your connection drops. To create a session, enter 'screen' on the remote server, and then you can run your program in this screen session.
To leave the session, press Ctrl+A, and then press D. After you leave the session, the work inside it keeps running.
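The whole screen workflow might look like the sketch below, which assumes GNU screen is installed on the remote server. The session name 'training' and the helper names are hypothetical. Reattaching with the -r option goes slightly beyond the steps above, but it is how you get back to a session you left.

```shell
# Hypothetical session name; a name makes the session easier to find later.
SESSION_NAME="training"

# Start a new named session (run this on the remote server).
start_session() {
    screen -S "$SESSION_NAME"
}

# Inside the session: start your program, then press Ctrl+A followed by D to detach.

# List the sessions that are still running.
list_sessions() {
    screen -ls
}

# Reattach to the session to check on your program.
resume_session() {
    screen -r "$SESSION_NAME"
}

echo "Helpers defined: start_session, list_sessions, resume_session"
```

After logging in again later, list_sessions shows whether the session survived and resume_session brings it back to the foreground.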