A minimalist guide to getting started with SSH

Data scientists often need to run their pipelines on remote servers. SSH (Secure Shell) is the standard tool for secure remote login, system administration, and file transfer over a network. Many blogs cover Linux servers or SSH from a programmer's point of view. In this article, I will ONLY introduce the few things you must know about SSH as a data scientist.

Connect to a Remote Server with SSH

We use the ssh command to connect to a remote server. To use it, you first need an SSH client installed on your system. Luckily, most operating systems already include one.

If you use Windows 10, open PowerShell and enter 'ssh'. If you use macOS or Linux, open Terminal and enter 'ssh'. If a client is installed, you will see a short usage message listing its options.
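On macOS or Linux, a quicker check is to ask the client for its version. A minimal sketch, assuming a POSIX shell such as bash; the variable name version is just for illustration:

```shell
#!/bin/sh
# Report the installed SSH client's version, or note that none was found.
if command -v ssh >/dev/null 2>&1; then
  # OpenSSH prints its version string to stderr, so redirect it to stdout.
  version=$(ssh -V 2>&1)
else
  version="no ssh client found"
fi
echo "$version"
```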


There are many options, but you don't need most of them. For now, all you need to know is how to log in. For example, if your account name is 'abc' and the hostname is 'link.com', you can log in with the following command:

ssh abc@link.com

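The first time you connect to a server, ssh shows the server's key fingerprint and asks whether to continue; enter 'yes'. Then enter your password when prompted (nothing is displayed as you type it). Once the password is accepted, you are logged in to the remote server. Cheers!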

If you fail to connect, check that you are connected to your organization's VPN. In my experience, this causes the vast majority of login failures.

That's all you need. You can also use a graphical client such as PuTTY, or pass extra options to ssh, but we'll leave it there.

Transfer Files with SCP

SCP (secure copy) is a command-line tool, run as scp, for transferring files between computers over SSH. It is usually installed together with the SSH client.

To transfer a file named 'file' from your current directory to your home directory on the remote server, you don't need to log in with SSH first. Just run the following command on your own computer:

scp file abc@link.com:

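Remember to add the colon (:). It tells scp that the destination is on the remote machine. Without it, scp simply makes a local copy of your file named 'abc@link.com', and nothing is transferred to the server.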
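scp also works in the other direction and with directories. Below is a minimal sketch of two helper functions you might keep in your shell profile on your own computer. The names upload, download, run, REMOTE and DRY_RUN are hypothetical, chosen for this example; abc@link.com is the placeholder account from above. Setting DRY_RUN to any non-empty value prints the scp command instead of running it, which is handy for checking the syntax before copying anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around scp. REMOTE is the placeholder account used above.
REMOTE="abc@link.com"

# Print the command when DRY_RUN is non-empty; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# upload PATH [REMOTE_DIR]: copy a local file or directory (-r recurses into
# directories) to REMOTE_DIR on the server; no REMOTE_DIR means your remote home.
upload() {
  run scp -r "$1" "${REMOTE}:${2:-}"
}

# download REMOTE_PATH: copy a path (relative to your remote home) into the
# current local directory.
download() {
  run scp -r "${REMOTE}:$1" .
}
```

For example, upload results.csv sends results.csv to your remote home directory, upload notebooks project/ copies the notebooks directory into an existing project/ directory on the server, and download project/output.csv fetches output.csv into your current local directory.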

Download Files from the Internet

Instead of transferring files from your laptop to the server, you may need to download some data directly from the internet to the server.

To do that, right-click the download link in your browser and choose 'Copy link'. Then, after logging in to the remote server, run wget with the copied link:

wget http://link.com

wget shows the download progress and saves the file to the current directory.
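Large downloads can be interrupted, for example when your connection to the server drops. wget's -c option continues a partially downloaded file instead of starting over (this needs a server that supports resuming), and -P saves the file into a chosen directory. A minimal sketch wrapping both; fetch, run and DRY_RUN are hypothetical names for this example, and http://link.com is the placeholder link from above:

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN is non-empty; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# fetch URL [DIR]: download URL into DIR (default: the current directory).
# -c resumes a partial file left behind by an earlier, interrupted attempt.
fetch() {
  run wget -c -P "${2:-.}" "$1"
}
```

Running the same fetch command again after an interruption picks up from the partial file instead of downloading everything again.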

Keep Your Session Alive

This may be the most important part. Many remote servers close connections that look idle, even while a program you started is still running, and when the connection drops, the programs started from it are usually killed too. To keep your work running, start it inside a session that runs in the background, using a tool like screen.

The screen tool creates a session that keeps running in the background on the server, even after your SSH connection closes. To create a session named nameofsession:

screen -S nameofsession

and then you can run your program in this screen session.

To leave the session without stopping it (this is called detaching), press Ctrl+A and then D. After you detach, your program keeps running inside the session.
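To get back to a detached session later (for example, after logging in again), list your sessions with screen -ls and reattach with screen -r nameofsession. You can also start a long job directly in a detached session. A minimal sketch, where start_job, run and DRY_RUN are hypothetical names and python train.py stands in for your own program: screen -dmS creates a session with the given name in detached mode and runs the command in it, and the session closes when that command finishes.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN is non-empty; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# start_job NAME COMMAND [ARGS...]: run COMMAND in a new screen session called NAME.
# -d -m starts the session detached, so the call returns immediately;
# -S sets the name you later pass to screen -r.
start_job() {
  local name="$1"
  shift
  run screen -dmS "$name" "$@"
}
```

After start_job train python train.py, screen -r train attaches to the running job so you can check on it.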
