Web Scraping using Flask with Headless Selenium Server running on Amazon EC2 Instance.

 

Setting Up Flask Application on Amazon EC2 Instance

  1. Launch a new EC2 instance from the Ubuntu Server 16.04 LTS (HVM) AMI. Choose t2.micro under the free tier, then review and launch the instance.
  2. Configure the security group with an SSH rule (port 22) restricted to your IP and an HTTP rule (port 80) open to anywhere. This allows access to port 80 (HTTP) from anywhere, and SSH access only from your IP address.
  3. Select your EC2 instance and click the Connect button. Make sure an SSH client such as OpenSSH is installed on your local computer, then copy the example command (ssh -i "your-key.pem" ubuntu@Your-DNS.compute.amazonaws.com) into your terminal.
  4. Set up the instance for hosting the Flask app:

Install the Apache web server and mod_wsgi, which implements a WSGI-compliant interface for hosting Python-based web applications on top of the Apache web server.

$ sudo apt-get install python-dev python-pip apache2
$ sudo apt-get install libapache2-mod-wsgi
$ sudo a2enmod wsgi
$ sudo pip install flask

Create a Flask app using the amazing Flask tutorial from here. As a basic example, we create a directory in our home directory to work in and link to it from the site root defined in Apache's configuration (/var/www/html by default; see /etc/apache2/sites-enabled/000-default.conf for the current value).

$ mkdir ~/flaskapp
$ sudo ln -sT ~/flaskapp /var/www/html/flaskapp
$ cd ~/flaskapp
$ echo "Hello World" > index.html

You should now see "Hello World" displayed if you navigate to (your instance public DNS)/flaskapp in your browser, for example Your-DNS.compute.amazonaws.com/flaskapp.


Let's create a simple Flask app, starting from the Hello World example in the Flask documentation.

1. Create a flaskapp.py file inside the folder and paste in the Hello World code.

from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello from Flask!'


if __name__ == '__main__':
    app.run()

2. Create a .wsgi file to load the app. Put the following content in a file named flaskapp.wsgi:

import sys
sys.path.insert(0, '/var/www/html/flaskapp')
from flaskapp import app as application
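The import is renamed to `application` because that is the callable name mod_wsgi looks for by default. As a rough illustration of what Apache does with that object, here is a minimal sketch (Python 3, standard library only) of a bare WSGI callable and a helper that invokes it once. The helper name `call_wsgi` and the response text are made up for this example; they are not part of Flask or mod_wsgi.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable; Flask's `app` object exposes the same interface."""
    body = b"Hello from plain WSGI!"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_wsgi(app, path="/"):
    """Invoke a WSGI callable once, roughly as a server would; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app reports instead of writing to a socket.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], body


status, body = call_wsgi(application)
print(status, body.decode())
```

The same helper should also work with the Flask object from flaskapp.py (`call_wsgi(app)`), which can be a quick way to check the app before touching the Apache configuration.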

3. Enable mod_wsgi.

Open the Apache configuration file located at /etc/apache2/sites-enabled/000-default.conf and add the following block just after the DocumentRoot /var/www/html line:
$ sudo vim /etc/apache2/sites-enabled/000-default.conf

WSGIDaemonProcess flaskapp threads=5
WSGIScriptAlias / /var/www/html/flaskapp/flaskapp.wsgi
<Directory /var/www/html/flaskapp>
    WSGIProcessGroup flaskapp
    WSGIApplicationGroup %{GLOBAL}
    Require all granted
</Directory>

[Screenshot: your file should look like this.]

4. Restart the web server.

$ sudo apachectl restart

5. Test configuration.

If you navigate your browser to your EC2 instance’s public DNS again (Your-DNS.compute.amazonaws.com), you should see the text returned by the hello_world function of our app, “Hello from Flask!” Our server is now running and ready to do the job (if something isn’t working, try checking the log file in /var/log/apache2/error.log).
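When something goes wrong, the last few lines of the error log are usually the relevant ones. The following is a small sketch for reading them from Python; the function name `tail_lines` is only an illustration, and reading the Apache log may require sudo or membership in the log's group.

```python
from collections import deque

APACHE_ERROR_LOG = "/var/log/apache2/error.log"  # path mentioned in the text


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Example use on the server:
#     for line in tail_lines(APACHE_ERROR_LOG):
#         print(line)
```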

Setting Up Headless Selenium on Amazon EC2 Instance

[TBC]


Python Script to print E-Lab Reports

Being a Python enthusiast, and a bit too lazy to click the Evaluate button and then the Print Report button for every one of my E-Lab reports, I had the idea of using the Selenium WebDriver API to automate the job. The Python script asks for the MathsLab No., register number and password through a Python GUI, then prints all the reports to a folder created at the given path.

What we need:

1. Tkinter – Python module to create GUIs
2. Selenium WebDriver API
3. Chrome Driver or PhantomJS with PATH.
4. The IDs, XPaths and class names of the web elements.

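How to install the Python modules?
Tkinter ships with most Python installations and cannot be installed through pip (on Ubuntu, install it with sudo apt-get install python3-tk).
pip install -U selenium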

How to download Chrome Driver?
Choose your OS, download ChromeDriver, then copy its local path and paste it into the script.
https://sites.google.com/a/chromium.org/chromedriver/downloads

Steps to run the script:

1. Save the script on your desktop and open it in IDLE or your preferred Python editor.
2. Replace the ChromeDriver path in the script with the path of your ChromeDriver.
3. Make a folder and copy its path into the script. This is the folder where all the prints will be stored.
4. Run the script. 😀
5. 1 min 35 sec and it's done. 😀
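To give an idea of what the path and folder settings feed into, here is a rough sketch of the setup part of such a script. It is not the linked elab.py. It assumes the Selenium 4 Python API, and the paths, element ID, XPath and class name are hypothetical placeholders that you would replace with values taken from the real page.

```python
from pathlib import Path

# Hypothetical placeholders: replace with your own values.
CHROMEDRIVER_PATH = "/path/to/chromedriver"
OUTPUT_DIR = "elab_reports"


def ensure_output_dir(folder):
    """Create the folder for saved reports if it does not exist yet."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def start_headless_chrome(driver_path):
    """Start Chrome without a visible window (Selenium 4 style API)."""
    # Imported here so the folder helper works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(service=Service(driver_path), options=options)


def log_in(driver, url, register_number, password):
    """Fill in a login form; all three locators below are made-up examples."""
    from selenium.webdriver.common.by import By

    driver.get(url)
    driver.find_element(By.ID, "username").send_keys(register_number)
    driver.find_element(By.XPATH, "//input[@type='password']").send_keys(password)
    driver.find_element(By.CLASS_NAME, "login-button").click()
```

A run would call `ensure_output_dir(OUTPUT_DIR)`, then `start_headless_chrome(CHROMEDRIVER_PATH)`, then `log_in(...)` with the details collected from the GUI, and finally `driver.quit()` once the reports are saved.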

Code –

https://github.com/pushkalkatara/Python-Selenium-Scripting/blob/master/elab.py


My Printed Reports 😀

Watch it work 😀

Codechef Question Forwarded to Slack

Competitive programming is a core part of computer science practice: it tests how efficient your code is and how well you combine algorithms with problem solving.

To build the habit of solving one competitive programming question every day, I created this Python script, which sends a question from CodeChef to my Slack account each day.

Installation:

1. Selenium WebDriver API
2. Chrome Driver or PhantomJS with PATH
3. The IDs, XPaths and class names of the web elements.

How to run the script?

Link To Script.

1. Save the script on the desktop and open it in IDLE.
2. Replace the ChromeDriver path in the script with the path of your ChromeDriver.
3. Enter your Slack API token.
4. Run the script. 😀
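The Slack side of such a script can be quite small. The sketch below is not the linked script. It shows one way to post a message with Slack's `chat.postMessage` Web API method using only the standard library, assuming a token that is allowed to post to the chosen channel. The function names and the message format are illustrative.

```python
import json
import urllib.request

SLACK_POST_URL = "https://slack.com/api/chat.postMessage"


def build_slack_request(token, channel, text):
    """Build (but do not send) the HTTP request that posts `text` to `channel`."""
    payload = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(
        SLACK_POST_URL, data=payload, headers=headers, method="POST"
    )


def send_question(token, channel, title, url):
    """Send one question link to Slack and return Slack's decoded JSON reply."""
    text = "Today's question: %s\n%s" % (title, url)
    request = build_slack_request(token, channel, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Slack's reply is a JSON object with an `ok` field, so checking `send_question(...)["ok"]` is a simple way to confirm the message went through.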