Friday, May 17, 2024

Reverse Engineering Facebook API: Private Video Downloader


Welcome back! This is the third post in the reverse engineering series. The first post was reverse engineering the Soundcloud API and the second was reverse engineering the Facebook API to download public videos. In this post we'll take a look at downloading private videos. We will reverse engineer the API calls made by Facebook and try to figure out how we can download videos in HD format (when available).

Step 1: Recon

The very first step is to open a private video in an incognito tab just to confirm that we cannot access it without logging in. This should be the response from Facebook:

Image

This confirms that we cannot access the video without logging in. Sometimes this is pretty obvious but it doesn't hurt to check.

We now know our first step: figure out a way to log into Facebook using Python. Only after that can we access the video. Let's log in using the browser and check what information is required to log in.

I won't go into much detail for this step. The gist is that the desktop website and the mobile website require roughly the same POST parameters while logging in, but if you log in using the mobile website you don't have to supply a lot of the additional information that the desktop website requires. You can get away with doing a POST request to the following URL with your username and password:

https://m.facebook.com/login.php

We will see later that the subsequent API requests require a fb_dtsg parameter. The value of this parameter is embedded in the HTML response and can easily be extracted using regular expressions or a DOM parsing library.
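
As a rough illustration of the regular-expression route, here is a minimal sketch run against a made-up HTML fragment. The sample markup, the token value and the helper name extract_fb_dtsg are assumptions for illustration; the real page may lay out the hidden input differently.

```python
import re

# Hypothetical fragment of a logged-in page; the real markup may differ.
sample_html = '<form><input type="hidden" name="fb_dtsg" value="AQH-example:token" /></form>'


def extract_fb_dtsg(html):
    """Return the fb_dtsg value found in the HTML, or None if it is missing."""
    match = re.search(r'name="fb_dtsg" value="(.+?)"', html)
    return match.group(1) if match else None


print(extract_fb_dtsg(sample_html))
```

Returning None instead of raising makes it easy to tell "not logged in" apart from other failures before any further requests are made.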

Let's continue exploring the website and the video API and see what we can find.

Just like we did in the last post, open the video, monitor the XHR requests in the Developer Tools and search for the MP4 request.

Image

The next step is to figure out where the MP4 link is coming from. I tried searching the original HTML page but couldn't find the link. This means that Facebook is using an XHR API request to get the URL from the server. We need to search through all of the XHR API requests and check their responses for the video URL. I did just that, and the response of the third API request contained the MP4 link:

Image

The API request was a POST request and the URL was:

https://www.facebook.com/video/tahoe/async/10114393524323267/?chain=true&isvideo=true&originalmediaid=10214393524262467&playerorigin=permalink&playersuborigin=tahoe&ispermalink=true&numcopyrightmatchedvideoplayedconsecutively=0&storyidentifier=DzpfSTE1MzA5MDEwODE6Vks6MTAyMTQzOTMNjE4Njc&dpr=2
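
A quick way to see what such a URL is made of is to split it with the standard library. This is only a sketch of how one might inspect it; the variable names are my own, and the URL is the captured one from above.

```python
from urllib.parse import urlsplit, parse_qs

api_url = ("https://www.facebook.com/video/tahoe/async/10114393524323267/"
           "?chain=true&isvideo=true&originalmediaid=10214393524262467"
           "&playerorigin=permalink&playersuborigin=tahoe&ispermalink=true"
           "&numcopyrightmatchedvideoplayedconsecutively=0"
           "&storyidentifier=DzpfSTE1MzA5MDEwODE6Vks6MTAyMTQzOTMNjE4Njc&dpr=2")

parts = urlsplit(api_url)
# parse_qs returns a list per key; every key appears once here, so keep the first value.
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)
for key, value in query.items():
    print(f"{key} = {value}")
```

Listing the parameters one per line makes it easier to spot which values change from video to video and which stay constant.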

I tried to deconstruct the URL. The major dynamic parts of the URL appear to be originalmediaid and storyidentifier. I searched the original HTML page and found that both of these were present in the original video page. We also need to figure out the POST data sent with this request. These are the parameters that were sent:

__user: <---redacted-->
__a: 1
__dyn: <---redacted-->
__req: 3
__be: 1
__pc: PHASED:DEFAULT
__rev: <---redacted-->
fb_dtsg: <---redacted-->
jazoest: <---redacted-->
__spin_r:  <---redacted-->
__spin_b:  <---redacted-->
__spin_t:  <---redacted-->

I have redacted most of the values so that my personal information is not leaked, but you get the idea. I again searched the HTML page and was able to find most of the information there. Some information, like jazoest, was not in the HTML page, but as we move along you will see that we don't actually need it to download the video. We can simply send an empty string instead.

It seems like we now have all the pieces we need to download a video. Here is an outline:

  1. Open the Video after logging in
  2. Search for the parameters in the HTML response to craft the API URL
  3. Open the API URL with the required POST parameters
  4. Search for hd_src or sd_src in the response of the API request

Now let's create a script to automate these tasks for us.

Step 2: Automate it

The very first step is to figure out how the login takes place. In the recon phase I mentioned that you can easily log in using the mobile website. We will do exactly that. We will log in using the mobile website and then open the homepage using the authenticated cookies so that we can extract the fb_dtsg parameter from the homepage for subsequent requests.

import requests
import re
import urllib.parse

email = ""
password = ""

# Reuse one session so the login cookies are kept for later requests
session = requests.session()
session.headers.update({
  'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:39.0) Gecko/20100101 Firefox/39.0'
})
response = session.get('https://m.facebook.com')
response = session.post('https://m.facebook.com/login.php', data={
  'email': email,
  'pass': password
}, allow_redirects=False)

Replace the email and password variables with your email and password and this script should log you in. How do we know whether we have successfully logged in? We can check for the presence of the 'c_user' key in the cookies. If it exists then the login was successful.

Let's check that and extract fb_dtsg from the homepage. While we're at it, let's extract the user_id from the cookies as well because we'll need it later.

if 'c_user' in response.cookies:
    # login was successful
    homepage_resp = session.get('https://m.facebook.com/home.php')
    fb_dtsg = re.search('name="fb_dtsg" value="(.+?)"', homepage_resp.text).group(1)
    user_id = response.cookies['c_user']

So now we have to open up the video web page, extract all the required API POST arguments from it and do the POST request.

if 'c_user' in response.cookies:
    # login was successful
    homepage_resp = session.get('https://m.facebook.com/home.php')
    fb_dtsg = re.search('name="fb_dtsg" value="(.+?)"', homepage_resp.text).group(1)
    user_id = response.cookies['c_user']
    
    video_url = "https://www.facebook.com/username/videos/101214393524261127/"
    video_id = re.search('videos/(.+?)/', video_url).group(1)

    # The story identifier is embedded in the video page itself
    video_page = session.get(video_url)
    identifier = re.search('ref=tahoe","(.+?)"', video_page.text).group(1)
    final_url = "https://www.facebook.com/video/tahoe/async/{0}/?chain=true&isvideo=true&originalmediaid={0}&playerorigin=permalink&playersuborigin=tahoe&ispermalink=true&numcopyrightmatchedvideoplayedconsecutively=0&storyidentifier={1}&dpr=2".format(video_id,identifier)
    
    data = {'__user': user_id,
            '__a': '',
            '__dyn': '',
            '__req': '',
            '__be': '',
            '__pc': '',
            '__rev': '',
            'fb_dtsg': fb_dtsg,
            'jazoest': '',
            '__spin_r': '',
            '__spin_b': '',
            '__spin_t': '',
    }
    api_call = session.post(final_url, data=data)
    try:
        final_video_url = re.search('hd_src":"(.+?)",', api_call.text).group(1)
    except AttributeError:
        # No HD source in the response, fall back to the SD source
        final_video_url = re.search('sd_src":"(.+?)"', api_call.text).group(1)
    # Printed inside the if block so a failed login does not raise a NameError
    print(final_video_url)

You might be wondering what the data dictionary is doing and why there are a lot of keys with empty values. Like I said during the recon process, I tried making successful POST requests using the minimal amount of data. As it turns out, Facebook only cares about fb_dtsg and the __user key. You can let everything else be an empty string. Make sure that you do send these keys with the request though. It doesn't work if a key is entirely absent.
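
If you would rather not type every empty key by hand, one possible way to build the same payload is sketched below. The helper name build_payload and the placeholder values are made up for this example; the key names are the ones from the captured request.

```python
# Keys seen in the captured request; only __user and fb_dtsg get real values.
PARAM_KEYS = ['__user', '__a', '__dyn', '__req', '__be', '__pc', '__rev',
              'fb_dtsg', 'jazoest', '__spin_r', '__spin_b', '__spin_t']


def build_payload(user_id, fb_dtsg):
    """Return the POST payload with every key present and most values empty."""
    payload = dict.fromkeys(PARAM_KEYS, '')
    payload['__user'] = user_id
    payload['fb_dtsg'] = fb_dtsg
    return payload


payload = build_payload('1234567890', 'example-token')  # placeholder values
print(payload)
```

Keeping the key list in one place means every key is always sent, even when its value is an empty string.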

At the very end of the script we first search for the HD source and then the SD source of the video. If the HD source is found we output that, and if not we output the SD source.
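
The same HD-then-SD fallback can also be written as a small standalone function, which is handy for trying it out without logging in. This is a sketch under assumptions: the helper name pick_video_url and the sample response fragments are invented, and it assumes the response escapes forward slashes with backslashes.

```python
import re


def pick_video_url(api_response_text):
    """Return the HD source if present, else the SD source, else None."""
    for key in ('hd_src', 'sd_src'):
        match = re.search(key + r'":"(.+?)"', api_response_text)
        if match:
            # Assumption: slashes arrive escaped as \/ so drop the backslashes.
            return match.group(1).replace('\\', '')
    return None


# Made-up response fragments for illustration only.
hd_and_sd = '{"hd_src":"https:\\/\\/example.com\\/hd.mp4","sd_src":"https:\\/\\/example.com\\/sd.mp4"}'
sd_only = '{"hd_src":null,"sd_src":"https:\\/\\/example.com\\/sd.mp4"}'

print(pick_video_url(hd_and_sd))
print(pick_video_url(sd_only))
```

Unlike the try/except version, this returns None when neither source is present, so the caller can print a friendly message instead of crashing.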

Our final script looks something like this:

import requests
import re
import urllib.parse
import sys

# Usage: python facebook_downloader.py video_url email password
email = sys.argv[-2]
password = sys.argv[-1]

print("Email: "+email)
print("Pass:  "+password)

session = requests.session()
session.headers.update({
  'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:39.0) Gecko/20100101 Firefox/39.0'
})
response = session.get('https://m.facebook.com')
response = session.post('https://m.facebook.com/login.php', data={
  'email': email,
  'pass': password
}, allow_redirects=False)

if 'c_user' in response.cookies:
    # login was successful
    homepage_resp = session.get('https://m.facebook.com/home.php')
    fb_dtsg = re.search('name="fb_dtsg" value="(.+?)"', homepage_resp.text).group(1)
    user_id = response.cookies['c_user']
    
    video_url = sys.argv[-3]
    print("Video url:  "+video_url)
    video_id = re.search('videos/(.+?)/', video_url).group(1)

    video_page = session.get(video_url)
    identifier = re.search('ref=tahoe","(.+?)"', video_page.text).group(1)
    final_url = "https://www.facebook.com/video/tahoe/async/{0}/?chain=true&isvideo=true&originalmediaid={0}&playerorigin=permalink&playersuborigin=tahoe&ispermalink=true&numcopyrightmatchedvideoplayedconsecutively=0&storyidentifier={1}&dpr=2".format(video_id,identifier)
    
    data = {'__user': user_id,
            '__a': '',
            '__dyn': '',
            '__req': '',
            '__be': '',
            '__pc': '',
            '__rev': '',
            'fb_dtsg': fb_dtsg,
            'jazoest': '',
            '__spin_r': '',
            '__spin_b': '',
            '__spin_t': '',
    }
    api_call = session.post(final_url, data=data)
    try:
        final_video_url = re.search('hd_src":"(.+?)",', api_call.text).group(1)
    except AttributeError:
        # No HD source in the response, fall back to the SD source
        final_video_url = re.search('sd_src":"(.+?)"', api_call.text).group(1)

    # Strip the backslashes that escape the slashes in the response
    print(final_video_url.replace('\\',''))

I made a couple of changes to the script. I used sys.argv to get video_url, email and password from the command line. You can hardcode your username and password if you want.

Save the above file as facebook_downloader.py and run it like this:

$ python facebook_downloader.py video_url email password

Replace video_url with the actual video URL, like https://www.facebook.com/username/videos/101214393524261127/, and replace email and password with your actual email and password.

After running this script, it will output the source URL of the video to the terminal. You can open the URL in your browser, and from there you should be able to right-click and download the video easily.
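
If you would rather save the file from Python than from the browser, something along these lines might work. It is only a sketch: the helper name download_video is hypothetical, it uses the standard library instead of requests, and it assumes the printed URL can be fetched without the session cookies, which I have not verified.

```python
import shutil
import urllib.request


def download_video(source_url, destination_path):
    """Stream the resource at source_url into destination_path."""
    with urllib.request.urlopen(source_url) as response, \
            open(destination_path, 'wb') as output_file:
        # Copy in chunks so the whole video never has to sit in memory.
        shutil.copyfileobj(response, output_file)
    return destination_path


# Example call, where the URL is whatever the script printed:
# download_video(final_video_url, 'video.mp4')
```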

I hope you guys enjoyed this quick tutorial on reverse engineering the Facebook API to make a video downloader. If you have any questions, comments or suggestions, please put them in the comments below or email me. I will look at reverse engineering a different website for my next post. Follow my blog to stay updated!

Thanks! Have a great day!
