The Moz Links API: An Introduction

What exactly IS an API? It's one of those things where you copy and paste long strange codes into Screaming Frog for links data on a Site Crawl, right?

I'm here to tell you there's so much more to them than that, if you're willing to take just a few little steps. But first, some basics.

What’s an API?

API stands for "application programming interface", and it's just the way of... using a thing. Everything has an API. The web is a giant API that takes URLs as input and returns pages.

But specialized data services like the Moz Links API have their own set of rules. These rules vary from service to service and can be a major stumbling block for people taking the next step.

When Screaming Frog gives you the extra links columns in a crawl, it's using the Moz Links API, but you can have this capability anywhere. For example, all that tedious manual stuff you do in spreadsheet environments can be automated, from data-pull to formatting and emailing a report.

If you take this next step, you can be more efficient than your competitors, designing and delivering your own SEO services instead of relying upon, paying for, and being limited by the next proprietary product integration.

GET vs. POST

Most APIs you'll encounter use the same data transport mechanism as the web. That means there's a URL involved, just like with a website. Don't get scared! It's easier than you think. In many ways, using an API is just like using a website.

As with loading web pages, the request may be in one of two places: the URL itself, or in the body of the request. The URL is called the "endpoint" and the often invisibly submitted extra part of the request is called the "payload" or "data". When the data is in the URL, it's called a "query string" and indicates the "GET" method is used. You see this all the time when you search:

https://www.google.com/search?q=moz+links+api <-- GET method

When the data of the request is hidden, it's called a "POST" request. You see this when you submit a form on the web and the submitted data doesn't show on the URL. When you hit the back button after such a POST, browsers usually warn you against double-submits. The reason the POST method is often used is that you can fit far more in the request using POST than GET. URLs would get very long otherwise. The Moz Links API uses the POST method.
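To make the difference concrete, here's a minimal sketch using the Python requests library (introduced properly below). The example.com URL is just a placeholder form endpoint, not a real service:

import requests

# GET: the data rides in the URL as a query string
r = requests.get("https://www.google.com/search", params={"q": "moz links api"})
print(r.url)  # https://www.google.com/search?q=moz+links+api

# POST: the data rides in the request body and never shows in the URL
r = requests.post("https://example.com/form", data={"q": "moz links api"})
print(r.url)  # https://example.com/form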

Making requests

A web browser is what traditionally makes requests of websites for web pages. The browser is a type of software known as a client. Clients are what make requests of services. More than just browsers can make requests. The ability to make client web requests is often built into programming languages like Python, or can be broken out as a standalone tool. The most popular tools for making requests outside a browser are curl and wget.

We're discussing Python here. Python has a built-in library called urllib, but it's designed to handle so many different types of requests that it's a bit of a pain to use. There are other libraries that are more specialized for making requests of APIs. The most popular for Python is called requests. It's so popular that it's used for almost every Python API tutorial you'll find on the web. So I'll use it too. This is what "hitting" the Moz Links API looks like:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

Given that everything was set up correctly (more on that soon), this will produce the following output:

{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==',
 'results': [{'anchor_text': 'moz',
              'external_pages': 7162,
              'external_root_domains': 2026}]}

This is JSON data. It's contained within the response object that was returned from the API. It's not on the drive or in a file. It's in memory. As long as it's in memory, you can do stuff with it (often just saving it to a file).
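As an aside, that "saving it to a file" step is one function call away. A minimal sketch, assuming response holds the result of the request above and using an arbitrary filename:

import json

# Persist the in-memory response data to disk as indented, readable JSON
with open("anchor_text_results.json", "w") as f:
    json.dump(response.json(), f, indent=2)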

If you wanted to grab a piece of data within such a response, you could refer to it like this:

response['results'][0]['external_pages']

This says: "Give me the first item in the results list, and then give me the external_pages value from that item." The result would be 7162.

NOTE: If you're actually following along executing code, the above line won't work alone. There's a certain amount of setup we'll do shortly, including installing the requests library and setting up a few variables. But this is the basic idea.
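Once that setup is in place, the working version looks something like this (a sketch, assuming response holds a successful reply from the API):

data = response.json()  # parse the JSON body of the response into a Python dict
print(data['results'][0]['external_pages'])  # 7162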

JSON

JSON stands for JavaScript Object Notation. It's a way of representing data that's easy for humans to read and write. It's also easy for computers to read and write. It's a very common data format for APIs that has somewhat taken over the world since the older ways were too difficult for most people to use. Some people might call this part of the "restful" API movement, but the far more difficult XML format is also considered "restful" and everyone seems to have their own interpretation. Consequently, I find it best to just focus on JSON and how it gets in and out of Python.

Python dictionaries

I lied to you. I said that the data structure you saw above was JSON. Technically it's really a Python dictionary, or dict datatype object. It's a special kind of object in Python that's designed to hold key/value pairs. The keys are strings and the values can be any type of object. The keys are like the column names in a spreadsheet. The values are like the cells in the spreadsheet. In this way, you can think of a Python dict as a JSON object. For example, here's creating a dict in Python:

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

And here is the equivalent in JavaScript:

var my_json = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

Pretty much the same thing, right? Look closely. Key names and string values get double quotes. Numbers don't. These rules apply consistently between JSON and Python dicts. So as you might imagine, it's easy for JSON data to flow in and out of Python. This is a great gift that has made modern API-work highly accessible to the beginner through a tool that has revolutionized the field of data science and is making inroads into marketing, Jupyter Notebooks.
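As a taste of how natural dicts feel, here's pulling values back out of the my_dict example (the .get() fallback is purely illustrative):

print(my_dict["name"])  # Mike
print(my_dict["age"])   # 52

# .get() lets you supply a fallback for keys that may be missing
print(my_dict.get("email", "no email on file"))  # no email on file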

Flattening data

But beware! As data flows between systems, it's not uncommon for the data to subtly change. For example, the JSON data above might be converted to a string. Strings might look exactly like JSON, but they're not. They're just a bunch of characters. Sometimes you'll hear this called "serializing", or "flattening". It's a subtle point, but worth understanding as it will help with one of the biggest hindrances with the Moz Links (and most JSON) APIs.

Objects have APIs

Actual JSON or dict objects have their own little APIs for accessing the data within them. The ability to use these JSON and dict APIs goes away when the data is flattened into a string, but it will travel between systems more easily, and when it arrives at the other end, it will be "deserialized" and the API will come back on the other system.
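Here's a quick sketch of that round trip, reusing the my_dict example from above:

import json

my_dict = {"name": "Mike", "age": 52, "city": "New York"}

flattened = json.dumps(my_dict)   # now just a string of characters
# flattened["name"] would raise a TypeError: the dict API is gone

restored = json.loads(flattened)  # "deserialized" on the other end
print(restored["name"])           # Mike: the dict API is back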

Data flowing between systems

This is the concept of portable, interoperable data. Back when it was called Electronic Data Interchange (or EDI), it was a very big deal. Then along came the web, then XML, then JSON, and now it's just a normal part of doing business.

If you're in Python and you want to convert a dict to a flattened JSON string, you do the following:

import json

my_dict = {
    "name": "Mike",
    "age": 52,
    "city": "New York"
}

json_string = json.dumps(my_dict)

…which would produce the following output:

'{"title": "Mike", "age": 52, "metropolis": "New York"}'

This looks almost the same as the original dict, but if you look closely you can see that single quotes are used around the entire thing. Another obvious difference is that you can line-wrap real structured data for readability without any ill effect. You can't do that so easily with strings. That's why it's presented all on one line in the above snippet.

Such stringifying processes are done when passing data between different systems because they are not always compatible. Normal text strings, on the other hand, are compatible with almost everything and can be passed on web requests with ease. Such flattened strings of JSON data are frequently referred to as the request.

Anatomy of a request

Again, here's the example request we made above:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

Now that you understand what the variable name json_string is telling you about its contents, you shouldn't be surprised to see this is how we populate that variable:

data_dict = {
    "target": "moz.com/blog",
    "scope": "page",
    "limit": 1
}

json_string = json.dumps(data_dict)

…and the contents of json_string look like this:

'{"goal": "moz.com/weblog", "scope": "web page", "restrict": 1}'

This was one of my key discoveries in learning the Moz Links API. It's common to countless other APIs out there but trips me up every time because it's so much more convenient to work with structured dicts than flattened strings. However, most APIs expect the data to be a string for portability between systems, so we have to convert it at the last moment before the actual API call occurs.

Pythonic loads and dumps

Now you may be wondering, in that above example, what a dump is doing in the middle of the code. The json.dumps() function is called a "dumper" because it takes a Python object and dumps it into a string. The json.loads() function is called a "loader" because it takes a string and loads it into a Python object.

What appear to be singular and plural options are actually file and string options. If your data is in a file (or file-like object), you use json.load() and json.dump(). If your data is a string, you use json.loads() and json.dumps(). The s stands for string; leaving the s off means you're working with files.
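Here's a sketch of all four, using a throwaway example.json file:

import json

my_dict = {"name": "Mike", "age": 52}

# String versions: the s stands for string
json_string = json.dumps(my_dict)     # dict -> string
round_trip = json.loads(json_string)  # string -> dict

# File versions: no s means a file (or file-like object)
with open("example.json", "w") as f:
    json.dump(my_dict, f)             # dict -> file
with open("example.json") as f:
    from_file = json.load(f)          # file -> dict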

Don't let anybody tell you Python is perfect. It's just that its rough edges are not excessively objectionable.

Assignment vs. equality

For those of you completely new to Python or programming in general, what we're doing when we hit the API is called an assignment. The result of requests.post() is being assigned to the variable named response.

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

We're using the = sign to assign the value of the right side of the equation to the variable on the left side of the equation. The variable response is now a reference to the object that was returned from the API. Assignment is different from equality. The == sign is used for equality.

# This is assignment:
a = 1  # a is now equal to 1

# This is equality:
a == 1  # True, but relies on the line above having been executed

The POST technique

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

The requests library has a function called post() that takes 3 arguments. The first argument is the URL of the endpoint. The second argument is the data to send to the endpoint. The third argument is the authentication information to send to the endpoint.
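Spelled out with comments, the same call looks like this. A sketch with placeholder credentials; the real values are set up in the full example below:

import requests

endpoint = "https://lsapi.seomoz.com/v2/anchor_text"
json_string = '{"target": "moz.com/blog", "scope": "page", "limit": 1}'
auth_tuple = ("your-access-id", "your-secret-key")  # placeholders

response = requests.post(
    endpoint,          # 1. where to send the request (positional)
    data=json_string,  # 2. the flattened JSON payload
    auth=auth_tuple,   # 3. credentials for HTTP Basic Authentication
)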

Keyword parameters and their arguments

You may notice that some of the arguments to the post() function have names. Names are set equal to values using the = sign. Here's how Python functions get defined. The first argument is positional, both because it comes first and because there's no keyword. Keyworded arguments come after position-dependent arguments. Trust me, it all makes sense after a while. We all start to think like Guido van Rossum.

def arbitrary_function(argument1, name="default"):
    # do stuff with argument1 and name
    pass

The name in the above example is called a "keyword" and the values that come in at those positions are called "arguments". Arguments are assigned to variable names right in the function definition, so you can refer to either argument1 or name anywhere inside this function. If you'd like to learn more about the rules of Python functions, you can read about them here.
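To see positional and keyword arguments in action, here's a toy greet() function (hypothetical, purely for illustration):

def greet(greeting, name="world"):
    return f"{greeting}, {name}!"

print(greet("Hello"))            # positional only: Hello, world!
print(greet("Hi", name="Mike"))  # keyword argument: Hi, Mike!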

Setting up the request

Okay, so let's get you everything necessary for that moment of assured success. We've been showing the basic request:

response = requests.post(endpoint, data=json_string, auth=auth_tuple)

…however we haven’t proven all the things that goes into it. Let’s do this now. In case you’re following alongside and don’t have the requests library put in, you are able to do so with the next command from the identical terminal atmosphere from which you run Python:

pip install requests

Often times Jupyter will have the requests library installed already, but in case it doesn't, you can install it with the following command from inside a Notebook cell:

!pip install requests

And now we can put it all together. There are only a few things here that are new. The most important is how we're taking 2 different variables and combining them into a single variable called AUTH_TUPLE. You will have to get your own ACCESSID and SECRETKEY from the Moz.com website.

The API expects these two values to be passed as a Python data structure called a tuple. A tuple is a list of values that don't change. I find it interesting that requests.post() expects flattened strings for the data parameter, but expects a tuple for the auth parameter. I suppose it makes sense, but these are the subtle things to understand when working with APIs.
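A tuple looks like a list but uses parentheses, and once created it can't be modified. A tiny sketch with placeholder credentials:

auth_tuple = ("your-access-id", "your-secret-key")

print(auth_tuple[0])      # your-access-id
# auth_tuple[0] = "oops"  # would raise a TypeError: tuples are immutable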

Here's the full code:

import json
import requests
from pprint import pprint

# Set Constants
ACCESSID = "mozscape-1234567890"  # Replace with your access ID
SECRETKEY = "1234567890abcdef1234567890abcdef"  # Replace with your secret key
AUTH_TUPLE = (ACCESSID, SECRETKEY)

# Set Variables
endpoint = "https://lsapi.seomoz.com/v2/anchor_text"
data_dict = {"target": "moz.com/blog", "scope": "page", "limit": 1}
json_string = json.dumps(data_dict)

# Make the Request
response = requests.post(endpoint, data=json_string, auth=AUTH_TUPLE)

# Print the Response
pprint(response.json())

…which outputs:

{'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==',
 'results': [{'anchor_text': 'moz',
              'external_pages': 7162,
              'external_root_domains': 2026}]}

Using all uppercase for the AUTH_TUPLE variable is a convention many use in Python to indicate that the variable is a constant. It's not a requirement, but it's a good idea to follow conventions when you can.

You may notice that I didn't use all uppercase for the endpoint variable. That's because the anchor_text endpoint is not a constant. There are a number of different endpoints that can take its place depending on what sort of lookup we wanted to do (see the sketch after this list). The choices are:

  1. anchor_text

  2. final_redirect

  3. global_top_pages

  4. global_top_root_domains

  5. index_metadata

  6. link_intersect

  7. link_status

  8. linking_root_domains

  9. links

  10. top_pages

  11. url_metrics

  12. usage_data
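Since the 12 endpoints share a common base URL, swapping one in is a one-line change. A sketch, assuming the v2 base path shown in the full example above:

BASE_URL = "https://lsapi.seomoz.com/v2/"

sub_endpoint = "url_metrics"  # or any of the 12 endpoints listed above
endpoint = BASE_URL + sub_endpoint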

And that leads into the Jupyter Notebook that I prepared on this subject, located here on Github. With this Notebook you can extend the example I gave here to any of the 12 available endpoints to create a variety of useful deliverables, which will be the subject of articles to follow.