Loading Documents
In this guide, we'll take a look at how to load documents with Cheerio and when to use the different loading methods.
If you're familiar with jQuery, then this step will be new to you. jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.
The loadBuffer, stringStream, decodeStream, and fromURL methods are not available in the browser environment. Instead, use the load method to parse HTML strings.
load
The load method is the most basic way to parse an HTML or XML document with Cheerio. It takes a string containing the document as its argument and returns a Cheerio object that you can use to traverse and manipulate the document.
Here's an example of how to use the load method:
import * as cheerio from 'cheerio';
const $ = cheerio.load('<h1>Hello, world!</h1>');
console.log($('h1').text());
// Output: Hello, world!
Similar to web browser contexts, load will introduce <html>, <head>, and <body> elements if they are not already present. You can set load's third argument to false to disable this.
const $ = cheerio.load('<ul id="fruits">...</ul>', null, false);
$.html();
//=> '<ul id="fruits">...</ul>'
Learn more about the load method in the API documentation.
loadBuffer
The loadBuffer method is similar to the load method, but it takes a buffer containing the document as its argument instead of a string. Cheerio will run the HTML encoding sniffing algorithm to determine the encoding of the document. This is useful when you have the document in binary form, such as when you're reading it from a file or receiving it over a network connection.
Here's an example of how to use the loadBuffer method:
import * as cheerio from 'cheerio';
import * as fs from 'fs';
const buffer = fs.readFileSync('document.html');
const $ = cheerio.loadBuffer(buffer);
console.log($('title').text());
// Output: Hello, world!
Learn more about the loadBuffer method in the API documentation.
stringStream
When loading an HTML document from a stream and the encoding is known, you can use the stringStream method to parse it into a Cheerio object.
import * as cheerio from 'cheerio';
import * as fs from 'fs';
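const writeStream = cheerio.stringStream({}, (err, $) => {
  if (err) {
    // Handle the error and stop before using $
    return;
  }
  console.log($('title').text());
  // Output: Hello, world!
});
// The encoding option makes the read stream emit strings instead of buffers
fs.createReadStream('document.html', { encoding: 'utf8' }).pipe(writeStream);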
Learn more about the stringStream method in the API documentation.
decodeStream
When loading an HTML document from a stream and the encoding is not known, you can use the decodeStream method to parse it into a Cheerio object. This method runs the HTML encoding sniffing algorithm to determine the encoding of the document.
Here's an example of how to use the decodeStream method:
import * as cheerio from 'cheerio';
import * as fs from 'fs';
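const writeStream = cheerio.decodeStream({}, (err, $) => {
  if (err) {
    // Handle the error and stop before using $
    return;
  }
  console.log($('title').text());
  // Output: Hello, world!
});
// No encoding is set here, so the read stream emits raw buffers for decodeStream to decode
fs.createReadStream('document.html').pipe(writeStream);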
Learn more about the decodeStream method in the API documentation.
fromURL
The fromURL method allows you to load a document from a URL. This method is asynchronous, so you need to use await (or a then block) to access the resulting Cheerio object.
import * as cheerio from 'cheerio';
const $ = await cheerio.fromURL('https://example.com');
Learn more about the fromURL method in the API documentation.
Conclusion
Cheerio provides several methods for loading HTML documents and parsing them into a DOM structure. These methods are useful for different scenarios, depending on the type and source of the HTML data. Users are encouraged to read through each of these methods and pick the one that best suits their needs.