Looking at a documentation site built with create-react-doc, I found that the page source is almost bare (see the picture below). This is a common problem of single-page application (SPA) sites: the documents are hard for search engines to index, which hurts SEO.
Does that mean SPA sites cannot do SEO at all? If so, why are frameworks such as Gatsby and nuxt the first choice of many bloggers for building blogs, and what is the technical principle by which they enable SEO? Driven by curiosity, I set out to bring SEO to create-react-doc.
Before putting anything into practice, let's analyze in theory why single-page applications are hard for search engines to find. The core reason is that a crawler does not execute the JavaScript in a page while crawling it, so the navigation logic hidden in that JavaScript is never executed either.
Look at the bundled output of the current SPA site: apart from a single index.html in the root directory, everything else is injected JavaScript, so a search engine naturally has nothing to index.
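As a rough illustration of the point above, the sketch below approximates what a client that never runs JavaScript can read from a page. The two HTML strings are made-up examples, not the real output of create-react-doc, and the tag stripping is deliberately naive.

```javascript
// Rough approximation of what a crawler that does not run JavaScript "sees":
// drop <script>/<style> blocks, strip the remaining tags, collapse whitespace.
function textWithoutJs(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical SPA shell: everything is rendered later by bundle.js.
const spaShell =
  '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>';

// Hypothetical pre-rendered page: the content is already in the markup.
const prerendered =
  '<html><body><div id="root"><h1>DOM</h1><p>Notes on the DOM.</p></div>' +
  '<script src="/bundle.js"></script></body></html>';

console.log(JSON.stringify(textWithoutJs(spaShell)));    // ""
console.log(JSON.stringify(textWithoutJs(prerendered))); // "DOM Notes on the DOM."
```

The shell yields no text at all, while the pre-rendered page exposes its heading and paragraph without any script being executed.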
In addition, search engine optimization is a fairly complicated subject. If you are new to SEO, I recommend reading the Search Engine Optimization (SEO) Beginner's Guide from Google Search Center, which gives a comprehensive list of 17 best practices and 33 practices to avoid. This is also what I have been putting into practice recently.
Since this is a lightweight documentation site, we set the SSR approach aside for now.
After surveying the SEO approaches used by documentation sites on the market, I group them into the following four categories:
hexo is the most typical example of the static template rendering approach. Frameworks of this kind require themes to be written in a specific template language (such as pug), so that the page content is output directly as HTML.
The 404 redirect approach mainly relies on the 404 mechanism of GitHub Pages to perform redirection. Typical examples are spa-github-pages and sghpa.
Unfortunately, Google adjusted its crawler algorithm in 2019, so this kind of redirect approach no longer helps SEO. The author of spa-github-pages also states that if SEO is required, you should use an SSG approach or the paid option, Netlify.
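To make the 404 redirect idea concrete, here is a simplified sketch of the round trip. It is not the actual script shipped by spa-github-pages or sghpa: the query parameter name `p`, the function names, and the `/blog/` base path are all assumptions made for illustration.

```javascript
// Step 1 (would run in 404.html): fold the unknown path into a query string
// so that the browser lands on the real index.html.
function toRedirectUrl(pathname, basePath) {
  const route = pathname.slice(basePath.length); // e.g. "basic/DOM"
  return basePath + '?p=' + encodeURIComponent('/' + route);
}

// Step 2 (would run in index.html): recover the original route. A real page
// would then pass it to history.replaceState before the router starts.
function fromRedirectUrl(search) {
  const match = /[?&]p=([^&]*)/.exec(search);
  return match ? decodeURIComponent(match[1]) : null;
}

const redirect = toRedirectUrl('/blog/basic/DOM', '/blog/');
console.log(redirect);                             // /blog/?p=%2Fbasic%2FDOM
console.log(fromRedirectUrl('?p=%2Fbasic%2FDOM')); // /basic/DOM
```

Note that the whole mechanism depends on JavaScript running in the browser, and that the deep URL itself is still answered with the 404 page.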
SSG stands for static site generator; in essence, it is an approach that turns routes into static pages. In the community, the techniques that frameworks such as nuxt, Gatsby, and docusaurus use to enable SEO all fall into this SSG category.
Taking the nuxt framework as an example: on top of its convention-based routing, it converts vue files into static web pages when the `nuxt generate` command is executed.
Example:

    -| pages/
    ---| about.vue/
    ---| index.vue/

After static generation, it becomes:

    -| dist/
    ---| about/
    -----| index.html
    ---| index.html

Once the routes are static, the resulting directory structure can be hosted by any static site hosting provider.
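The mapping in the example above can be sketched as two small functions. This is only a minimal illustration of convention-based routing under simplifying assumptions (no dynamic routes, `pages/` as the source directory, `dist` as the output directory); it is not how nuxt implements `nuxt generate`.

```javascript
// Derive a route from a page file, e.g. "pages/about.vue" -> "/about".
function pageToRoute(file) {
  const name = file.replace(/^pages\//, '').replace(/\.vue$/, '');
  const route = '/' + name.replace(/(^|\/)index$/, '');
  return route.length > 1 ? route.replace(/\/$/, '') : route;
}

// Map a route to the HTML file a static generator would emit for it.
function routeToOutputFile(route, outDir = 'dist') {
  const segments = route.split('/').filter(Boolean);
  return [outDir, ...segments, 'index.html'].join('/');
}

for (const file of ['pages/about.vue', 'pages/index.vue']) {
  const route = pageToRoute(file);
  console.log(file, '->', route, '->', routeToOutputFile(route));
}
// pages/about.vue -> /about -> dist/about/index.html
// pages/index.vue -> / -> dist/index.html
```

Because every route ends up as a plain `index.html` inside a directory, a static host only needs to serve files and does not need any rewrite rules.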
After the above analysis of SSG, the key to optimizing an SPA site is clear: static routes. Unlike frameworks such as nuxt and Gatsby, which are constrained by convention-based routing, create-react-doc organizes its directory structure flexibly and freely. Its site-building philosophy is "the files are the site", and migrating existing markdown documents to it is also very convenient.
Taking the blog project as an example, its document structure is as follows:

    -| BasicSkill/
    ---| basic/
    -----| DOM.md
    -----| HTML5.md

After static generation, it should become:

    -| BasicSkill/
    ---| basic/
    -----| DOM
    -------| index.html
    -----| HTML5
    -------| index.html

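The target layout above amounts to a simple per-file mapping. The sketch below shows it under the assumption that each markdown file becomes exactly one route; the helper names are hypothetical and are not part of create-react-doc.

```javascript
// "BasicSkill/basic/DOM.md" -> "/BasicSkill/basic/DOM"
function markdownToRoute(file) {
  return '/' + file.replace(/\.md$/, '');
}

// "/BasicSkill/basic/DOM" -> "BasicSkill/basic/DOM/index.html"
function routeToHtmlFile(route) {
  return route.replace(/^\//, '') + '/index.html';
}

const docs = ['BasicSkill/basic/DOM.md', 'BasicSkill/basic/HTML5.md'];

// One entry per document: source file, route to render, file to write.
const plan = docs.map((file) => {
  const route = markdownToRoute(file);
  return { file, route, output: routeToHtmlFile(route) };
});

console.log(plan);
```

The list of routes produced this way is exactly what a pre-rendering step needs as input.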
After some investigation, this idea turned out to fit the prerender-spa-plugin pre-rendering solution perfectly. The principle of pre-rendering is shown in the following figure:
With that, the technology choice was settled: use pre-rendering to implement SSG.
The steps create-react-doc took to adopt pre-rendering are briefly summarized below (for the complete changes, see the mr):
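For orientation, a pre-rendering setup of this kind is usually driven by a list of routes and the directory of the built site. The sketch below only builds such an options object. `staticDir` and `routes` are option names from the upstream prerender-spa-plugin README as I understand it, the concrete values are made up, and the fork used by create-react-doc may differ.

```javascript
// Build the options object one would hand to prerender-spa-plugin.
function buildPrerenderOptions(staticDir, routes) {
  return {
    staticDir, // directory containing the already-built SPA
    routes,    // every route that should get its own index.html
  };
}

// Hypothetical values for illustration only.
const options = buildPrerenderOptions('/project/dist', [
  '/',
  '/BasicSkill/basic/DOM',
  '/BasicSkill/basic/HTML5',
]);
console.log(options);

// In a webpack config this would be used roughly as follows (assuming the
// upstream v3 API):
//   const PrerenderSPAPlugin = require('prerender-spa-plugin')
//   plugins: [new PrerenderSPAPlugin(options)]
```

After the bundle is built, the plugin opens each listed route in a headless browser and saves the rendered HTML as a static file for that route.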
The hash router is replaced with a browser (history) router:

    export default function RouterRoot() {
      return (
    -    <HashRouter>
    +    <BrowserRouter>
          <RoutersContainer />
    -    </HashRouter>
    +    </BrowserRouter>
      )
    }

A pre-rendering environment is added on top of the development and production environments, and routes are matched according to the environment. This mainly resolves the correspondence between resource files and sub-paths under the main domain name. The process was rather tortuous; interested readers can see the issue.

      const ifProd = env === 'prod'
    + const ifPrerender = window.__PRERENDER_INJECTED && window.__PRERENDER_INJECTED.prerender
    + const ifAddPrefix = ifProd && !ifPrerender

      <Route
        key={item.path}
        exact
    -   path={item.path}
    +   path={ifAddPrefix ? `/${repo}${item.path}` : item.path}
        render={() => { ... }}
      />

The official prerender-spa-plugin does not currently support webpack 5 (see the issue for details), and I also needed to run a callback after pre-rendering finishes. I therefore forked a version that solves both problems.
After the steps above, static routes were finally implemented in the SPA site.
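The environment-dependent route matching described in the steps above can be isolated into a pure function, which makes the three cases easy to check. The function name and its argument shape are hypothetical; only the `ifProd && !ifPrerender` rule comes from the change itself. The reading that the prefix is needed because the deployed site lives under `/<repo>/` while pre-rendering serves the build from the root is my assumption.

```javascript
// Decide which path a <Route> should match in a given environment.
function resolveRoutePath({ env, prerender, repo, path }) {
  const ifProd = env === 'prod';
  const ifAddPrefix = ifProd && !prerender; // prefix only on the deployed site
  return ifAddPrefix ? `/${repo}${path}` : path;
}

const base = { repo: 'blog', path: '/basic/DOM' };

console.log(resolveRoutePath({ ...base, env: 'prod', prerender: false })); // /blog/basic/DOM
console.log(resolveRoutePath({ ...base, env: 'prod', prerender: true }));  // /basic/DOM
console.log(resolveRoutePath({ ...base, env: 'dev', prerender: false }));  // /basic/DOM
```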
With SEO optimized this far, let's look at how metrics such as FP, FCP, and LCP changed before and after the optimization.
Taking the blog site as an example, the metrics before and after optimization are as follows (measured while accessing gh-pages without a VPN):
Before optimization: before adopting pre-rendering, the first paint (FP, FCP) occurred at about 8s, and LCP at about 17s.
After optimization: after adopting pre-rendering, the first paint starts within 1s, and LCP is within 1.5s.
Comparing before and after: the first paint is about 8 times faster, and the largest contentful paint is about 11 times faster. I set out to optimize SEO and ended up with another way to optimize site performance as well.
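The two factors follow directly from the approximate timings quoted above; a small check of the arithmetic, using the upper bounds of 1s and 1.5s as the "after" values:

```javascript
// Speed-up factor = time before / time after.
function speedup(beforeSeconds, afterSeconds) {
  return beforeSeconds / afterSeconds;
}

console.log(speedup(8, 1).toFixed(1));    // 8.0  (first paint)
console.log(speedup(17, 1.5).toFixed(1)); // 11.3 (LCP)
```

Since the "after" values are upper bounds, the real factors are at least this large.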
With pre-rendering in place and the site's routes made static, we are one step closer to the SEO goal. Setting the finer details of SEO aside for now, let's go straight to the core of SEO: the sitemap.
The Sitemap format and the meaning of each field are briefly explained as follows:

    <?xml version="1.0" encoding="utf-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Required. Each link is defined inside its own <url>...</url> entry. -->
      <url>
        <!-- Required. The URL of the page; it must be shorter than 2,048 characters. -->
        <loc>http://www.yoursite.com/yoursite.html</loc>
        <!-- Optional. The date the page was last modified. -->
        <lastmod>2021-03-06</lastmod>
        <!-- Optional. A hint about how frequently the page is likely to change. -->
        <changefreq>daily</changefreq>
        <!-- Optional. The priority of this URL relative to other URLs on the site, from 0.0 to 1.0. -->
        <priority>0.8</priority>
      </url>
    </urlset>

In the sitemap above, the lastmod, changefreq, and priority fields matter less for SEO; see [how-to-create-a-sitemap](https://ahrefs.com/blog/zh/how-to-create-a-sitemap/).
Based on this structure, I developed crd-generator-sitemap, the sitemap generation package for create-react-doc. Its logic is simply to assemble the pre-rendered route paths into the format above.
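The idea of assembling route paths into a sitemap can be sketched as follows. This is not the source of crd-generator-sitemap: the function names and the example site URL are hypothetical, and only the required `<loc>` field is emitted.

```javascript
// Escape the characters that are not allowed verbatim in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Turn a site URL plus a list of pre-rendered routes into sitemap XML.
function buildSitemap(siteUrl, routes) {
  const base = siteUrl.replace(/\/$/, ''); // avoid a double slash when joining
  const urls = routes
    .map((route) => `  <url>\n    <loc>${escapeXml(base + route)}</loc>\n  </url>`)
    .join('\n');
  return (
    '<?xml version="1.0" encoding="utf-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls +
    '\n</urlset>\n'
  );
}

// Hypothetical site URL and routes.
const sitemap = buildSitemap('https://example.github.io/blog/', [
  '/',
  '/BasicSkill/basic/DOM',
]);
console.log(sitemap);
```

Reusing the same route list for pre-rendering and for the sitemap keeps the two in sync: every URL declared to the search engine has a static page behind it.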
Users only need to add the following parameters to `config.yml` in the site root directory, and the sitemap is generated automatically during the automated release process.

    seo:
      google: true

Try submitting the generated sitemap to Google Search Console.
Finally, let's verify the Google search results for the site before and after optimization.
Before optimization: only one result was found.
After optimization: the pages declared in the sitemap were found.
At this point, the whole process of using SSG to optimize an SPA site for SEO has been carried out end to end. What remains is to follow the Search Engine Optimization (SEO) Beginner's Guide to refine SEO details and to support more search engines.
Taking SEO for SPA sites as the starting point, this article introduced the basic principles of SEO and four practical approaches to SEO in SPA sites, and then walked through a complete SEO practice with the create-react-doc SPA framework.
If this article helped you, stars and feedback are welcome.