Downloading PhET Interactive Simulations with Python
What is PhET?
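PhET is a collection of free, open interactive simulations. The project was founded in 2002 by Carl Wieman, a Nobel laureate in physics, and is built and run by a dedicated team at the University of Colorado. Its goal is to improve science and math literacy worldwide through freely available interactive simulations. PhET simulations are grounded in extensive education research and encourage students to explore and discover in an intuitive, game-like environment.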
PhET website:
https://phet.colorado.edu/zh_CN/
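Why download the PhET interactive simulations?
The reason is simple: the PhET website is hosted overseas, so opening it takes real patience. Finding the simulation you need and getting it to load can easily take five minutes. The site does offer its HTML5 simulations for download, and the code below downloads PhET's free interactive simulations fully automatically.
What the code does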
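- Check for updates
The program checks the age of each downloaded file. Anything older than 7 days is fetched again, so the version information is rechecked and a newer version of a simulation is picked up.
- Automatic download
Downloads PhET's free HTML5 simulations fully automatically, including the images and the HTML pages. Timeouts and similar problems may occur during a run; if they do, just run the code again. Files fetched within the last 7 days are reused from disk instead of being downloaded again.
- Generate an index page
The script writes a data file, sims.json, for an index.html index page to load. Opening that page shows the downloaded HTML5 simulations.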
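The 7-day check described above can be sketched in a few lines. This is a minimal illustration rather than the script's exact code: the name `is_stale` and the `max_age_days` parameter are assumptions made for the example, and it compares elapsed seconds directly instead of counting whole days.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_stale(path: str, max_age_days: int = 7) -> bool:
    """Return True if `path` is missing or older than `max_age_days` days.

    Hypothetical helper for illustration only.
    """
    if not os.path.exists(path):
        return True  # nothing cached yet, so it has to be downloaded
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_days * SECONDS_PER_DAY
```

A caller would download the file when `is_stale(path)` is true and read the cached copy otherwise, which is why rerunning after a timeout does not start from scratch.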
The Python code and how it works
Source code first:
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
Mirror downloader for the HTML5 simulations on the PhET website.
Re-downloads content once the local copy is more than 7 days old.
"""
import datetime
import json
import os
import re
import sys
import time

import requests

# Sent with every GET request so the site sees a browser user-agent.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
    'Connection': 'close'
}
TIMEOUT = 30  # seconds; without a timeout, requests waits indefinitely

session = requests.Session()
session.keep_alive = False

# The script writes into these folders, so make sure they exist.
os.makedirs('tmp', exist_ok=True)
os.makedirs('pub', exist_ok=True)


def down_img(sim_name: str, url: str):
    """Download a simulation screenshot unless it is already on disk."""
    full_path = os.path.join('pub', sim_name + '.png')
    if not os.path.exists(full_path):
        img = session.get(url, headers=headers, timeout=TIMEOUT).content
        with open(full_path, 'wb') as f:
            f.write(img)


def file_over_7day(full_path: str) -> bool:
    """Return True if the file is missing or more than 7 days old."""
    if os.path.exists(full_path):
        filemt1 = time.localtime(os.stat(full_path).st_mtime)  # file modification time
        t1 = time.mktime(filemt1)
        filemt2 = time.localtime()  # no argument means the current time
        t2 = time.mktime(filemt2)
        days = datetime.timedelta(seconds=t2 - t1).days
        return days > 7
    else:
        return True


def get_file_content(full_path: str, url: str) -> str:
    """Return the cached file, downloading it first if it is missing or stale."""
    if file_over_7day(full_path):
        s = session.get(url, headers=headers, timeout=TIMEOUT).text
        with open(full_path, 'w', encoding='utf-8') as f:
            f.write(s)
    else:
        with open(full_path, 'r', encoding='utf-8') as f:
            s = f.read()
    return s


projectsJsonFilePath = os.path.join('tmp', 'projects.json')
simsPath = os.path.join('pub', "sims.json")
if not file_over_7day(simsPath) and os.path.exists(simsPath):
    print('Already downloaded recently')
    #sys.exit()

projectsJsonFileUrl = 'https://phet.colorado.edu/services/metadata/1.3/simulations?format=json&summary='
projectsJson = get_file_content(projectsJsonFilePath, projectsJsonFileUrl)
j = json.loads(projectsJson)
runUrl = 'https://phet.colorado.edu/sims/html/{{sim-name}}/latest/{{sim-name}}_zh_CN.html'
thumbnailUrl = j['common']['html']['thumbnailUrl']

# Collect the name and title of every simulation, preferring the zh_CN title.
sims = []
for p in j['projects']:
    if p['type'] != 0:
        continue
    for s in p['simulations']:
        sim = {'name': s['name'], 'cver': s['cheerpjVersion']}
        if 'zh_CN' in s['localizedSimulations']:
            sim['title'] = s['localizedSimulations']['zh_CN']['title']
        else:
            sim['title'] = s['localizedSimulations']['en']['title']
        sims.append(sim)

re_des = re.compile('<p class="simulation\\-panel\\-indent" itemprop="description about">([^<]*)')
# '版本' is the "Version" label on the zh_CN page, so it must stay in Chinese.
re_ver = re.compile('<div class="sim\\-version" itemprop="version softwareVersion">版本 (.*?)</div>')
re_url = re.compile('<a href="/sims/html/([^"]*)" id="simulation-main-link-run-main"')
#https://phet.colorado.edu/sims/html/gene-expression-essentials/latest/gene-expression-essentials-600.png
re_img = re.compile('<img class="simulation-main-screenshot" src="/sims/html/(.*?)" width="300"')

for s in sims:
    # Fetch (or reuse) the simulation's info page and pull out its details.
    html_url = 'https://phet.colorado.edu/zh_CN/simulation/' + s['name']
    print(html_url)
    htmlPath = os.path.join('tmp', s['name'] + '_info.html')
    html_str = get_file_content(htmlPath, html_url)
    print(s['name'])
    m_des = re_des.search(html_str)
    if m_des is not None:
        s['des'] = m_des.group(1)
    m_ver = re_ver.search(html_str)
    if m_ver is not None:
        s['ver'] = m_ver.group(1)
    # Only simulations carrying the HTML5 badge can be mirrored.
    if html_str.find('sim-page-badge html-badge') == -1:
        s['h5'] = False
        continue
    else:
        s['h5'] = True
    m_img = re_img.search(html_str)
    if m_img is not None:
        img_url = 'https://phet.colorado.edu/sims/html/' + m_img.group(1)
        print(img_url)
        down_img(s['name'], img_url)
    m_url = re_url.search(html_str)
    if m_url is not None:
        sim_url = 'https://phet.colorado.edu/sims/html/' + m_url.group(1)
        print(sim_url)
        sim_path = os.path.join('pub', s['name'] + '_zh_CN.html')
        get_file_content(sim_path, sim_url)

# Write the list as a JavaScript assignment so an HTML page can load it with <script>.
simsJson = json.dumps(sims, ensure_ascii=False)
with open(simsPath, 'w', encoding='utf-8') as f:
    f.write('var data=')
    f.write(simsJson)
    f.write(';')
print('Done')
Notes on the code
- JSON metadata URL covering all PhET interactive simulations:
https://phet.colorado.edu/services/metadata/1.3/simulations?format=json&summary=
Parsing this JSON yields the information for every PhET interactive simulation.
- Fetch each simulation's information page:
html_url = 'https://phet.colorado.edu/zh_CN/simulation/' + s['name']
- Once the page has been fetched, extract the details with regular expressions:
re_des = re.compile('<p class="simulation\\-panel\\-indent" itemprop="description about">([^<]*)')
re_ver = re.compile('<div class="sim\\-version" itemprop="version softwareVersion">版本 (.*?)</div>')
re_url = re.compile('<a href="/sims/html/([^"]*)" id="simulation-main-link-run-main"')
#https://phet.colorado.edu/sims/html/gene-expression-essentials/latest/gene-expression-essentials-600.png
re_img = re.compile('<img class="simulation-main-screenshot" src="/sims/html/(.*?)" width="300"')
- Then simply download the image and the simulation itself.
- Generate the data file for the HTML page to reference.
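The parsing and extraction steps above can be tried offline. The sketch below is an illustration under stated assumptions: `SAMPLE_METADATA` and `SAMPLE_PAGE` are made-up, heavily trimmed stand-ins for the real responses, and the helper names `list_sims` and `extract_info` are hypothetical. The field names and the two patterns follow the script.

```python
import json
import re

# Hypothetical, trimmed metadata in the shape the script reads.
SAMPLE_METADATA = """
{"projects": [
  {"type": 0, "simulations": [
    {"name": "sample-sim",
     "localizedSimulations": {"en": {"title": "Sample Sim"},
                              "zh_CN": {"title": "示例仿真"}}},
    {"name": "english-only",
     "localizedSimulations": {"en": {"title": "English Only"}}}]},
  {"type": 1, "simulations": [
    {"name": "skipped-sim",
     "localizedSimulations": {"en": {"title": "Skipped"}}}]}
]}
"""

# Hypothetical fragment of a simulation info page.
SAMPLE_PAGE = (
    '<div class="sim-version" itemprop="version softwareVersion">版本 1.2.3</div>'
    '<a href="/sims/html/sample-sim/latest/sample-sim_zh_CN.html" '
    'id="simulation-main-link-run-main">'
)

RE_VERSION = re.compile(
    r'<div class="sim\-version" itemprop="version softwareVersion">版本 (.*?)</div>')
RE_RUN_URL = re.compile(
    r'<a href="/sims/html/([^"]*)" id="simulation-main-link-run-main"')


def list_sims(metadata_text: str) -> list:
    """Collect name and title per simulation, preferring the zh_CN title."""
    sims = []
    for project in json.loads(metadata_text)["projects"]:
        if project["type"] != 0:  # same filter as the script
            continue
        for s in project["simulations"]:
            localized = s["localizedSimulations"]
            locale = "zh_CN" if "zh_CN" in localized else "en"
            sims.append({"name": s["name"], "title": localized[locale]["title"]})
    return sims


def extract_info(html_str: str) -> dict:
    """Pull the version and the run URL out of an info page, if present."""
    info = {}
    m_ver = RE_VERSION.search(html_str)
    if m_ver is not None:
        info["ver"] = m_ver.group(1)
    m_url = RE_RUN_URL.search(html_str)
    if m_url is not None:
        info["url"] = "https://phet.colorado.edu/sims/html/" + m_url.group(1)
    return info


print(list_sims(SAMPLE_METADATA))
print(extract_info(SAMPLE_PAGE))
```

Because the patterns match literal markup, they stop matching as soon as the site changes its HTML; checking each result for `None`, as the script does, keeps a missing field from aborting the run.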
Example of the generated page:
http://www.sfzd5.com/page/phet/index.html
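The article does not show index.html itself. As one possible approach (an assumption, not the author's page), the index could be rendered directly in Python from the same list of simulations the script builds. The names `build_index` and `write_index` and the page layout are hypothetical; the file naming (`<name>_zh_CN.html`, `<name>.png`, the `pub` folder) and the `h5` flag follow the script.

```python
import html
import os


def build_index(sims: list) -> str:
    """Render a minimal static index page for the mirrored simulations."""
    items = []
    for sim in sims:
        if not sim.get("h5"):
            continue  # only HTML5 simulations were downloaded
        name = html.escape(sim["name"])
        title = html.escape(sim.get("title", sim["name"]))
        items.append(
            f'<li><a href="{name}_zh_CN.html">'
            f'<img src="{name}.png" alt="{title}" width="150"> {title}</a></li>'
        )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>PhET simulations</title></head>\n<body><ul>\n"
        + "\n".join(items)
        + "\n</ul></body></html>\n"
    )


def write_index(sims: list, out_dir: str = "pub") -> str:
    """Write index.html next to the downloaded files and return its path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "index.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_index(sims))
    return path
```

Calling `write_index(sims)` at the end of the script would place the page beside the simulations, so the relative links resolve when the `pub` folder is opened locally or served as is.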