Opening of the Open Search Symposium #OSSYM2020: There are four large indexes of the web: Google, Microsoft, Yandex and Baidu. We need a European index!
Great plans at the opening of #OSSYM2020 streaming from CERN, "the Cradle of the Web".
European Galileo Programme for Internet Autonomy.
We need digital sovereignty!
Many search engines exist, but which of them crawl their own index?
An infographic capturing the English Web sources, by Mojeek.
"In fact, I go as far as to say the machine learning is really a misnomer. When we say the machines learn, it’s kind of like saying that baby penguins fish. What baby penguins really do is they sit there, and the mom or the dad penguin, they go, they find the fish, they bring it, they chew it up, and they regurgitate it. They spoon-feed morsels to their babies in the nest. That’s not the babies fishing, that’s the parents fishing."
Job offer: PhD position in Knowledge Graphs - CNRS/LIRIS - INSA de Lyon - Lyon/Villeurbanne - France https://filesender.renater.fr/?s=download
Open Marie Curie PhD position on Bias mitigation in Artificial Intelligence https://nobias-project.eu
Whoa, wait a minute, OpenAI exclusively licensed GPT-3 to... Microsoft?? https://www.technologyreview.com/2020/09/23/1008729/openai-is-giving-microsoft-exclusive-access-to-its-gpt-3-language-model/
The aggressive tactics employed by proctoring companies in response to student complaints and critical analysis of the software.
"The richest 5% (whose average income is $100,000 per year) capture no less than 46% of global income. In other words, half of all our economic activity – all the mines, all the factories, all the power stations, all the shipping, and all of the ecological impact that’s associated with these things – is done to make rich people richer."
Quick reminder: Researchers at Google used CommonCrawl to train T5, because using open data is so much better for research, and apparently, CommonCrawl is not much worse than Google's own crawl.
@ingo did your PO for the ITN come with suggestions on how to accommodate secondments during Covid?
ApacheCon EU 2016 presentation by Sebastian Nagel on crawling for CommonCrawl using Nutch (and Stormcrawler for news sites):
* Running on AWS
* They're getting 7000 pages/sec!
* Seeds are really important... (donated by Blekko until 2015)
> Van Halen was a surprise guest on "Beat It," the album's third single. His blazing guitar solo lasted all of 20 seconds and took half an hour to record. He did it for free, as a favor to producer Quincy Jones, while the rest of his Van Halen bandmates were out of town.
Why we invented DBMSes:
Relational DBMSs since 1970s... too high tech solutions for administrators...
PeerTube needs about 13k more to reach that €60,000 / Live streaming feature!
A free (as in freedom), decentralized video-sharing platform that may soon support live streaming too.
The fine developers of Framasoft have a roadmap for version 3.0 of PeerTube.
It includes all the things one needs to ditch YouTube!
Read about it via link & please support if able. We are almost there.
Not only did NPO see similar or better click-through rates on ads that were served without knowing ANYTHING about the user, but more users saw those ads because the ads didn't have to get through a tracker blocker.
And, to Hwang's point, these high-performing, highly visible ads each delivered double the money to NPO, because there was no scammy ad-tech industry in the middle raking off a 50% vig for behavioral analysis, real-time auctions and other socially useless smoke-and-mirrors.
3rd CFP: ACM TOIS Special Section on Graph Technologies for User Modeling and Recommendation https://lists.utwente.nl/cgi-bin/wa.exe?A2=SIGIR;81636cae.2010
Documentary available for a few days now, on Vimeo, at
Indie music 🎸
The "unofficial" Information Retrieval Mastodon Instance.
Goal: Make idf.social a viable and valuable social space for anyone working in Information Retrieval and related scientific research.
Everyone welcome but expect some level of geekiness on the instance and federated timelines.