Some Useful PyMongo Snippets

## Table of Contents
1. Display all databases available
2. Display all collections available
3. Display one document from a collection
4. Display the number of documents in a collection
5. Display a specific field of the first 10 documents
6. Find the first 10 author names in ascending alphabetical order
7. Count the documents that do not match a regex pattern
8. Count the documents uploaded between two dates
9. Find documents by text search
10. Find documents with a regex pattern in a field
11. Group by a field, count the documents, and sort by count
12. Update a document if it exists, otherwise insert a new one

I have a collection of papers stored in a MongoDB Atlas database. Here is an example document:

{
    '_id': ObjectId('5fa9a4db76fdd8d66273c643'),
    'id': '0704.0001',
    'submitter': 'Pavel Nadolsky',
    'authors': "C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan",
    'title': 'Calculation of prompt diphoton production cross sections at Tevatron and\n  LHC energies',
    'comments': '37 pages, 15 figures; published version',
    'journal-ref': 'Phys.Rev.D76:013009,2007',
    'doi': '10.1103/PhysRevD.76.013009',
    'report-no': 'ANL-HEP-PR-07-12',
    'categories': 'hep-ph',
    'abstract': '  A fully differential calculation in perturbative quantum chromodynamics is\npresented for the production of massive photon pairs at hadron colliders. All\nnext-to-leading order perturbative contributions from quark-antiquark,\ngluon-(anti)quark, and gluon-gluon subprocesses are included, as well as\nall-orders resummation of initial-state gluon radiation valid at\nnext-to-next-to-leading logarithmic accuracy. The region of phase space is\nspecified in which the calculation is most reliable. Good agreement is\ndemonstrated with data from the Fermilab Tevatron, and predictions are made for\nmore detailed tests with CDF and DO data. Predictions are shown for\ndistributions of diphoton pairs produced at the energy of the Large Hadron\nCollider (LHC). Distributions of the diphoton pairs from the decay of a Higgs\nboson are contrasted with those produced from QCD processes at the LHC, showing\nthat enhanced sensitivity to the signal can be obtained with judicious\nselection of events.\n',
    'update_date': '2008-11-26',
    'authors_parsed': [
        ['Balázs', 'C.', ''],
        ['Berger', 'E. L.', ''],
        ['Nadolsky', 'P. M.', ''],
        ['Yuan', 'C. -P.', '']
    ]
}
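
The `authors_parsed` field stores each author as a three-element list. Judging from the sample document, the order appears to be last name, first name(s), suffix; that ordering is an assumption, and `format_author` is a hypothetical helper, not part of PyMongo. A minimal sketch for turning those entries into display names:

```python
def format_author(parsed):
    """Join one authors_parsed entry into a display name.

    Assumes the entry is [last, first, suffix], as the sample document suggests.
    """
    last, first, suffix = parsed
    # Skip empty parts so a missing suffix does not leave a trailing space.
    return " ".join(part for part in (first, last, suffix) if part)


sample = [["Balázs", "C.", ""], ["Yuan", "C. -P.", ""]]
names = [format_author(entry) for entry in sample]
print(names)
```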



My database projects share a set of common use cases, so I put together this list of snippets to refer back to when needed.

Initializing PyMongo:

from pymongo import MongoClient

# Connection settings for the database
full_dns_name = 'mongodb://***'
username = 'test'
password = 'test'
authSource = 'admin'

client = MongoClient(host=full_dns_name, username=username, password=password, authSource=authSource)
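
If the credentials are embedded in the connection string instead of being passed as keyword arguments, reserved characters such as `@` or `:` in the username or password have to be percent-encoded. A small sketch using only the standard library; the `build_mongo_uri` helper, the sample password, and the `localhost:27017` host are hypothetical placeholders:

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, auth_source="admin"):
    """Build a mongodb:// URI with percent-encoded credentials."""
    return "mongodb://{}:{}@{}/?authSource={}".format(
        quote_plus(username), quote_plus(password), host, quote_plus(auth_source)
    )


uri = build_mongo_uri("test", "p@ss:word", "localhost:27017")
print(uri)
# The resulting string can be passed to MongoClient(uri).
```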


1. Display all databases available:
# Display all databases available (one metadata dict per database)
db_list = list(client.list_databases())
print(db_list)

# or, one per line

for db in client.list_databases():
    print(db)

2. Display all collections available:
# Display all collections in every database
for db in client.list_databases():
    name = db['name']
    for col in client[name].list_collections():
        print(col)

3. Display one document from a collection:
# Display one document from the "papers" collection
db = client.arxiv
papers_col = db.papers
doc = papers_col.find_one()
print(doc)

4. Display the number of documents in a collection:
# Display the number of documents in the collection
number_of_doc = papers_col.count_documents({})
print(number_of_doc)

5. Display a specific field of the first 10 documents:
# Display the titles of 10 articles
articles = list(papers_col.find({}, {'title': 1}).limit(10))
print(articles)
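
Titles in this collection can contain embedded line breaks, as the `title` field of the sample document shows. A minimal sketch for cleaning them up after the query; `clean_title` is a hypothetical helper, and the list below stands in for the `articles` result:

```python
def clean_title(title):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(title.split())


# Stand-in for documents returned by find({}, {'title': 1}).
articles = [
    {
        "_id": 1,
        "title": "Calculation of prompt diphoton production cross sections "
                 "at Tevatron and\n  LHC energies",
    },
]
titles = [clean_title(doc["title"]) for doc in articles]
print(titles)
```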

6. Find the first 10 author names in ascending alphabetical order:
from pymongo import ASCENDING

# The "submitter" attribute holds the author's name.
# Display the first 10 author names in ascending alphabetical order.

# Sort by submitter, then keep the first 10
articles = list(
    papers_col.find({}, {'submitter': 1})
    .sort([('submitter', ASCENDING)])
    .limit(10)
)
print(articles)
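
MongoDB applies the sort before the limit no matter which order the cursor methods are chained in, so the query returns the 10 alphabetically first submitters of the whole collection rather than a sorted version of 10 arbitrary documents. A plain-Python sketch of that behavior on in-memory data (the names are invented for illustration):

```python
docs = [{"submitter": name} for name in ["Zed", "Ann", "Moe", "Bob", "Eve"]]

# What the server does: sort the full result set first, then truncate it.
sort_then_limit = sorted(docs, key=lambda d: d["submitter"])[:2]

# What a reader might wrongly expect from .limit(2).sort(...): truncate first.
limit_then_sort = sorted(docs[:2], key=lambda d: d["submitter"])

print([d["submitter"] for d in sort_then_limit])
print([d["submitter"] for d in limit_then_sort])
```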

7. Count the documents that do not match a regex pattern:
import re

# Display the number of articles not submitted by "Damien Chablat"
pattern = re.compile(r'Damien Chablat')
articles = papers_col.count_documents({'submitter': {'$not': pattern}})
print(articles)
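
One detail worth keeping in mind: `$not` also matches documents in which the field is missing altogether. The following plain-Python sketch mimics the filter on a few invented documents; `matches_not` is a hypothetical stand-in for what the server evaluates, not a PyMongo function:

```python
import re

pattern = re.compile(r"Damien Chablat")
query = {"submitter": {"$not": pattern}}  # the filter passed to count_documents


def matches_not(doc):
    """Plain-Python equivalent of the $not filter for one document."""
    value = doc.get("submitter")
    # $not also matches documents where the field is missing.
    return not (isinstance(value, str) and pattern.search(value))


docs = [
    {"submitter": "Damien Chablat"},
    {"submitter": "Pavel Nadolsky"},
    {"title": "no submitter field"},
]
count = sum(1 for doc in docs if matches_not(doc))
print(count)
```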

8. Count the documents uploaded between two dates:
# The "update_date" attribute holds the document's upload date (yyyy-mm-dd format).
# Display the number of articles uploaded in 2014.

from datetime import date

first_date = date(2014, 1, 1).isoformat()
last_date = date(2015, 1, 1).isoformat()

articles_count = papers_col.count_documents({'update_date': {'$gte': first_date, '$lt': last_date}})
print(articles_count)
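
The range query works on plain strings because ISO `yyyy-mm-dd` dates sort lexicographically in chronological order. A sketch that wraps the pattern in a reusable filter builder; `year_range_filter` is a hypothetical helper name:

```python
from datetime import date


def year_range_filter(year, field="update_date"):
    """Build a half-open [Jan 1 of year, Jan 1 of year + 1) filter on ISO date strings."""
    start = date(year, 1, 1).isoformat()
    end = date(year + 1, 1, 1).isoformat()
    return {field: {"$gte": start, "$lt": end}}


query = year_range_filter(2014)
print(query)

# ISO yyyy-mm-dd strings compare in date order, so string comparison is enough.
bounds = query["update_date"]
in_range = bounds["$gte"] <= "2014-07-15" < bounds["$lt"]
print(in_range)
```

The resulting dictionary can be passed directly to `count_documents` or `find`.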
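
9. Find documents by text search:
from pymongo import TEXT

# Display the abstracts of articles that mention "Machine Learning".
# Note: unquoted search terms are OR-ed; use '"Machine Learning"' to match the exact phrase.
papers_col.create_index([("abstract", TEXT)])
articles = papers_col.find({"$text": {"$search": "Machine Learning"}}, {'abstract': 1})
print(list(articles))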

10. Find documents with a regex pattern in a field:
import re

# Display the articles where "Machine Learning" is mentioned in the abstract

pattern = re.compile(r'Machine Learning')
articles = papers_col.find({'abstract': {'$regex': pattern}})

print(list(articles))
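
The regex above is case-sensitive and treats the search string as a pattern. When the term comes from user input, it can help to escape it and ignore case. A sketch, assuming a hypothetical helper `contains_filter`; the sample abstracts are invented:

```python
import re


def contains_filter(field, term):
    """Build a filter matching documents whose field contains term literally, ignoring case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return {field: {"$regex": pattern}}


query = contains_filter("abstract", "Machine Learning")
pattern = query["abstract"]["$regex"]

# Check the compiled pattern locally before sending the query to the server.
abstracts = ["We apply machine learning to jets.", "A study of C++ (templates)."]
hits = [text for text in abstracts if pattern.search(text)]
print(hits)
```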

11. Group by a field, count the documents, and sort by count:
# Display the number of articles for the 10 most prolific submitters

pipeline = [
    { "$group": {"_id": "$submitter", "count": {"$sum": 1}} },
    { "$sort": { "count": -1 } },
    { '$limit': 10 }
]

articles = list(papers_col.aggregate(pipeline))
print(articles)
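
To make the three pipeline stages concrete, here is a plain-Python sketch that computes the same kind of result on a handful of invented in-memory documents. It is only an illustration of what the stages do, not how the server executes them:

```python
from collections import Counter

docs = [
    {"submitter": "A"}, {"submitter": "B"}, {"submitter": "A"},
    {"submitter": "C"}, {"submitter": "A"}, {"submitter": "B"},
]

# $group with {"$sum": 1} counts documents per submitter;
# $sort by count descending plus $limit keeps the top entries.
counts = Counter(doc["submitter"] for doc in docs)
top = [{"_id": name, "count": n} for name, n in counts.most_common(2)]
print(top)
```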

12. Update a document if it exists, otherwise insert a new one:
def update_or_create_paper(paper_data):
    # Update 'data' if a document with the custom 'id' exists; otherwise insert a new document.
    return papers_col.find_one_and_update(
        {"id": paper_data['id']},
        {"$set": {"data": {**paper_data}}},
        upsert=True,
    )
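
Separating the construction of the filter and update documents from the database call makes the upsert easy to inspect and test without a server. A sketch under that assumption; `build_upsert` is a hypothetical helper:

```python
def build_upsert(paper_data):
    """Return the (filter, update) pair used for the upsert."""
    query = {"id": paper_data["id"]}
    # dict(paper_data) copies the input so later changes to it do not leak into the update.
    update = {"$set": {"data": dict(paper_data)}}
    return query, update


query, update = build_upsert({"id": "0704.0001", "title": "Example"})
print(query, update)
# Usage with a real collection (not executed here):
#   papers_col.find_one_and_update(query, update, upsert=True)
# By default the call returns the document as it was before the update,
# so it returns None when the upsert inserts a new document.
```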
