mirror of
https://github.com/holodata/sensai-dataset.git
synced 2025-03-15 12:00:32 +09:00
chore: fix docs
parent e241a46bfc
commit fe6fed0759
19
LICENSE
Normal file
@@ -0,0 +1,19 @@
Copyright (c) 2021 Yasuaki Uechi <y@uechi.io>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.
37
README.md
@@ -34,12 +34,9 @@ Ban and deletion are equivalent to `markChatItemsByAuthorAsDeletedAction` and `m

| column          | type   | description                  |
| --------------- | ------ | ---------------------------- |
| timestamp       | string | UTC timestamp                |
| body            | string | chat message                 |
| membership      | string | membership status            |
| id              | string | anonymized chat id           |
| authorChannelId | string | anonymized author channel id |
| videoId         | string | source video id              |
| channelId       | string | source channel id            |

#### Membership status
@@ -60,33 +57,23 @@ Ban and deletion are equivalent to `markChatItemsByAuthorAsDeletedAction` and `m
Set `keep_default_na` to `False` and `na_values` to `''` in `read_csv`. Otherwise, chat message like `NA` would incorrectly be treated as NaN value.

```python
chats = pd.read_csv('../input/vtuber-livechat/chats_2021-03.csv',
                    na_values='',
                    keep_default_na=False,
                    index_col='timestamp',
                    parse_dates=True)
import pandas as pd
from glob import iglob

flagged = pd.concat([
    pd.read_csv(f,
                na_values='',
                keep_default_na=False)
    for f in iglob('../input/sensai/chats_flagged_*.csv')
],
                    ignore_index=True)
```
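
To see why both options matter, here is a minimal sketch on an in-memory CSV. The two-row sample and its `body` / `authorChannelId` values are made up for illustration and are not taken from the dataset files.

```python
import io

import pandas as pd

# Hypothetical sample: one chat whose body is the literal text "NA",
# and one chat whose body is genuinely empty.
sample = io.StringIO("body,authorChannelId\nNA,abc\n,def\n")

# Default parsing: pandas treats "NA" as a missing-value marker.
default = pd.read_csv(sample)

# Parsing as the README suggests: only the empty string counts as missing.
sample.seek(0)
strict = pd.read_csv(sample, na_values='', keep_default_na=False)

print(default['body'].isna().tolist())  # both rows are flagged as missing
print(strict['body'].tolist())          # "NA" survives as text, empty stays missing
```

With the default settings the message `NA` is indistinguishable from an empty cell, which is the failure mode the note above warns about.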

### Channels (`channels.csv`)

| column            | type            | description            |
| ----------------- | --------------- | ---------------------- |
| channelId         | string          | channel id             |
| name              | string          | channel name           |
| englishName       | nullable string | channel name (English) |
| affiliation       | string          | channel affiliation    |
| group             | nullable string | group                  |
| subscriptionCount | number          | subscription count     |
| videoCount        | number          | uploads count          |
| photo             | string          | channel icon           |

Inactive channels have `INACTIVE` in `group` column.

## Consideration

### Anonymization

`id` and `channelId` are anonymized by SHA-1 hashing algorithm with a pinch of undisclosed salt.
`authorChannelId` are anonymized by SHA-1 hashing algorithm with a pinch of undisclosed salt.
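
For illustration only, salted SHA-1 hashing can be sketched as below. The function name `anonymize`, the salt value, and the choice to prepend the salt to the ID are all assumptions; the dataset's actual salt and the way it is combined with the ID are undisclosed.

```python
import hashlib


def anonymize(channel_id: str, salt: bytes) -> str:
    """Return a hex SHA-1 digest of the salted channel ID.

    Assumption: the salt is prepended to the UTF-8 encoded ID. The real
    pipeline may combine them differently.
    """
    return hashlib.sha1(salt + channel_id.encode('utf-8')).hexdigest()


# Hypothetical salt and ID, used only to show the shape of the output.
example_salt = b'not-the-real-salt'
digest = anonymize('UC_example_channel', example_salt)
print(len(digest))  # a SHA-1 hex digest is 40 characters long
```

The same ID always maps to the same digest under a fixed salt, so rows by one author can still be grouped without exposing the original channel ID.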

### Handling Custom Emojis

@@ -97,7 +84,7 @@ All custom emojis are replaced with a Unicode replacement character `U+FFFD`.
```latex
@misc{sensai-dataset,
    author={Yasuaki Uechi},
    title={Sensai: Large Scale Virtual YouTubers Live Chat Dataset},
    title={Sensai: Toxic Chat Dataset},
    year={2021},
    month={8},
    version={31},