2012-02-21 01:15:00 -05:00

[{"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312258957.6023991, "message": "Quick ways I've reproduced a symptom that we spotted on our celery workers are by .all() querying a good sized table.. In one case, I iterated the items printing some of them to stdout in shell.. didn't hold a reference to any of them...", "group_id": 81, "id": 1772512}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312258867.4858799, "message": "So.. This is probably more of a python than a Django question, but I figured I'd ask it here in case there's anything particularly worth watching out for in Django... I'm trying to sort out why my python processes aren't returning significant amounts of memory to the OS, and wondering if anyone has any tips that I might not have stumbled on yet...", "group_id": 81, "id": 1772498}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312258883.928683, "message": "Details... python 2.6.5, Ubuntu, Django 1.2", "group_id": 81, "id": 1772502}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312258967.9424059, "message": "footprint grew to 630mb, and stayed there", "group_id": 81, "id": 1772513}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312259027.4732339, "message": "I read about this being standard behavior with versions <= 2.4 but it seems like plenty of attention has gone into improving that since. I also confirmed simple stuff like making sure I'm not running Django in debug, etc. Anything else that would be good to look into?", "group_id": 81, "id": 1772522}, {"user_id": 23352, "stars": [], "topic_id": 43132, "date_created": 1312263774.371701, "message": "Maybe first compare your current memory consumption against what happens 1) if you process the table in batches [slices of all() instead of just all()], or 2) with all().iterator(). 
This might give insight to where the \"leak\" is.", "group_id": 81, "id": 1772873}, {"user_id": 39020, "stars": [], "topic_id": 43132, "date_created": 1312284485.6810091, "message": "thats a good start. we've noticed similar memory footprint issues processing large tables and pushing their rows as jobs to our worker pool. i would definitely try slicing that table and seeing if that lowers your usage.", "group_id": 81, "id": 1773976}, {"user_id": 21294, "stars": [{"date_created": 1312302003.7274489, "user_id": 3580}, {"date_created": 1312380085.7527189, "user_id": 35636}], "topic_id": 43132, "date_created": 1312289455.70769, "message": "http://stackoverflow.com/questions/5494178/why-doesnt-memory-get-released-to-system-after-large-queries-or-series-of-queri", "group_id": 81, "id": 1774384}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312289472.6770401, "message": "there's a question I asked a while back that explains it", "group_id": 81, "id": 1774388}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312289606.522682, "message": "In the comments of the answer I checked, you'll see a link to a video by the dropbox team explaining some more details", "group_id": 81, "id": 1774405}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312289548.471051, "message": "I spent the better part of a week debugging the issue for hours using heapy/guppy and a bunch of other memory profiling tools... just to find out... memory fragmentation cannot be fixed without \"compacting\" memory management", "group_id": 81, "id": 1774399}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312302657.3938949, "message": "interestingly.. I seem to see much nicer behavior on my Mac, versus our Ubuntu envs.. the ubuntus are running 2.6.5 instead of 2.6.6.. 
and I suppose that could have some bearing on it, but are there any other reasons that I ought to notice a significant difference?", "group_id": 81, "id": 1776669}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312302164.57301, "message": "Thanks.. okay, that makes perfect sense as to why the 'solutions' that came out in 2.5 are still often not.. solutions. Slicing seems like an interesting way to keep things under control, though it probably still doesn't really solve the fragmentation issue, right? It would just say... allow my worker to keep playing in a smaller pool of memory that it would fragment up, and eventually slowly grow out of, instead of spiking out hundreds of mb and 'ruining' all that memory at once.", "group_id": 81, "id": 1776601}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312303152.2649939, "message": "nice, I had no idea there were postrun signals for tasks... it makes so much sense...", "group_id": 81, "id": 1776734}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312303007.0335, "message": "Miguel on the celery list gave me this tip.. I haven't tested it yet, but it's pretty clever.. 
Throw an exception in a post-signal to 'encourage' celery to reboot the worker.", "group_id": 81, "id": 1776716}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312302972.6292911, "message": "http://groups.google.com/group/celery-users/msg/887324e121ebb4c0", "group_id": 81, "id": 1776713}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312302865.564851, "message": "I was hoping to do something like that when I was debugging the issue but didn't figure out how to do that", "group_id": 81, "id": 1776697}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312302387.344641, "message": "smaller slice sizes would help, but it doesn't \"solve\" the issue", "group_id": 81, "id": 1776634}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312302439.1949489, "message": "python knows what memory is free to use and will reuse it, the problem is that even if there is one byte allocated at the end of the segment of memory that it has allocated, it cannot release it to the OS because of that byte", "group_id": 81, "id": 1776641}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312302489.156914, "message": "too small a slice _could_ fragment memory more, but I wouldn't worry too much about it. 
just worry if memory continues to grow", "group_id": 81, "id": 1776645}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312302595.326195, "message": "In this case I think for these tasks my approach is going to be to ask Celery to restart the particular worker, since I know certain tasks like rebuilding the indexes are likely to fragment large amounts of memory.", "group_id": 81, "id": 1776663}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312302847.8910999, "message": "just curious, how do you ask celery to restart the particular worker?", "group_id": 81, "id": 1776696}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312302921.7076919, "message": "I tried some 2.6 vs 2.7 testing and it didn't really yield much results. My guess is that they arent much different. Python doesn't implement memory compacting management AFAIK", "group_id": 81, "id": 1776699}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312303630.7276411, "message": "yeah, totally.", "group_id": 81, "id": 1776788}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312303230.2785411, "message": "Totally. Since these memory issues are a pretty obvious issue for things like celery it'd be neat to set up postrun signals that could do intelligent stuff like enforce a max-memory threshold for workers, etc.. instead of just relying on MAX_TASKS", "group_id": 81, "id": 1776741}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312303597.1752429, "message": "I agree, very cool idea", "group_id": 81, "id": 1776783}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312303617.8692801, "message": "maybe even add a fancy decorator like @task_postrun(max_memory=...)", "group_id": 81, "id": 1776785}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1312336346.352313, "message": "@dlamotte So yeah.. did this today and it works nicely. 
Didn't go as far as the max-memory stuff cause I don't quite need that yet, and I figured it's very possible that gc hasn't run right as the task is exiting, so without a timer or something it might be too aggressive. This decorator is what I'm using now, and it seems to do the trick:", "group_id": 81, "id": 1781364}, {"user_id": 3580, "stars": [{"date_created": 1312350045.3994451, "user_id": 24931}], "topic_id": 43132, "date_created": 1312336347.5639839, "message": "from celery.signals import task_postrun\n\ndef shutdown_worker(**kwargs):\n raise SystemExit()\n \ndef bounce_worker_after(func):\n task_postrun.connect(shutdown_worker, sender=func)\n return func\n", "group_id": 81, "id": 1781365}, {"user_id": 19532, "stars": [], "topic_id": 43132, "date_created": 1312352785.207967, "message": "antirez is redis author, and he just switched to jemalloc on linux because of libc malloc memory fragmentation. He says \"If you are on osx or *BSD you can still force a jemalloc build with make USE_JEMALLOC=yes, but those other systems have a sane libc malloc so usually this is not required. 
Also a few of those systems use jemalloc-derived libc malloc implementations.\"", "group_id": 81, "id": 1782564}, {"user_id": 19532, "stars": [], "topic_id": 43132, "date_created": 1312352656.8274851, "message": "for macos vs ubuntu you can find more information here: http://antirez.com/post/everything-about-redis-24.html", "group_id": 81, "id": 1782556}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312373653.020364, "message": "@phill nice, I'll definitely be trying that out today", "group_id": 81, "id": 1783714}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312377311.5814719, "message": "thanks @vad, very interesting read", "group_id": 81, "id": 1783917}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312826894.4985471, "message": "I get \"WorkerLostError()\" exceptions", "group_id": 81, "id": 1823391}, {"user_id": 21294, "stars": [], "topic_id": 43132, "date_created": 1312826874.311939, "message": "@phill I've noticed that I get emails (I use CELERY_SEND_TASK_ERROR_EMAILS = True) even though things happen after the task is complete and successful, have you noticed this?", "group_id": 81, "id": 1823388}, {"user_id": 3580, "stars": [], "topic_id": 43132, "date_created": 1314832015.312345, "message": "@dlamotte Sorry.. just seeing this. Maybe a convore hiccup. Anyways, I don't have that setting turned on.. I've got a decorator sending emails when my tasks fail, nested inside the decorator that shuts them down.. so that's probly why.. but that's a good point. We'd probly have to hack something nicer to sort that issue out. I do see some kind of 'Worker Ended Prematurely' or something, but doesn't seem to be a problem. Celery spins a new one up and we continue.", "group_id": 81, "id": 2015359}]